Policy Enforcement for Terraform & OpenTofu with OPA and Scalr

Policy enforcement for Terraform & OpenTofu: integrate OPA with Scalr to build guardrails, automate compliance checks, and secure every run.

Reviewed for accuracy by Ryan Fee on June 6, 2025.

This summary outlines key findings on implementing policy enforcement for Infrastructure as Code (IaC) managed by Terraform and OpenTofu, focusing on Open Policy Agent (OPA) and Scalr.

The Need for IaC Governance

Managing infrastructure with Terraform or OpenTofu provides speed and consistency, but without robust governance it also introduces risks. These risks include security vulnerabilities, compliance breaches, cost overruns, and inconsistent configurations. Policy as Code (PaC) mitigates these risks by defining and managing policies as code, so they can be versioned, reviewed, and evaluated automatically.

Open Policy Agent (OPA) and Rego

OPA is an open-source, general-purpose policy engine that allows you to decouple policy decision-making from your application logic. It enables you to define and enforce policies consistently across various systems and services, such as Terraform and OpenTofu. Rego is the high-level declarative query language used by OPA to write these policies, allowing you to express complex rules over hierarchical data structures like JSON.

  • OPA: An open-source, general-purpose policy engine used to enforce policies across various systems, including Terraform/OpenTofu. It evaluates policies against JSON input, typically the Terraform plan.
  • Rego: A high-level declarative language used to write OPA policies. Policies define conditions that must hold true. Key features include its JSON-native design, rule-based structure, and modularity.
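  • Evaluation Process: OPA evaluates Rego policies against a JSON representation of the Terraform plan, created with terraform plan -out=tfplan followed by terraform show -json tfplan > tfplan.json. The plan's resource_changes array lists every proposed create, update, and delete, which makes it the main input for inspecting infrastructure modifications.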
  • Tooling:
    • OPA CLI: For local policy evaluation, testing, and interactive development (opa eval, opa test, opa run).
    • TACOs (Terraform Automation and Collaboration Software): Products such as Scalr, built for Terraform and/or OpenTofu, that integrate natively with OPA.
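To see the data OPA receives, you can inspect resource_changes directly. The following Python sketch assumes a plan exported with terraform plan -out=tfplan and terraform show -json tfplan > tfplan.json; summarize_changes is a hypothetical helper, and SAMPLE_PLAN is a hand-trimmed illustration, since real plan JSON carries many more fields.

```python
import json

def summarize_changes(plan_json: str) -> list[tuple[str, str]]:
    """Return (address, final action) for every entry in resource_changes."""
    plan = json.loads(plan_json)
    summary = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # Replacements appear as ["delete", "create"] or ["create", "delete"];
        # the example policies in this article key on the last element.
        summary.append((rc["address"], actions[-1]))
    return summary

# Hand-trimmed excerpt; a real plan also has planned_values, configuration, etc.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["create"], "before": None, "after": {"bucket": "logs"}}},
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["delete", "create"], "before": {}, "after": {}}},
    ],
})

if __name__ == "__main__":
    # With a real plan: summarize_changes(open("tfplan.json").read())
    for address, action in summarize_changes(SAMPLE_PLAN):
        print(f"{address}: {action}")
```

Running it prints aws_s3_bucket.logs: create and aws_instance.web: create, which shows that a replacement's final action can hide its delete step.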

Scalr and OPA

Scalr is a SaaS platform providing a remote state and operations backend for Terraform/OpenTofu, offering advanced governance features. The features that help with policy are:

  • Hierarchical Structure: Organizes resources and policies via Account, Environment, and Workspace scopes, allowing for policy inheritance and centralized governance with decentralized operations.
  • Native OPA Integration:
    • Policies are managed via GitOps (stored in VCS, linked to Scalr).
    • Pre-plan Checks: Evaluate run context (initiator, VCS details) before the plan is generated. Because no plan is needed, they are a cheap way to validate context early.
    • Post-plan Checks: Evaluate the full Terraform plan JSON after plan generation for detailed resource validation.
    • Enforcement Levels: Policies can be set to Hard Mandatory (blocks run), Soft Mandatory (requires approval for post-plan violations), or Advisory (logs warning).
  • Policy Management: OPA policy groups are defined (often at Account scope using the Scalr Terraform provider) and linked to environments, with workspaces inheriting them.
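How enforcement levels combine can be modeled in a few lines. This Python sketch is a simplified, hypothetical model rather than Scalr's implementation: it ignores the pre-plan/post-plan distinction, and the level strings and outcome labels are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class PolicyResult:
    name: str
    level: str             # "hard-mandatory", "soft-mandatory", or "advisory" (illustrative labels)
    violations: list[str]  # messages produced by the policy's deny rules

def run_outcome(results: list[PolicyResult]) -> str:
    """Return the most severe outcome among policies that reported violations."""
    failed_levels = {r.level for r in results if r.violations}
    if "hard-mandatory" in failed_levels:
        return "blocked"
    if "soft-mandatory" in failed_levels:
        return "needs-approval"
    if "advisory" in failed_levels:
        return "passed-with-warnings"
    return "passed"
```

For example, a soft mandatory cost violation alongside an advisory tagging violation yields needs-approval: the stricter level wins, and advisory results are only logged.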

Example Policies

The following are some example policies to help you get started with OPA. A full library can be found here.

Cost Policy

Infracost can run as part of a Scalr run to estimate the cost of the planned changes. Scalr injects the estimate into the tfrun data so that OPA can evaluate it:

# Deny the run if the estimated monthly cost exceeds $5.

package terraform

import input.tfrun as tfrun


deny[reason] {
    cost := tfrun.cost_estimate.proposed_monthly_cost
    cost > 5
    reason := sprintf("Plan is too expensive: $%.2f, while up to $5 is allowed", [cost])
}
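One subtlety: if the input has no tfrun.cost_estimate.proposed_monthly_cost value (for example, if no estimate was produced for the run), the rule body is undefined and nothing is denied, so the policy passes silently. The following Python sketch of the same check makes that fail-open behavior explicit; cost_violations and COST_LIMIT are hypothetical names.

```python
COST_LIMIT = 5.0  # same $5 threshold as the Rego rule

def cost_violations(run_input: dict) -> set[str]:
    """Return the deny messages for a mock Scalr input document."""
    estimate = (run_input.get("tfrun") or {}).get("cost_estimate") or {}
    cost = estimate.get("proposed_monthly_cost")
    # Rego treats a missing value as undefined, so the rule simply does not fire.
    if cost is None or cost <= COST_LIMIT:
        return set()
    return {f"Plan is too expensive: ${cost:.2f}, while up to $5 is allowed"}
```

If a missing estimate should block the run instead, add a separate deny rule whose body is not tfrun.cost_estimate.proposed_monthly_cost.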

Prevent a Destructive Run

If you use the Scalr provider to manage Scalr workspaces, you may want a check that prevents users from destroying workspaces that still have active state:

# Deny destroying Scalr workspaces that still manage resources.

package terraform

import input.tfplan as tfplan


deny["Cannot destroy workspace with active state"] {
    resource := tfplan.resource_changes[_]
    "delete" == resource.change.actions[count(resource.change.actions) - 1]
    "scalr_workspace" == resource.type

    resource.change.before.has_resources
}
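The rule compares only the final planned action, so it fires for a plain delete and for a create-before-destroy replacement (["create", "delete"]), but not for a ["delete", "create"] replacement. The Python sketch below roughly mirrors the rule against a hand-written mock plan; destroy_violations is a hypothetical name, and treating only an explicit true as active state is a simplification of Rego's truthiness rules.

```python
def destroy_violations(tfplan: dict) -> set[str]:
    """Deny messages for scalr_workspace deletions that still have resources."""
    denied = set()
    for rc in tfplan.get("resource_changes", []):
        change = rc["change"]
        before = change.get("before") or {}  # attribute values prior to the change
        if (rc["type"] == "scalr_workspace"
                and change["actions"][-1] == "delete"
                and before.get("has_resources") is True):
            denied.add("Cannot destroy workspace with active state")
    return denied
```

Because the message is a constant, any number of offending workspaces collapses into a single entry in the deny set; including resource.address in the message via sprintf would make the result more actionable.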

Prevent Auto-Apply

Want to prevent auto-apply runs in an environment such as prod? The following policy denies any run in a workspace that has auto-apply enabled, so link it only to the environments where approvals are required, such as prod:

# All prod runs must be approved

package terraform

import input.tfrun as tfrun

# Deny if auto-apply is enabled

deny["auto-apply is not allowed"] {
  tfrun.workspace.auto_apply == true
}

Limit Module Source

Want to control where modules are pulled from for specific resources? The following policy checks the module source and denies resources that come from unapproved sources:

# Enforce that specified resource types are created only by specific modules and not in the root module.

package terraform

import input.tfplan as tfplan


# Map of resource types which must be created only using module
# with corresponding module source
resource_modules = {
    "aws_db_instance": "terraform-aws-modules/rds/aws"
}

array_contains(arr, elem) {
  arr[_] = elem
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    action := resource.change.actions[count(resource.change.actions) - 1]
    array_contains(["create", "update"], action)
    module_source = resource_modules[resource.type]
    not resource.module_address
    reason := sprintf(
        "%s cannot be created directly. Module '%s' must be used instead",
        [resource.address, module_source]
    )
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    action := resource.change.actions[count(resource.change.actions) - 1]
    array_contains(["create", "update"], action)
    module_source = resource_modules[resource.type]
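    parts := split(resource.module_address, ".")
    # Strip any count/for_each index ("module.rds[0]" -> "rds"); only the top-level call is checked
    name_parts := split(parts[1], "[")
    module_name := name_parts[0]
    actual_source := tfplan.configuration.root_module.module_calls[module_name].source
    actual_source != module_source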
    reason := sprintf(
        "%s must be created with '%s' module, but '%s' is used",
        [resource.address, module_source, actual_source]
    )
}
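The trickiest part of this policy is mapping module_address back to an entry in configuration.root_module.module_calls. The Python sketch below reproduces both deny rules against a mock plan so you can see that mapping; module_call_name, module_violations, and the sample module sources are hypothetical. Unlike the Rego version, it denies when the module source cannot be resolved instead of silently skipping the resource.

```python
RESOURCE_MODULES = {"aws_db_instance": "terraform-aws-modules/rds/aws"}

def module_call_name(module_address: str) -> str:
    """'module.rds' or 'module.rds[0]' -> 'rds'; nested calls resolve to the top-level call."""
    return module_address.split(".")[1].split("[")[0]

def module_violations(tfplan: dict) -> set[str]:
    calls = tfplan.get("configuration", {}).get("root_module", {}).get("module_calls", {})
    denied = set()
    for rc in tfplan.get("resource_changes", []):
        required = RESOURCE_MODULES.get(rc["type"])
        if required is None or rc["change"]["actions"][-1] not in ("create", "update"):
            continue
        if not rc.get("module_address"):
            # First Rego rule: resource declared directly in the root module
            denied.add(f"{rc['address']} cannot be created directly. "
                       f"Module '{required}' must be used instead")
            continue
        # Second Rego rule: resource created by a module with a different source
        actual = calls.get(module_call_name(rc["module_address"]), {}).get("source")
        if actual != required:
            denied.add(f"{rc['address']} must be created with '{required}' module, "
                       f"but '{actual}' is used")
    return denied
```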

OPA Coding Best Practices

1. Structure Your Policies Logically:

  • Packages: Organize your policies into packages that reflect their purpose or the resources they govern (e.g., package terraform.aws.s3, package terraform.azure.networking). This improves readability and maintainability.
  • Modularity: Break down complex policies into smaller, reusable rules and functions. This makes policies easier to understand, test, and modify. For example, create helper functions to check for common conditions like the presence of specific tags or adherence to naming conventions. Scalr has the concept of functions, allowing you to reuse code.
  • Clear Naming: Use descriptive names for your packages, rules, and variables to clearly indicate their intent.

2. Write Clear and Concise Rego Code:

  • Choose How Violations Are Expressed: One option is a default "deny" stance (for example, default allow = false) with specific rules that define "allow" conditions. The more common pattern for Terraform is to write "deny" rules that generate violation messages when specific conditions are met; if no deny rule is triggered, the configuration is considered compliant.
  • Focus on the input Document: Understand the structure of the Terraform plan JSON (often accessed via input.tfplan, depending on the integration tool). This is the data your policies will evaluate. A command such as terraform show -json tfplan.binary > tfplan.json lets you inspect this structure.
  • Use Comprehensions and Iteration Wisely: Rego's array, set, and object comprehensions are powerful for iterating over resources and their attributes (e.g., all_instances := [res | res := input.tfplan.resource_changes[_]; res.type == "aws_instance"]).
  • Explicit Imports: When using helper functions or data from other files or the input document, use explicit import statements (e.g., import input.tfplan as tfplan).
  • Effective Use of sprintf for Messages: Create clear, informative violation messages using sprintf. Include details like the resource address and the specific reason for the violation (e.g., msg := sprintf("%v: S3 bucket is missing encryption configuration", [resource.address])).
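In Rego, multiple rules with the same name, such as the two deny[reason] definitions in the module-source policy, contribute to one combined set, and a plan is compliant when that set is empty. The Python sketch below mirrors that model and the all_instances comprehension; public_ip_rule, instance_count_rule, and evaluate are hypothetical names, and associate_public_ip_address is used only as an example aws_instance attribute.

```python
from typing import Callable

Rule = Callable[[dict], set]

def all_instances(tfplan: dict) -> list[dict]:
    # Python version of the comprehension:
    # [res | res := input.tfplan.resource_changes[_]; res.type == "aws_instance"]
    return [res for res in tfplan.get("resource_changes", []) if res["type"] == "aws_instance"]

def public_ip_rule(tfplan: dict) -> set:
    # Hypothetical rule: one message per offending instance, like sprintf("%v: ...", [address])
    return {
        f"{res['address']}: public IP addresses are not allowed"
        for res in all_instances(tfplan)
        if (res["change"].get("after") or {}).get("associate_public_ip_address") is True
    }

def instance_count_rule(tfplan: dict, limit: int = 2) -> set:
    # Hypothetical rule: cap how many instances a single plan may touch
    count = len(all_instances(tfplan))
    return {f"plan changes {count} instances; at most {limit} allowed"} if count > limit else set()

def evaluate(rules: list[Rule], tfplan: dict) -> tuple[bool, set]:
    """Union every rule's messages, as OPA does for multiple deny[...] definitions;
    the plan is compliant only if the union is empty."""
    messages: set = set()
    for rule in rules:
        messages |= rule(tfplan)
    return (not messages, messages)
```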

3. Target Specific Resource Changes:

  • Resource Type and Actions: Filter policies to apply only to relevant resource types (resource.type == "aws_s3_bucket") and actions (resource.change.actions[_] == "create"). This prevents unintended consequences and improves performance.
  • Accessing Resource Attributes: Navigate the resource_changes array in the Terraform plan JSON to access the attributes of resources being created, updated, or deleted (e.g., resource.change.after.tags for tags on a new or updated resource).
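Filtering by type and action and then reading change.after might look like this in Python. The sketch assumes a hypothetical required-tag policy (REQUIRED_TAGS) for aws_instance; last_action is a small reusable helper of the kind the modularity advice recommends, and all names are illustrative.

```python
REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical tag policy

def last_action(rc: dict) -> str:
    """Final planned action; replacements end in 'create' or 'delete'."""
    return rc["change"]["actions"][-1]

def missing_tag_violations(tfplan: dict, resource_type: str = "aws_instance") -> set[str]:
    denied = set()
    for rc in tfplan.get("resource_changes", []):
        # Target one resource type and only create/update actions.
        if rc["type"] != resource_type or last_action(rc) not in ("create", "update"):
            continue
        # change.after holds the planned attributes; it is null for deletes.
        tags = (rc["change"].get("after") or {}).get("tags") or {}
        missing = sorted(REQUIRED_TAGS - set(tags))
        if missing:
            denied.add(f"{rc['address']}: missing required tags {missing}")
    return denied
```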

4. Test Your Policies Thoroughly:

  • Unit Tests: Write unit tests for your Rego policies using opa test. Create mock input data that covers various scenarios, including compliant and non-compliant configurations. This is crucial for verifying policy logic.
  • Local Validation: Use tools like conftest or opa eval to test your policies against actual Terraform plan JSON output locally before integrating them into CI/CD pipelines. This provides a faster feedback loop.
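With opa test, test rules are prefixed with test_ and supply mock data with the with keyword (for example, with input as {...}). The same table-driven idea is sketched below in Python against a copy of the cost rule's logic; deny, CASES, and run_cases are hypothetical names. The "no estimate" case pins down the fail-open behavior, so any later change to it has to be deliberate.

```python
def deny(tfrun: dict) -> set[str]:
    """Rule under test: a copy of the cost policy's logic (threshold $5)."""
    cost = (tfrun.get("cost_estimate") or {}).get("proposed_monthly_cost")
    if cost is not None and cost > 5:
        return {f"Plan is too expensive: ${cost:.2f}, while up to $5 is allowed"}
    return set()

# (case name, mock tfrun input, expected number of deny messages)
CASES = [
    ("under limit", {"cost_estimate": {"proposed_monthly_cost": 4.99}}, 0),
    ("at limit", {"cost_estimate": {"proposed_monthly_cost": 5}}, 0),
    ("over limit", {"cost_estimate": {"proposed_monthly_cost": 5.01}}, 1),
    ("no estimate", {}, 0),  # fail-open: documents the current behavior
]

def run_cases() -> list[str]:
    """Return the names of failing cases; an empty list means every case passed."""
    return [name for name, mock, expected in CASES if len(deny(mock)) != expected]

if __name__ == "__main__":
    failures = run_cases()
    print("all cases passed" if not failures else f"failed: {failures}")
```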

5. Manage and Version Your Policies:

  • Version Control: Store your Rego policies in a version control system (like Git) alongside your Terraform code or in a dedicated policy repository.
  • Policy Sets: Group related policies into sets for easier management and application to different environments or projects.
  • Documentation: Document your policies, explaining their purpose, the rationale behind them, and how to remediate violations.

6. Incremental Rollout and Audit Mode:

  • Start with Audit/Warn Mode: When introducing new or complex policies, initially configure them to "warn" or "audit" rather than "deny" or "enforce" (in Scalr, the Advisory enforcement level). This lets you identify potential issues and refine policies without blocking deployments.
  • Iterate and Refine: Continuously review and refine your policies based on feedback, evolving requirements, and new security best practices.

Better still, use a tool like Scalr that provides impact analysis before policy changes are merged into main. Impact analysis executes the new OPA code against the last run in a Terraform workspace and tells OPA administrators what to expect if the code is merged.
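Conceptually, impact analysis means evaluating both the current and the proposed policy against recent run inputs and reporting only the differences. The Python sketch below is a hypothetical illustration of that idea, not Scalr's implementation; cost_policy, impact_analysis, and LAST_RUNS are made-up names, and each run input is reduced to a single cost field.

```python
from typing import Callable

Policy = Callable[[dict], set]

def cost_policy(limit: float) -> Policy:
    """Build a simple cost policy with the given monthly limit."""
    def check(run_input: dict) -> set:
        cost = run_input.get("proposed_monthly_cost", 0)
        return {f"cost ${cost:.2f} exceeds ${limit:.2f}"} if cost > limit else set()
    return check

def impact_analysis(current: Policy, proposed: Policy, last_runs: dict) -> dict:
    """Report, per workspace, violations the proposed policy would add or clear."""
    report = {}
    for workspace, run_input in last_runs.items():
        before, after = current(run_input), proposed(run_input)
        if before != after:
            report[workspace] = {"new": sorted(after - before),
                                 "resolved": sorted(before - after)}
    return report

LAST_RUNS = {"ws-dev": {"proposed_monthly_cost": 3.0},
             "ws-prod": {"proposed_monthly_cost": 8.0}}

if __name__ == "__main__":
    # Tightening the limit from $10 to $5 would newly flag ws-prod only.
    print(impact_analysis(cost_policy(10), cost_policy(5), LAST_RUNS))
```

Reporting only workspaces whose outcome changes keeps the review focused on what the merge would actually affect.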

Reporting

Using a tool like Scalr to see all your OPA policy results in one spot really helps manage Infrastructure as Code. You get a clear view of how compliant everything is across all your workspaces, without digging into each one separately. This saves a lot of time and makes it easier to spot common problems or risks quickly.

When it's time for audits or compliance reports, having all this information together makes the process much simpler. It also helps ensure your policies are actually being checked the same way everywhere, which is key as your infrastructure grows and you need strong, consistent governance.

Benefits and Approach

When Open Policy Agent is integrated with Scalr, it empowers organizations to enforce fine-grained, custom policies on their Infrastructure as Code before deployment, significantly enhancing governance and compliance. Scalr utilizes OPA to check Terraform plans against these predefined Rego policies, catching potential security vulnerabilities, misconfigurations, or non-compliant resource definitions early in the development lifecycle. This combination allows for automated, consistent policy enforcement within Scalr's collaborative IaC workflows, ensuring infrastructure adheres to organizational standards without slowing down development.

Future Directions

The field is evolving towards more external data integration in OPA, advanced policy layering, potential mutation policies, AI/ML in policy suggestion/detection, and standardization of policy libraries for common compliance benchmarks.