Managing Terraform Drift
Detect, triage and resolve Terraform drift with Scalr’s policy-driven workflows, automated scans and guardrails—keep every stack compliant.
Infrastructure as Code (IaC) has revolutionized how we build and manage IT environments. Tools like Terraform, and its open-source counterpart OpenTofu, empower teams to define infrastructure declaratively, bringing predictability and automation. However, even in the most well-managed IaC setups, a common challenge persists: infrastructure drift.
Drift occurs when the actual state of your deployed infrastructure diverges from the intended state defined in your code. This silent deviation can creep in through manual changes, overlapping automation, or even dynamic cloud provider actions. Left unchecked, drift can lead to security vulnerabilities, compliance breaches, unexpected costs, and operational instability.
This post explores how to detect and manage infrastructure drift, looking at native IaC capabilities, the broader tool ecosystem, and how platforms like Scalr offer a robust approach to maintaining configuration integrity.
The First Line of Defense: Native plan Operations
Both Terraform and OpenTofu provide a fundamental mechanism for spotting discrepancies: the plan command.
# For Terraform
terraform plan
# For OpenTofu
tofu plan
When you run terraform plan or tofu plan, the IaC tool reads your configuration, checks the current state file, and queries your cloud provider for the actual state of managed resources. If the plan output shows proposed changes (creations, updates, or destructions) when you haven't intentionally modified your code, that's a clear sign of drift.
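For scripted checks, both CLIs accept a -detailed-exitcode flag on plan, which reports the result as an exit code: 0 for no changes, 1 for an error, 2 for pending changes. The sketch below wraps that in Python. The function names, the status labels, and the injectable run parameter are illustrative assumptions, not part of either tool.

```python
import subprocess

# Exit codes documented for `plan -detailed-exitcode`:
# 0 = no changes, 1 = error, 2 = changes present.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "changes_pending"}


def classify_plan_exit(code: int) -> str:
    """Translate a `plan -detailed-exitcode` exit code into a status label."""
    return PLAN_STATUS.get(code, "error")


def check_drift(workdir: str, binary: str = "terraform", run=subprocess.run) -> str:
    """Run a plan in `workdir` and report whether it proposes changes.

    `binary` may be "terraform" or "tofu". `run` is injectable (hypothetical
    convenience) so the function can be exercised without a real CLI.
    """
    result = run(
        [binary, f"-chdir={workdir}", "plan",
         "-detailed-exitcode", "-input=false", "-no-color"],
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A "changes_pending" result only indicates drift when the code itself is unchanged since the last apply. Unapplied commits produce the same exit code.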
Strengths:
- Built-in: No extra tools needed for basic checks.
- Authoritative: Directly compares your code's intent with the (refreshed) state.
Limitations:
- Managed Resources Only: Won't detect resources created manually outside of your IaC configuration.
- Verbose Output: Can be hard to sift through in large environments.
- Scalability: Running plans constantly across many workspaces can be cumbersome.
- No Automatic Remediation: Identifies drift but doesn't fix it.
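One way to tame the verbose output is to save the plan and read its JSON representation instead of the human-readable text. The sketch below assumes the JSON was produced with plan -out=tfplan followed by show -json tfplan, and relies on the resource_changes list in that output. The summarize_plan helper and the sample data are hypothetical.

```python
import json

# Actions that do not modify real infrastructure.
HARMLESS = (["no-op"], ["read"])


def summarize_plan(plan_json: str) -> dict[str, str]:
    """Map each resource address to its planned action, skipping no-ops.

    `plan_json` is the text produced by, for example:
        terraform plan -out=tfplan && terraform show -json tfplan
    """
    plan = json.loads(plan_json)
    summary = {}
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if not actions or actions in HARMLESS:
            continue
        # A replacement appears as two actions, e.g. "delete+create".
        summary[change["address"]] = "+".join(actions)
    return summary


# Trimmed, hand-written sample in the shape of the JSON plan output.
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
        {"address": "aws_security_group.web_sg", "change": {"actions": ["update"]}},
    ]
})
print(summarize_plan(sample))  # {'aws_security_group.web_sg': 'update'}
```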
OpenTofu, in keeping with its focus on safety, deprecates the standalone tofu refresh command (as Terraform does with terraform refresh) and recommends tofu apply -refresh-only instead, so that users can review state changes before they are committed. This highlights a community focus on safer operational patterns.
Why Native Isn't Always Enough
While plan is essential, complex environments demand more. Continuous, automated detection, clear reporting across multiple projects, and streamlined remediation are crucial for effective drift management at scale. This is where specialized tools and platforms come into play.
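To see why this gets cumbersome, consider what a do-it-yourself scan across several working directories might look like. This is a minimal sketch, assuming each directory is already initialized. The names scan_workspaces and plan_exit_code and the bucket labels are hypothetical, and scheduling, notifications, and credentials are left out entirely.

```python
import subprocess


def plan_exit_code(workdir: str, binary: str = "terraform") -> int:
    """Run `plan -detailed-exitcode` in one directory and return its exit code."""
    proc = subprocess.run(
        [binary, f"-chdir={workdir}", "plan", "-detailed-exitcode",
         "-input=false", "-lock=false", "-no-color"],
        capture_output=True,
        text=True,
    )
    return proc.returncode


def scan_workspaces(workdirs, probe=plan_exit_code) -> dict[str, list[str]]:
    """Group directories by plan result: 0 = in sync, 2 = drifted, else failed."""
    report = {"in_sync": [], "drifted": [], "failed": []}
    for workdir in workdirs:
        try:
            code = probe(workdir)
        except OSError:
            # For example, the CLI binary is not installed on this machine.
            code = 1
        bucket = {0: "in_sync", 2: "drifted"}.get(code, "failed")
        report[bucket].append(workdir)
    return report
```

Calling scan_workspaces(["envs/dev", "envs/prod"]) from a cron job or CI schedule would give a basic drift report, but everything around it (history, dashboards, alert routing) still has to be built and maintained by hand.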
Scalr: Controlled Drift Management for Terraform and OpenTofu
Scalr is an IaC management platform that integrates comprehensive drift detection natively, supporting both Terraform and OpenTofu environments. Its philosophy centers on providing robust detection while ensuring users retain full control over remediation actions.
Key Aspects of Scalr's Approach:
- Automated & Scheduled Detection: Scalr allows you to configure drift detection to run automatically at set intervals (e.g., daily, weekly) across all workspaces within an environment. This ensures consistent monitoring without manual intervention.
- Flexible Detection Sources: Scalr doesn't just compare your Git-committed code to the live state. It can also detect drift by comparing against the "last known applied state." This is valuable for catching discrepancies that might occur if a deployment was successful but not immediately committed, or if you need to verify against the last operational state.
- Clear Reporting and Insights:
  - A dedicated "Drift Detection" tab in the UI lists all detected discrepancies.
  - Customizable drift dashboards provide an organizational overview of drift status.
  - Notifications (e.g., via Slack, with MS Teams planned) alert relevant teams promptly.
- User-Controlled Remediation: This is a cornerstone of Scalr's design. When drift is detected, Scalr doesn't automatically "fix" it. Instead, it presents clear, deliberate options:
  - Ignore: Acknowledge the drift if it's intentional or will be handled externally.
  - Sync State (Refresh-Only Run): Update Scalr's state file to match the actual (drifted) infrastructure. This is akin to terraform refresh or tofu apply -refresh-only.
  - Revert Infrastructure (Plan & Apply Run): If the drift is undesired, Scalr generates a plan to revert to the previously defined state and applies it upon your approval.
This user-centric approach ensures that no changes are made to your infrastructure without explicit review and consent, aligning with best practices for operational safety and change management.
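Comparing against a "last known applied state" can be pictured as a diff between two attribute snapshots of the same resource. The toy sketch below illustrates only that idea and is not Scalr's implementation. The diff_attributes helper and the snapshot dictionaries are hypothetical.

```python
def diff_attributes(known: dict, live: dict) -> dict:
    """Return {attribute: (known_value, live_value)} for every attribute that differs."""
    keys = known.keys() | live.keys()
    return {
        key: (known.get(key), live.get(key))
        for key in sorted(keys)
        if known.get(key) != live.get(key)
    }


# Hypothetical snapshots of one security group rule.
last_applied = {"cidr_blocks": ["10.0.0.0/16"], "from_port": 443}
live_state = {"cidr_blocks": ["0.0.0.0/0"], "from_port": 443}

print(diff_attributes(last_applied, live_state))
# {'cidr_blocks': (['10.0.0.0/16'], ['0.0.0.0/0'])}
```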
Conceptual Interaction with Scalr's Remediation:
Imagine Scalr's UI or CLI presenting the following after detecting drift:
    Scalr Drift Detection Report:
      Workspace: 'production-vpc'
      Drift Detected:
        ~ aws_security_group.web_sg:
            ingress:
              - (known)   cidr_blocks: ["10.0.0.0/16"]
              + (drifted) cidr_blocks: ["0.0.0.0/0"]  # Unintended change
      Available Actions:
        1. Ignore Drift: [Acknowledge and do nothing in Scalr]
        2. Sync State: [Run 'tofu apply -refresh-only' to update state file with 0.0.0.0/0]
        3. Revert Infrastructure: [Plan and apply changes to revert to 10.0.0.0/16]
      Choose action (1-3):
This is a conceptual representation. Actual Scalr interaction occurs through its web UI.
- Explicit OpenTofu Support: As a founding member of the OpenTofu initiative, Scalr provides robust, first-class support, ensuring a seamless experience for OpenTofu users.
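For readers who think in CLI terms, the three remediation options described above map roughly onto plain Terraform/OpenTofu commands. The sketch below is only an approximation of that mapping, not Scalr's API. The DriftAction enum, the commands_for helper, and the revert.tfplan file name are hypothetical.

```python
from enum import Enum


class DriftAction(Enum):
    IGNORE = "ignore"
    SYNC_STATE = "sync_state"
    REVERT = "revert"


def commands_for(action: DriftAction, binary: str = "tofu",
                 plan_file: str = "revert.tfplan") -> list[list[str]]:
    """Return the CLI invocations that roughly correspond to a remediation choice."""
    if action is DriftAction.IGNORE:
        # Nothing to run: the drift is acknowledged and left in place.
        return []
    if action is DriftAction.SYNC_STATE:
        # Accept the live infrastructure and update the state to match it.
        return [[binary, "apply", "-refresh-only"]]
    if action is DriftAction.REVERT:
        # Save a plan for review, then apply exactly that reviewed plan.
        return [[binary, "plan", f"-out={plan_file}"], [binary, "apply", plan_file]]
    raise ValueError(f"unknown action: {action!r}")


for step in commands_for(DriftAction.REVERT):
    print(" ".join(step))
# tofu plan -out=revert.tfplan
# tofu apply revert.tfplan
```

Sync State only changes what the state file records. To stop the next plan from proposing a revert, the configuration still has to be updated to match.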
The Wider World of Drift Detection
Beyond Scalr, the ecosystem offers various tools:
- Other IaC Management Platforms:
  - Terramate: Focuses on DRY configurations and CI/CD orchestration, offering automated reconciliation options.
  - env0: Provides AI-powered drift cause analysis and flexible remediation policies.
  - Spacelift: Also offers optional automated remediation for detected drift.

  These platforms often provide more aggressive automation in remediation, which suits some organizational needs but differs from Scalr's control-focused approach.

- Standalone & Open-Source Tools:
  - Driftctl (by Snyk, now in maintenance mode as OSS): Historically strong at detecting unmanaged resources (those created outside IaC). Its technology is integrated into Snyk IaC.
  - Driftive: A newer open-source tool with explicit support for Terraform, OpenTofu, and Terragrunt. It focuses on detection and notification (e.g., via Slack or GitHub Issues).
Example: Configuring Driftive
Driftive uses a driftive.yml file to define how it scans projects. Here's a conceptual snippet:
    # driftive.yml (Conceptual Example)
    projects:
      include:
        - "**/terraform"
        - "**/tofu_projects"
      exclude:
        - "**/modules"
    rules:
      - name: "Terraform Projects"
        match: "**/terraform/**/*.tf"
        executable: "terraform" # or "tofu"
      - name: "OpenTofu Specific"
        match: "**/tofu_projects/**/*.tf"
        executable: "tofu"
    notifications:
      slack:
        webhook_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
      github:
        enabled: true
        repository_owner: "your-org"
        repository_name: "infra-drift-issues"
        summary_issue: true
This allows for configurable, self-hosted drift detection, primarily focused on alerting rather than guided remediation within a platform.
At a Glance: Comparing Approaches
Feature | Scalr | Terramate (Cloud) | env0 | Driftive (OSS) |
---|---|---|---|---|
Primary Focus | IaC Platform, Controlled Drift Mgmt | IaC Orchestration, Automated Reconcile | IaC Platform, AI-Powered Analysis | CLI Detection & Notification |
OpenTofu Support | Yes (Explicit, Founding Member) | Yes (Explicit) | Yes (Explicit, Founding Member) | Yes (Explicit) |
Unmanaged Resource Detect | Not primary focus (targets managed) | Plan-based (limited) | Plan-based (managed context focus) | Plan-based (limited) |
Remediation | User-Controlled (Ignore, Sync, Revert) | Automated Option (Reconcile) | Flexible (Auto-Policies, Code Sync, etc.) | Manual (via Notifications) |
Reporting | UI, Dashboards, Slack, (Teams planned) | Terramate Cloud UI, Slack | UI, Notifications, AI Insights | Slack, GitHub Issues |
Ease of Use | Managed Platform | Platform + CI Config | Managed Platform | CLI + YAML Config |
Typical User | Orgs wanting control, OpenTofu support | Orgs wanting CI-driven auto-remedy | Orgs wanting deep cause analysis, AI | Teams needing OSS, notification-first |
Choosing Your Drift Management Strategy
The right tool depends on your organization's needs:
- For control and safety: If your organization prioritizes deliberate, reviewed changes and requires strong OpenTofu support within a managed platform, Scalr's user-controlled remediation and comprehensive detection offer a compelling balance.
- For high automation: If you're comfortable with fully automated drift reconciliation integrated into CI/CD, platforms like Terramate or Spacelift might be suitable.
- For deep analysis: If understanding the "why" behind drift is critical, env0's AI-powered insights are a strong differentiator.
- For lightweight, OSS notification: If you need a self-hosted, notification-focused tool with good OpenTofu/Terragrunt support, Driftive is a solid choice.
Remember, tools are only part of the solution. Effective drift management also requires strong GitOps practices, policy-as-code, regular audits, and a team culture that prioritizes IaC principles.
Conclusion: Maintaining Infrastructure Integrity
Infrastructure drift is an ongoing challenge in dynamic cloud environments. While native IaC commands provide a starting point, robust and scalable drift management often requires dedicated tooling.
Scalr offers a thoughtful approach, combining automated detection with a user-controlled remediation framework. This ensures that organizations can maintain visibility into their infrastructure's state and make informed, deliberate decisions when drift occurs, especially for those embracing OpenTofu. By choosing the right tools and practices, you can significantly reduce the risks associated with drift and maintain a secure, compliant, and reliable infrastructure.