A Practical Guide to Terraform Operations with Atlantis

Discover how Atlantis automates PR-based Terraform workflows—secure plans, enforce policy, and scale infrastructure changes for faster, safer delivery.

1. Introduction: Beyond Basic Plan and Apply

Terraform Atlantis has become a cornerstone for many teams adopting GitOps for infrastructure as code. By integrating terraform plan and terraform apply directly into pull request (PR) workflows, it enhances collaboration and provides a clear audit trail. However, the lifecycle of infrastructure management often demands more than these fundamental operations.

This post covers advanced use cases for Atlantis: managing resource imports, performing state manipulations, and customizing workflows for other Terraform CLI commands. Atlantis provides the building blocks for these operations, but organizations scaling their IaC practices may find that keeping governance, security, and operational efficiency consistent across many teams and projects introduces complexity. This is where understanding the tool's limits, and the possible extensions or alternatives, becomes crucial.

2. Importing Existing Infrastructure: atlantis import and Declarative Alternatives

Bringing existing, manually-created infrastructure under Terraform management is a common requirement. Atlantis offers mechanisms to integrate this into your GitOps flow.

2.1. Using the atlantis import Command

Atlantis allows users to trigger terraform import via a PR comment:

atlantis import [options] ADDRESS ID -- [terraform import flags]

For example: atlantis import -d prod/networking aws_vpc.main_vpc vpc-12345678

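To enable and control this, add import to the server's --allow-commands list (it is not among the defaults), then configure import_requirements in the Atlantis server-side repo configuration (repos.yaml, passed via --repo-config) or, if the server lists import_requirements under allowed_overrides, per project in the repository's atlantis.yaml: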

# repos.yaml example
repos:
  - id: /.*/ # Applies to all repositories
    import_requirements: [approved, mergeable] # Requires PR approval and mergeability

These requirements (e.g., approved, mergeable) ensure imports receive similar scrutiny to apply operations.

2.2. Security Considerations for atlantis import

Enabling atlantis import requires careful security considerations:

  • Strict import_requirements: Essential first line of defense.
  • Webhook Security: Secure webhooks with secrets and HTTPS.
  • Atlantis Server Authentication: Protect the Atlantis UI/API.
  • Secure Terraform Provider Credentials: Use instance roles or secrets management; avoid hardcoding.
  • RBAC: Leverage VCS features like CODEOWNERS and restrict comment permissions.
  • Malicious Code Risk: Remember that an import often leads to a plan. Malicious HCL in a PR could still pose a risk during the subsequent plan phase.
  • State File Security: Ensure your Terraform state backend is secure and encrypted.

2.3. Recommended Workflow for atlantis import

  1. Define Resource in HCL: Write the Terraform configuration for the resource to be imported.
  2. Create Pull Request: Submit the HCL changes.
  3. Review HCL: Scrutinize the configuration.
  4. Comment atlantis import: Once approved, an authorized user issues the command.
  5. Run atlantis plan Post-Import: Crucial step. This validates the import and shows discrepancies between your HCL and the actual resource state.
  6. Adjust HCL: Modify the HCL based on the plan output until it accurately reflects the imported resource.
  7. Final Approval and Apply: Once the plan is clean, apply the changes.
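
Steps 5 to 7 hinge on knowing whether the post-import plan is clean. As a minimal sketch, assuming the plan is run with Terraform's -detailed-exitcode flag (exit code 0 for no changes, 1 for an error, 2 for pending changes), a custom script could classify the result as follows. The function name classify_plan_exit is hypothetical and not part of Atlantis or Terraform:

```sh
#!/bin/sh
# Classify the exit code of `terraform plan -detailed-exitcode`.
# classify_plan_exit is a hypothetical helper name used for illustration.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # plan succeeded and the HCL matches the imported resource
    2) echo "drift" ;;   # plan succeeded but changes are pending: adjust the HCL
    *) echo "error" ;;   # 1 (or anything else): the plan itself failed
  esac
}

# Example usage inside a script (commented out so the sketch has no side effects):
#   terraform plan -detailed-exitcode -input=false -no-color
#   verdict=$(classify_plan_exit $?)
#   [ "$verdict" = "clean" ] || echo "Post-import plan is not clean: $verdict" >&2
```

A "drift" verdict here is expected on the first pass after an import; it simply signals that step 6 still has work to do.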

2.4. The Rise of HCL import Blocks (Terraform 1.5+)

Terraform v1.5.0 introduced config-driven imports using import blocks directly in HCL, offering a more declarative, GitOps-native approach:

resource "aws_instance" "example" {
  # Configuration for the instance...
}

import {
  to = aws_instance.example
  id = "i-0ecd5e8ed288048d9" // The existing instance ID
}

With this method, the import operation becomes part of the standard atlantis plan and atlantis apply cycle. Atlantis handles this naturally.
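
When resources were previously brought in through PR comments, the same address and ID pairs can be expressed as import blocks instead. The sketch below only prints the HCL; gen_import_block is a hypothetical helper name, and the address and ID in the usage line are the sample values from section 2.1:

```sh
#!/bin/sh
# Print an HCL import block for a resource address and an existing resource ID.
# gen_import_block is a hypothetical helper; it does not validate its inputs,
# so IDs containing double quotes would need escaping first.
gen_import_block() {
  printf 'import {\n  to = %s\n  id = "%s"\n}\n' "$1" "$2"
}

# Example: print the block for the VPC used in the atlantis import example.
gen_import_block aws_vpc.main_vpc vpc-12345678
```

The printed block can be pasted into the relevant .tf file alongside the matching resource definition and reviewed in the PR like any other change.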

atlantis import Command vs. HCL import Block:

| Feature | atlantis import Command | HCL import Block (Terraform 1.5+) |
| --- | --- | --- |
| Invocation | Imperative (PR comment) | Declarative (in .tf code) |
| Workflow | Separate import, then plan, then apply | Integrated into standard plan & apply |
| Auditability | PR comment trail; HCL for resource config. | Intent fully captured in Git history (HCL). |
| Primary Use Cases | Ad-hoc imports, pre-1.5 Terraform. | Preferred for GitOps, ongoing management. |

While HCL import blocks are generally preferred for their declarative nature, the atlantis import command remains useful for older Terraform versions or specific ad-hoc scenarios. Managing these workflows consistently, especially ensuring that the post-import plan and HCL adjustments are diligently performed, can become challenging at scale. Platforms offering more structured workflows or policy enforcement around resource onboarding, like Scalr, can provide additional guardrails and visibility here.

3. Navigating Terraform State Manipulation: Risks and Best Practices

Direct state manipulation (terraform state mv, terraform state rm) is sometimes necessary but carries significant risks.

3.1. Moving Resources: Prefer HCL moved Blocks

Atlantis doesn't offer a dedicated atlantis state mv command. While custom workflows could theoretically execute terraform state mv, this is complex and risky.

The highly recommended approach (Terraform 1.1+) is using declarative HCL moved blocks:

moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}

When Atlantis processes a PR with a moved block, atlantis plan shows the intended state move, and atlantis apply executes it. This is safer, version-controlled, and aligns with GitOps.

3.2. Removing Resources from State: atlantis state rm and HCL removed Blocks

Atlantis can execute terraform state rm via PR comments: atlantis state [options] rm ADDRESS -- [terraform state rm flags]

Example: atlantis state -p myproject rm 'aws_instance.to_unmanage["foo"]'

Configure by enabling state in --allow-commands on the Atlantis server.

Optionally, define a custom workflow stage for state_rm in atlantis.yaml or repos.yaml for added control:

# atlantis.yaml example
workflows:
  custom_state_removal:
    state_rm:
      steps:
        - init
        - run: echo "User $USER_NAME is attempting to remove $COMMENT_ARGS from state in project $PROJECT_NAME"
        - state_rm # Executes terraform state rm with comment args
        - run: echo "Resource $COMMENT_ARGS removed from state. RUN ATLANTIS PLAN NEXT!"

Declarative Alternative (Terraform 1.7+): HCL removed Blocks

For removing resources from state without destroying them, Terraform 1.7+ offers HCL removed blocks:

removed {
  from = aws_instance.old_resource
  lifecycle {
    destroy = false // Ensures the actual resource is not destroyed
  }
}

This declarative method is safer and integrates into the standard plan/apply cycle managed by Atlantis.

3.3. Key Risks and Safeguards for State Operations

  • Risks: State corruption, unintended resource recreation (if a resource is removed from state and HCL isn't updated, plan will want to create it), security exposure, bypassing review.
  • Safeguards:
    1. Favor Declarative: Always prefer HCL moved and removed blocks.
    2. Strict Controls for atlantis state rm: Require explicit justification, CODEOWNERS approval.
    3. Mandatory Post-state rm Plan: Always run atlantis plan immediately after atlantis state rm to understand the impact.
    4. Minimize Use: Reserve atlantis state rm for true exceptions.
    5. Custom Workflows as Gatekeepers: Add validation or notification steps.
    6. RBAC & Permissions: Limit who can issue these commands.
    7. State Backups: Essential.
    8. Training: Ensure users understand the implications.
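
Safeguards 5 and 7 can be combined by taking a state snapshot in a run step before any removal. The sketch below only builds the snapshot path; terraform state pull is the standard command for printing the current state, while the helper name state_backup_path, the directory, and the naming scheme are assumptions to adapt:

```sh
#!/bin/sh
# Build a file path for a state snapshot.
# Arguments: backup directory, project name, workspace, UTC timestamp.
# state_backup_path and the naming scheme are illustrative assumptions.
state_backup_path() {
  printf '%s/%s-%s-%s.tfstate\n' "$1" "$2" "$3" "$4"
}

# Example usage in a run step placed before state_rm (commented out so the
# sketch has no side effects; /var/backups/tfstate is a placeholder path):
#   ts=$(date -u +%Y%m%dT%H%M%SZ)
#   terraform state pull > "$(state_backup_path /var/backups/tfstate "$PROJECT_NAME" "$WORKSPACE" "$ts")"
```

State snapshots can contain secrets, so wherever they are written needs the same access controls and encryption as the state backend itself.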

Direct state manipulation is powerful but dangerous. The GitOps principles of review and declarative intent are paramount. For organizations needing stringent control over who can perform such operations and under what conditions, a platform like Scalr can offer more granular RBAC and policy-based restrictions, potentially flagging or blocking direct state commands that don't adhere to predefined organizational policies.

4. Customizing Workflows: validate, show, refresh, and More

Atlantis's custom workflows and hooks allow execution of arbitrary Terraform CLI commands.

4.1. Leveraging Pre/Post Workflow Hooks and Custom run Steps

  • Pre-Workflow Hooks: Scripts run before Atlantis commands (e.g., for dynamic atlantis.yaml generation). Output not in PR by default.
  • Post-Workflow Hooks: Scripts run after Atlantis commands (e.g., for notifications, cost reports). Output not in PR by default.
  • Custom run Steps: Arbitrary shell commands within workflow stages (plan, apply). Output can be shown in PR comments. This is ideal for validation or checks whose results need to be seen by the PR author.

4.2. Integrating terraform validate

Catch syntax errors early by adding terraform validate as a run step before plan:

# atlantis.yaml or repos.yaml
workflows:
  validated_plan:
    plan:
      steps:
        - init
        - run: terraform validate -no-color # Fails workflow if validation fails
        - plan

4.3. Utilizing terraform show

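To get plan output in JSON for programmatic analysis, convert the plan file in a run step right after plan. A post-workflow hook is a poor fit here: hooks run once per command at the repository level and are not given the per-project $PLANFILE variable.

# atlantis.yaml or repos.yaml
workflows:
  plan_with_json:
    plan:
      steps:
        - init
        - plan
        # Write the JSON next to the plan file; stdout is redirected, so nothing is posted to the PR
        - run: terraform show -json "$PLANFILE" > "$PLANFILE.json"
        # Further processing, e.g., upload to an analysis service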
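
Once a JSON plan exists, simple checks can be scripted against it. The following sketch counts resources whose planned actions include delete (replacements count too, since they delete before or after creating). It assumes jq is installed on the Atlantis host, which Atlantis itself does not guarantee, and count_deletes is a hypothetical helper name. The resource_changes and change.actions fields come from Terraform's JSON plan representation:

```sh
#!/bin/sh
# Count resources in a Terraform JSON plan whose actions include "delete".
# Assumes jq is available; count_deletes is a hypothetical helper name.
# Argument: path to a file produced by `terraform show -json <planfile>`.
count_deletes() {
  jq '[.resource_changes[]? | select(.change.actions | index("delete"))] | length' "$1"
}

# Example usage in a run step (commented out; the threshold is arbitrary):
#   n=$(count_deletes "$PLANFILE.json")
#   [ "$n" -le 5 ] || { echo "Plan deletes $n resources; needs extra review" >&2; exit 1; }
```

Failing the run step on a threshold like this is one way to turn the JSON output into a guardrail rather than just a report.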

4.4. Handling terraform refresh (or -refresh-only plans)

To reconcile state with actual resources (terraform plan -refresh-only is preferred over the deprecated terraform refresh):

# atlantis.yaml or repos.yaml
workflows:
  refresh_and_plan:
    plan:
      steps:
        - init
        - plan:
            extra_args: ["-refresh-only"] # Generates a refresh-only plan
        # A refresh-only plan does not write to state by itself. Persisting it
        # needs an apply, and 'apply -refresh-only -auto-approve' is risky.
        - plan # Regular plan; it replaces the refresh-only plan file above

Caution: Auto-applying a refresh can be dangerous. Reviewing a plan -refresh-only first is safer.

4.5. Securely Handling COMMENT_ARGS

User-supplied arguments in comments (atlantis plan -- -target=foo) are passed via the $COMMENT_ARGS environment variable. This is a security risk if not handled carefully in custom scripts (command injection).

  • Always treat $COMMENT_ARGS as untrusted input.
  • Parse, sanitize, and validate arguments against an allowlist.
  • Avoid direct execution (e.g., eval "$COMMENT_ARGS").
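
The points above can be sketched as a small validator. It assumes the format described in the Atlantis documentation, where $COMMENT_ARGS is a comma-separated list with every character backslash-escaped, and it uses a deliberately tiny allowlist (-lock and simple -target addresses) that is purely illustrative. The function names are hypothetical:

```sh
#!/bin/sh
# Validate Atlantis COMMENT_ARGS against an allowlist before a script uses them.
# Assumption: COMMENT_ARGS is comma-separated with every character
# backslash-escaped, e.g. '\-\l\o\c\k\=\f\a\l\s\e'.

# Return 0 if a single unescaped argument is on the (illustrative) allowlist.
is_allowed_arg() {
  case "$1" in
    -lock=true|-lock=false) return 0 ;;
    -target=*)
      addr=${1#-target=}
      case "$addr" in
        # Reject empty addresses and anything outside a conservative charset.
        ''|*[!A-Za-z0-9_.-]*) return 1 ;;
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Return 0 only if every argument in the escaped list is allowed.
validate_comment_args() {
  # A doubled backslash means the user typed a literal backslash: reject it.
  case "$1" in *'\\'*) return 1 ;; esac
  unescaped=$(printf '%s' "$1" | tr -d '\\')
  [ -z "$unescaped" ] && return 0
  old_ifs=$IFS
  IFS=,
  set -f   # disable globbing while splitting on commas
  for arg in $unescaped; do
    if ! is_allowed_arg "$arg"; then
      IFS=$old_ifs; set +f
      return 1
    fi
  done
  IFS=$old_ifs; set +f
  return 0
}

# Example usage at the top of a custom run script (commented out):
#   validate_comment_args "$COMMENT_ARGS" || { echo "Rejected comment args" >&2; exit 1; }
```

An allowlist is used rather than a blocklist so that anything unanticipated, such as $(...) or a flag nobody reviewed, is rejected by default.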

4.6. Practical Use Cases: Security Scanning, Cost Estimation, DR

  • Security Scanning (e.g., Terrascan, Checkov): Add as a run step in the plan workflow to scan HCL or plan files. Fail the workflow on critical violations.
  • Cost Estimation (e.g., Infracost): Use a post_workflow_hook after plan to generate a cost breakdown and post it to the PR (requires API calls to VCS) or a Slack channel.
  • Dynamic atlantis.yaml Generation: Use a pre_workflow_hook with tools like terragrunt-atlantis-config.
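  • Specific Disaster Recovery (DR) Actions: Define highly restricted custom workflows for DR. Atlantis has no comment flag for choosing a workflow (-w selects a Terraform workspace), so bind the workflow to a dedicated project in the configuration and trigger it with atlantis apply -p dr_project, where dr_project is a placeholder project name. Use with extreme caution, strong approvals, and thorough testing.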

While Atlantis provides the flexibility for these integrations, managing the tooling, script maintenance, and consistent application of these checks across a large number of repositories can become a significant operational overhead. Platforms like Scalr often provide these integrations (e.g., OPA for policy, cost estimation, security scanning) as built-in features, managed centrally, which can simplify adoption and ensure uniformity.

5. Essential Logging and Auditing for Advanced Commands

Traceability is key for advanced or risky operations.

5.1. Atlantis Server-Side and VCS Auditing

  • Atlantis Server Logs: Configure log level (e.g., info) and persist stdout/stderr to a centralized logging system (ELK, Splunk, etc.). These logs contain operational details, command execution, and errors.
  • VCS Pull Request: The PR itself is a crucial audit trail: user comments invoking commands, approvals, plan summaries, and discussions.

5.2. Enhancing Audits with Custom Hooks

For detailed auditing of sensitive commands, use workflow hooks to send structured logs to a SIEM or logging service.

Example post-workflow hook script for custom audit logging:

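#!/bin/sh
# post_advanced_cmd_audit.sh
# Sends one structured audit record per Atlantis command to an external endpoint.

# Ensure AUDIT_LOG_ENDPOINT and AUDIT_LOG_API_KEY are set in Atlantis server env

if [ -z "$AUDIT_LOG_ENDPOINT" ] || [ -z "$AUDIT_LOG_API_KEY" ]; then
  echo "Audit logging endpoint or API key not configured. Skipping custom audit log." >&2
  exit 0
fi

COMMAND_EXECUTION_STATUS="success"
if [ "$COMMAND_HAS_ERRORS" = "true" ]; then
  COMMAND_EXECUTION_STATUS="failure"
fi

# Escape backslashes and double quotes so a value cannot break the JSON document.
# This matters for COMMENT_ARGS, which Atlantis passes backslash-escaped.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# JSON does not allow comments, so field notes live here:
# - target_directory is the project-specific directory.
# - REPO_REL_DIR, PROJECT_NAME, and WORKSPACE are per-project values; they may be
#   empty when this script runs as a repository-level hook.
LOG_PAYLOAD=$(cat <<EOF
{
  "timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
  "event_source": "atlantis_workflow_hook",
  "atlantis_command_name": "$COMMAND_NAME",
  "vcs_user": "$USER_NAME",
  "pull_request_number": "$PULL_NUM",
  "repository": "$BASE_REPO_OWNER/$BASE_REPO_NAME",
  "target_directory": "$REPO_REL_DIR",
  "project_name": "$PROJECT_NAME",
  "workspace": "$WORKSPACE",
  "comment_args_raw": "$(json_escape "$COMMENT_ARGS")",
  "atlantis_command_status": "$COMMAND_EXECUTION_STATUS"
}
EOF
)

# -f makes curl exit non-zero on HTTP 4xx/5xx responses, not only on network errors
if ! curl -sf -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUDIT_LOG_API_KEY" \
  --data "$LOG_PAYLOAD" \
  "$AUDIT_LOG_ENDPOINT"; then
  echo "Failed to send audit log to endpoint." >&2
fi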

Also, enable versioning and access logging on your Terraform state backend (e.g., S3).

Comprehensive, correlated, and secure audit logs are vital. Building and maintaining such a system around Atlantis can be complex. Centralized IaC platforms like Scalr often provide robust, out-of-the-box audit capabilities that simplify compliance and security monitoring.

6. Summary Table: Advanced Atlantis Operations

| Operation/Feature | Atlantis Mechanism | Key Risks | Recommended Safeguards / Alternatives |
| --- | --- | --- | --- |
| Resource Import | atlantis import command; HCL import block (TF 1.5+) | Incorrect ID/address, config mismatch, security. | Strict import_requirements, post-import plan, prefer HCL import blocks. |
| State Move (mv) | (No direct command) | State corruption if custom script is flawed. | Use HCL moved blocks (TF 1.1+). Avoid custom scripts for state mv. |
| State Remove (rm) | atlantis state rm command; HCL removed block (TF 1.7+) | Unintended resource recreation, state corruption, bypassing review. | Strict approvals, mandatory post-rm plan, prefer HCL removed blocks. |
| terraform validate | Custom run step in workflow | Minimal if only validating. | Integrate into plan stage before terraform plan. |
| terraform show | Custom run step or post-workflow hook | Exposing sensitive plan data if output not handled securely. | Use for analysis; secure output if persisted. |
| terraform refresh | plan/apply -refresh-only via extra_args | Auto-applying refresh is risky; misconfiguration can lead to errors. | Review plan -refresh-only output carefully before any apply. |
| Custom Commands/Scripts | run steps, pre/post workflow hooks | Command injection via $COMMENT_ARGS, script errors, security holes. | Securely parse/sanitize $COMMENT_ARGS, thorough script testing, RBAC. |

7. Conclusion: Mastering Advanced Terraform Ops – Atlantis and Strategic Considerations

Atlantis provides a flexible foundation for automating a wide range of Terraform operations within a GitOps framework. Moving beyond basic plans and applies to imports, state management, and custom validations can significantly streamline infrastructure lifecycle management.

However, as these advanced capabilities are unlocked, the onus of maintaining security, governance, and operational consistency grows. Securely handling user inputs, managing custom script lifecycles, and ensuring comprehensive auditing require diligent effort and expertise.

For organizations finding that the operational overhead of managing these advanced scenarios in Atlantis is becoming substantial, or those requiring more sophisticated, centralized governance, policy enforcement (e.g., with Open Policy Agent), and enterprise-grade RBAC, exploring dedicated Terraform automation and collaboration platforms like Scalr can be a strategic next step. Such platforms often build upon the GitOps principles championed by tools like Atlantis but add layers of control, visibility, and efficiency designed for complex, multi-team environments.

By understanding both the power and the responsibilities that come with advanced Atlantis usage, teams can make informed decisions about how to best scale their Terraform operations securely and effectively.