A Practical Guide to Terraform Operations with Atlantis
Discover how Atlantis automates PR-based Terraform workflows to secure plans, enforce policy, and scale infrastructure changes for faster, safer delivery.
1. Introduction: Beyond Basic Plan and Apply
Atlantis has become a cornerstone for many teams adopting GitOps for infrastructure as code. By integrating `terraform plan` and `terraform apply` directly into pull request (PR) workflows, it enhances collaboration and provides a clear audit trail. However, the lifecycle of infrastructure management often demands more than these fundamental operations.
This post explores advanced Atlantis use cases: managing resource imports, performing state manipulations, and customizing workflows for other Terraform CLI commands. Atlantis provides the building blocks for these operations, but organizations scaling their IaC practices may find that keeping governance, security, and operational efficiency consistent across many teams and projects adds complexity. This is where understanding the limits of Atlantis, and its potential extensions or alternatives, becomes crucial.
2. Importing Existing Infrastructure: atlantis import and Declarative Alternatives
Bringing existing, manually-created infrastructure under Terraform management is a common requirement. Atlantis offers mechanisms to integrate this into your GitOps flow.
2.1. Using the atlantis import Command
Atlantis allows users to trigger `terraform import` via a PR comment:
```
atlantis import [options] ADDRESS ID -- [terraform import flags]
```
For example: `atlantis import -d prod/networking aws_vpc.main_vpc vpc-12345678`
To enable this, add `import` to the server's `--allow-commands` flag, then control who can run it with `import_requirements` in your Atlantis server-side repo config (`repos.yaml`) or, where the server allows that override, in a per-repository `atlantis.yaml`:
```yaml
# repos.yaml example
repos:
  - id: /.*/ # Applies to all repositories
    import_requirements: [approved, mergeable] # Requires PR approval and mergeability
```
These requirements (e.g., `approved`, `mergeable`) ensure imports receive scrutiny similar to apply operations.
2.2. Security Considerations for atlantis import
Enabling `atlantis import` calls for careful attention to security:
- Strict `import_requirements`: Essential first line of defense.
- Webhook Security: Secure webhooks with secrets and HTTPS.
- Atlantis Server Authentication: Protect the Atlantis UI/API.
- Secure Terraform Provider Credentials: Use instance roles or secrets management; avoid hardcoding.
- RBAC: Leverage VCS features like CODEOWNERS and restrict comment permissions.
- Malicious Code Risk: An import is usually followed by a plan, so malicious HCL in a PR can still pose a risk during that subsequent plan.
- State File Security: Ensure your Terraform state backend is secure and encrypted.
2.3. Recommended Workflow for atlantis import
- Define Resource in HCL: Write the Terraform configuration for the resource to be imported.
- Create Pull Request: Submit the HCL changes.
- Review HCL: Scrutinize the configuration.
- Comment `atlantis import`: Once the PR is approved, an authorized user issues the command.
- Run `atlantis plan` Post-Import: Crucial step. This validates the import and shows discrepancies between your HCL and the actual resource state.
- Adjust HCL: Modify the HCL based on the plan output until it accurately reflects the imported resource.
- Final Approval and Apply: Once the plan is clean, apply any remaining changes and merge.
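"Once the plan is clean" can be made machine-checkable instead of a visual judgment. The sketch below is a hypothetical helper (the script name and `plan_is_clean` function are illustrative) that a custom `run` step, or a person at a terminal, could call in the project directory after `init`. It relies on `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 2 when changes are present, and 1 on error.

```sh
#!/bin/sh
# check_clean_plan.sh (hypothetical name): post-import "is the plan clean?" check.
# Relies on `terraform plan -detailed-exitcode`:
#   0 = no changes, 1 = error, 2 = changes present.

# Run the given plan command and translate its exit code into a verdict.
plan_is_clean() {
  "$@"
  status=$?
  case "$status" in
    0) echo "Plan is clean: the HCL matches the imported resource."; return 0 ;;
    2) echo "Plan shows changes: adjust the HCL before applying." >&2; return 2 ;;
    *) echo "Plan command failed (exit code $status)." >&2; return 1 ;;
  esac
}

# Only run when a command is passed, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  plan_is_clean "$@"
  exit $?
fi
```

Usage might look like `sh check_clean_plan.sh terraform plan -detailed-exitcode -input=false -no-color`. Passing the plan command as arguments keeps the exit-code interpretation separate from how Terraform is invoked.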
2.4. The Rise of HCL import Blocks (Terraform 1.5+)
Terraform v1.5.0 introduced config-driven imports using `import` blocks directly in HCL, offering a more declarative, GitOps-native approach:
```hcl
resource "aws_instance" "example" {
  # Configuration for the instance...
}

import {
  to = aws_instance.example
  id = "i-0ecd5e8ed288048d9" // The existing instance ID
}
```
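With this method, the import becomes part of the standard `atlantis plan` and `atlantis apply` cycle: the plan reports the pending import, and apply records it in state. Atlantis needs no special handling beyond running Terraform 1.5 or later for the project.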
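When onboarding many existing resources, writing `import` blocks by hand gets tedious. A minimal sketch, assuming a hypothetical CSV file of `address,id` pairs; the script and function names are illustrative and not part of Terraform or Atlantis. It refuses IDs containing characters that would need HCL string escaping rather than trying to escape them.

```sh
#!/bin/sh
# gen_import_blocks.sh (hypothetical): turn "address,id" lines into HCL import blocks.

# Print one import block; refuse IDs that would break the HCL string literal.
emit_import_block() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: emit_import_block ADDRESS ID" >&2
    return 1
  fi
  case "$2" in
    *'"'* | *'\'* | *'${'* | *'%{'*)
      echo "refusing ID that needs HCL escaping: $2" >&2
      return 1 ;;
  esac
  printf 'import {\n  to = %s\n  id = "%s"\n}\n' "$1" "$2"
}

# Read "address,id" lines from stdin; print a block for each, separated by blank lines.
emit_import_blocks_from_csv() {
  while IFS=, read -r addr id; do
    [ -z "$addr" ] && continue # skip blank lines
    emit_import_block "$addr" "$id" || return 1
    echo
  done
}

if [ "${1:-}" = "--from-csv" ]; then
  emit_import_blocks_from_csv
fi
```

For example, `sh gen_import_blocks.sh --from-csv < imports.csv > imports.tf` would produce a file to commit alongside the matching `resource` blocks; `atlantis plan` on that PR then shows the pending imports for review.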
atlantis import Command vs. HCL import Block:

Feature | `atlantis import` Command | HCL `import` Block |
---|---|---|
Invocation | Imperative (PR comment) | Declarative (in HCL) |
Workflow | Separate command, followed by a plan | Integrated into standard `plan`/`apply` |
Auditability | PR comment trail; HCL for resource config. | Intent fully captured in Git history (HCL). |
Primary Use Cases | Ad-hoc imports, pre-1.5 Terraform. | Preferred for GitOps, ongoing management. |
While HCL `import` blocks are generally preferred for their declarative nature, the `atlantis import` command remains useful for older Terraform versions or specific ad-hoc scenarios. Managing these workflows consistently, especially ensuring that the post-import `plan` and HCL adjustments are diligently performed, can become challenging at scale. Platforms offering more structured workflows or policy enforcement around resource onboarding, like Scalr, can provide additional guardrails and visibility here.
3. Navigating Terraform State Manipulation: Risks and Best Practices
Direct state manipulation (`terraform state mv`, `terraform state rm`) is sometimes necessary but carries significant risks.
3.1. Moving Resources: Prefer HCL moved Blocks
Atlantis doesn't offer a dedicated `atlantis state mv` command. While custom workflows could theoretically execute `terraform state mv`, this is complex and risky.
The highly recommended approach (Terraform 1.1+) is to use declarative HCL `moved` blocks:
```hcl
moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}
```
When Atlantis processes a PR with a `moved` block, `atlantis plan` shows the intended state move, and `atlantis apply` executes it. This is safer, version-controlled, and aligned with GitOps.
3.2. Removing Resources from State: atlantis state rm and HCL removed Blocks
Atlantis can execute `terraform state rm` via PR comments:
```
atlantis state [options] rm ADDRESS -- [terraform state rm flags]
```
Example: `atlantis state -p myproject rm 'aws_instance.to_unmanage["foo"]'`
To enable it, add `state` to the server's `--allow-commands` flag.
Optionally, define a custom workflow stage for `state_rm` in `atlantis.yaml` or `repos.yaml` for added control:
```yaml
# atlantis.yaml example
workflows:
  custom_state_removal:
    state_rm:
      steps:
        - init
        - run: echo "User $USER_NAME is attempting to remove $COMMENT_ARGS from state in project $PROJECT_NAME"
        - state_rm # Executes terraform state rm with the comment's arguments
        - run: echo "Resource $COMMENT_ARGS removed from state. RUN ATLANTIS PLAN NEXT!"
```
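Taking a state snapshot immediately before removal is a cheap safety net. A minimal sketch of a script that a `run` step placed before `state_rm` could call, assuming it runs in the project directory after `init` so that `terraform state pull` can reach the backend. `STATE_BACKUP_DIR` and `TF_BIN` are hypothetical variables for this sketch, not Atlantis settings. State can contain secrets, so backups need the same protection as the backend itself.

```sh
#!/bin/sh
# backup_state.sh (hypothetical): snapshot remote state before `state_rm`.
# STATE_BACKUP_DIR and TF_BIN are illustrative variables, not Atlantis settings.

backup_state() {
  dir="${1:-${STATE_BACKUP_DIR:-/tmp/atlantis-state-backups}}"
  mkdir -p "$dir" || return 1
  # Filesystem-safe name built from the Atlantis project and workspace variables.
  name=$(printf '%s-%s' "${PROJECT_NAME:-default}" "${WORKSPACE:-default}" |
    sed 's/[^A-Za-z0-9_.-]/_/g')
  file="$dir/$name-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
  # `terraform state pull` writes the current state to stdout.
  if ! "${TF_BIN:-terraform}" state pull > "$file.tmp"; then
    rm -f "$file.tmp"
    echo "state pull failed; not continuing with state_rm" >&2
    return 1
  fi
  mv "$file.tmp" "$file" || return 1
  printf '%s\n' "$file"
}

# Run only when invoked with --run, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
  backup_state || exit 1
fi
```

Because a failing `run` step fails the command, a step such as `run: sh /path/to/backup_state.sh --run` (path hypothetical) ahead of `state_rm` would stop the removal whenever the snapshot cannot be taken.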
Declarative Alternative (Terraform 1.7+): HCL removed Blocks

For removing resources from state without destroying them, Terraform 1.7+ offers HCL `removed` blocks:
```hcl
removed {
  from = aws_instance.old_resource

  lifecycle {
    destroy = false // Ensures the actual resource is not destroyed
  }
}
```
This declarative method, used after deleting the corresponding `resource` block from the configuration, is safer and integrates into the standard plan/apply cycle managed by Atlantis.
3.3. Key Risks and Safeguards for State Operations
- Risks: State corruption, unintended resource recreation (if a resource is removed from state and HCL isn't updated, plan will want to create it), security exposure, bypassing review.
- Safeguards:
  - Favor Declarative: Always prefer HCL `moved` and `removed` blocks.
  - Strict Controls for `atlantis state rm`: Require explicit justification and CODEOWNERS approval.
  - Mandatory Post-`state rm` Plan: Always run `atlantis plan` immediately after `atlantis state rm` to understand the impact.
  - Minimize Use: Reserve `atlantis state rm` for true exceptions.
  - Custom Workflows as Gatekeepers: Add validation or notification steps.
  - RBAC & Permissions: Limit who can issue these commands.
  - State Backups: Essential.
  - Training: Ensure users understand the implications.
Direct state manipulation is powerful but dangerous. The GitOps principles of review and declarative intent are paramount. For organizations needing stringent control over who can perform such operations and under what conditions, a platform like Scalr can offer more granular RBAC and policy-based restrictions, potentially flagging or blocking direct state commands that don't adhere to predefined organizational policies.
4. Customizing Workflows: validate, show, refresh, and More
Atlantis's custom workflows and hooks allow execution of arbitrary Terraform CLI commands.
4.1. Leveraging Pre/Post Workflow Hooks and Custom run Steps
- Pre-Workflow Hooks: Scripts run before Atlantis commands (e.g., for dynamic `atlantis.yaml` generation). Output is not shown in the PR by default.
- Post-Workflow Hooks: Scripts run after Atlantis commands (e.g., for notifications, cost reports). Output is not shown in the PR by default.
- Custom `run` Steps: Arbitrary shell commands within workflow stages (`plan`, `apply`). Output can be shown in PR comments, which makes them ideal for validation or checks whose results the PR author needs to see.
4.2. Integrating terraform validate
Catch syntax and configuration errors early by adding `terraform validate` as a `run` step before `plan`:
```yaml
# atlantis.yaml or repos.yaml
workflows:
  validated_plan:
    plan:
      steps:
        - init
        - run: terraform validate -no-color # Fails the workflow if validation fails
        - plan
```
4.3. Utilizing terraform show
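To get plan output in JSON for programmatic analysis, add a custom `run` step after `plan`. The `$PLANFILE` variable is set for custom workflow steps, which run in the project directory; post-workflow hooks run once per command from the repository root and do not receive it:
```yaml
# atlantis.yaml or repos.yaml
workflows:
  plan_with_json:
    plan:
      steps:
        - init
        - plan
        # Convert the binary plan to JSON for programmatic analysis
        - run: terraform show -json "$PLANFILE" > "$PLANFILE.json"
        # Further processing, e.g., upload the JSON to an analysis service
```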
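Once the plan exists as JSON, it can be inspected programmatically. As one hedged example, the sketch below counts entries in the `resource_changes` array of `terraform show -json` output whose `actions` include `delete` (pure deletions and replacements) and fails when the count exceeds a limit. It assumes `jq` is installed on the Atlantis server; `MAX_DELETES` and the function names are hypothetical.

```sh
#!/bin/sh
# check_plan_deletes.sh (hypothetical): gate a plan on how many resources it deletes.
# Requires jq. MAX_DELETES is an illustrative setting for this sketch.

# Read `terraform show -json` output on stdin; print the number of resource
# changes whose actions include "delete" (deletes and replacements).
count_planned_deletes() {
  jq '[.resource_changes[]? | select(.change.actions | any(. == "delete"))] | length'
}

# Fail when the JSON plan at path $1 deletes more than MAX_DELETES resources.
check_plan_deletes() {
  plan_json="$1"
  max="${MAX_DELETES:-0}"
  n=$(count_planned_deletes < "$plan_json") || return 1
  if [ "$n" -gt "$max" ]; then
    echo "Plan deletes $n resource(s); limit is $max." >&2
    return 1
  fi
  echo "Plan deletes $n resource(s)."
}

if [ "$#" -gt 0 ]; then
  check_plan_deletes "$1"
  exit $?
fi
```

Run as a further `run` step with the path of the JSON plan file as its argument, a non-zero exit fails the plan and the message appears in the PR comment. Resource drift recorded elsewhere in the JSON is deliberately ignored because only `resource_changes` is read.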
4.4. Handling terraform refresh (or -refresh-only plans)
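To reconcile state with actual resources, use a refresh-only plan (`terraform plan -refresh-only` is preferred over the deprecated `terraform refresh`). A refresh-only plan shows how real resources have drifted from state but changes nothing until it is applied:
```yaml
# atlantis.yaml or repos.yaml
workflows:
  refresh_only:
    plan:
      steps:
        - init
        - plan:
            extra_args: ["-refresh-only"] # Generates a refresh-only plan for review
        # A second `plan` step here would overwrite the same planfile,
        # so keep regular plans in a separate workflow.
        # Running `atlantis apply` on this plan writes the refreshed values to state;
        # avoid 'apply -refresh-only -auto-approve', which skips that review.
```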
Caution: auto-applying a refresh can silently write unexpected values to state. Reviewing the `plan -refresh-only` output first is safer.
4.5. Securely Handling COMMENT_ARGS
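User-supplied arguments in comments (e.g., `atlantis plan -- -target=foo`) are passed to custom steps via the `$COMMENT_ARGS` environment variable. Atlantis separates the arguments with commas and backslash-escapes every character, but custom scripts that unescape, re-split, or interpolate the value can still open the door to command injection if they are not written carefully.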
- Always treat `$COMMENT_ARGS` as untrusted input.
- Parse, sanitize, and validate arguments against an allowlist.
- Avoid direct execution (e.g., `eval "$COMMENT_ARGS"`).
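A minimal allowlist check in POSIX sh, assuming the escaping format Atlantis documents for `COMMENT_ARGS` (comma-separated arguments, every character backslash-escaped); confirm that format for your Atlantis version. The allowlist here, `-target=<address>` and `-refresh=false`, is purely illustrative, as are the script and function names. Anything unrecognized or ambiguous is rejected rather than repaired.

```sh
#!/bin/sh
# validate_comment_args.sh (hypothetical helper for a custom `run` step).
# Assumes COMMENT_ARGS looks like "\-\t\a\r\g\e\t\=\a,\-\r\e\f..." (comma-separated,
# each character backslash-escaped). Example allowlist: -target=ADDR, -refresh=false.

# Parentheses run the body in a subshell, so IFS and `set -f` changes stay local.
validate_comment_args() (
  raw="$1"
  [ -z "$raw" ] && exit 0 # no extra arguments: nothing to check
  NL='
'
  IFS=,  # split only on the separating commas
  set -f # no globbing during the unquoted expansion below
  for piece in $raw; do
    # Drop the escaping backslashes: \-\t\a -> -ta
    arg=$(printf '%s' "$piece" | sed 's/\\\(.\)/\1/g')
    case "$arg" in
      *"$NL"*) printf 'rejected (newline): %s\n' "$arg" >&2; exit 1 ;;
      -refresh=false) continue ;;
      -target=*)
        addr=${arg#-target=}
        # Conservative address pattern: a.b.c or a.b["key"] / a.b[0]
        if printf '%s' "$addr" | grep -Eq '^[A-Za-z0-9_.]+(\[[A-Za-z0-9_"-]+])?$'; then
          continue
        fi
        printf 'rejected (address): %s\n' "$arg" >&2; exit 1 ;;
      *) printf 'rejected (not allowlisted): %s\n' "$arg" >&2; exit 1 ;;
    esac
  done
  exit 0
)

validate_comment_args "${COMMENT_ARGS:-}" || exit 1
```

An escaped comma inside an argument splits it into pieces that end in a stray backslash; those pieces fail the allowlist, so the check errs on the side of rejection instead of guessing.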
4.6. Practical Use Cases: Security Scanning, Cost Estimation, DR
- Security Scanning (e.g., Terrascan, Checkov): Add as a `run` step in the `plan` workflow to scan HCL or plan files. Fail the workflow on critical violations.
- Cost Estimation (e.g., Infracost): Use a `post_workflow_hook` after `plan` to generate a cost breakdown and post it to the PR (requires API calls to the VCS) or to a Slack channel.
- Dynamic `atlantis.yaml` Generation: Use a `pre_workflow_hook` with tools like `terragrunt-atlantis-config`.
- Specific Disaster Recovery (DR) Actions: Define highly restricted custom workflows for DR and assign them to dedicated projects in `atlantis.yaml` (e.g., `workflow: dr_workflow`), then trigger them by targeting those projects with `atlantis apply -p`. Note that `-w` selects a Terraform workspace, not a workflow. Use with extreme caution, strong approvals, and thorough testing.
While Atlantis provides the flexibility for these integrations, managing the tooling, script maintenance, and consistent application of these checks across a large number of repositories can become a significant operational overhead. Platforms like Scalr often provide these integrations (e.g., OPA for policy, cost estimation, security scanning) as built-in features, managed centrally, which can simplify adoption and ensure uniformity.
5. Essential Logging and Auditing for Advanced Commands
Traceability is key for advanced or risky operations.
5.1. Atlantis Server-Side and VCS Auditing
- Atlantis Server Logs: Configure the log level (e.g., `info`) and persist stdout/stderr to a centralized logging system (ELK, Splunk, etc.). These logs contain operational details, command execution, and errors.
- VCS Pull Request: The PR itself is a crucial audit trail: user comments invoking commands, approvals, plan summaries, and discussions.
5.2. Enhancing Audits with Custom Hooks
For detailed auditing of sensitive commands, use workflow hooks to send structured logs to a SIEM or logging service.
Example post-workflow hook script for custom audit logging:
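```sh
#!/bin/sh
# post_advanced_cmd_audit.sh
# Ensure AUDIT_LOG_ENDPOINT and AUDIT_LOG_API_KEY are set in the Atlantis server env.
if [ -z "$AUDIT_LOG_ENDPOINT" ] || [ -z "$AUDIT_LOG_API_KEY" ]; then
  echo "Audit logging endpoint or API key not configured. Skipping custom audit log." >&2
  exit 0
fi

# Escape a value for a JSON string: drop control characters, then escape
# backslashes and double quotes. COMMENT_ARGS in particular contains
# backslashes, which would otherwise produce invalid JSON.
json_escape() {
  printf '%s' "$1" | tr -d '\000-\037' | sed 's/\\/\\\\/g; s/"/\\"/g'
}

COMMAND_EXECUTION_STATUS="success"
if [ "$COMMAND_HAS_ERRORS" = "true" ]; then
  COMMAND_EXECUTION_STATUS="failure"
fi

# REPO_REL_DIR, PROJECT_NAME and WORKSPACE are custom-workflow variables and
# may be empty when this script runs as a post-workflow hook.
LOG_PAYLOAD=$(cat <<EOF
{
  "timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
  "event_source": "atlantis_workflow_hook",
  "atlantis_command_name": "$(json_escape "$COMMAND_NAME")",
  "vcs_user": "$(json_escape "$USER_NAME")",
  "pull_request_number": "$(json_escape "$PULL_NUM")",
  "repository": "$(json_escape "$BASE_REPO_OWNER/$BASE_REPO_NAME")",
  "target_directory": "$(json_escape "$REPO_REL_DIR")",
  "project_name": "$(json_escape "$PROJECT_NAME")",
  "workspace": "$(json_escape "$WORKSPACE")",
  "comment_args_raw": "$(json_escape "$COMMENT_ARGS")",
  "atlantis_command_status": "$COMMAND_EXECUTION_STATUS"
}
EOF
)

# --fail makes curl return non-zero on HTTP errors, not just network failures.
if ! curl -sS --fail -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUDIT_LOG_API_KEY" \
  --data "$LOG_PAYLOAD" \
  "$AUDIT_LOG_ENDPOINT"; then
  echo "Failed to send audit log to endpoint." >&2
fi
```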
Also, enable versioning and access logging on your Terraform state backend (e.g., S3).
Comprehensive, correlated, and secure audit logs are vital. Building and maintaining such a system around Atlantis can be complex. Centralized IaC platforms like Scalr often provide robust, out-of-the-box audit capabilities that simplify compliance and security monitoring.
6. Summary Table: Advanced Atlantis Operations
Operation/Feature | Atlantis Mechanism | Key Risks | Recommended Safeguards / Alternatives |
---|---|---|---|
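Resource Import | `atlantis import` comment | Incorrect ID/address, config mismatch, security. | Strict `import_requirements`, mandatory post-import plan; prefer HCL `import` blocks (1.5+). |
State Move (`terraform state mv`) | (No direct command) | State corruption if custom script is flawed. | Use HCL `moved` blocks (1.1+). |
State Remove (`terraform state rm`) | `atlantis state rm` comment | Unintended resource recreation, state corruption, bypassing review. | Strict approvals, mandatory post-`state rm` plan; prefer HCL `removed` blocks (1.7+). |
`terraform validate` | Custom `run` step | Minimal if only validating. | Integrate into the `plan` workflow before `plan`. |
`terraform show` | Custom `run` step or hook | Exposing sensitive plan data if output not handled securely. | Use for analysis; secure output if persisted. |
`terraform refresh` / `-refresh-only` | Custom workflow (`plan` with `extra_args`) | Auto-applying refresh is risky; misconfiguration can lead to errors. | Review `plan -refresh-only` output before applying. |
Custom Commands/Scripts | Custom `run` steps and hooks | Command injection via `$COMMENT_ARGS`. | Securely parse/sanitize `$COMMENT_ARGS`; validate against an allowlist. |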
7. Conclusion: Mastering Advanced Terraform Ops – Atlantis and Strategic Considerations
Atlantis provides a flexible foundation for automating a wide range of Terraform operations within a GitOps framework. Moving beyond basic plans and applies to imports, state management, and custom validations can significantly streamline infrastructure lifecycle management.
However, as these advanced capabilities are unlocked, the onus of maintaining security, governance, and operational consistency grows. Securely handling user inputs, managing custom script lifecycles, and ensuring comprehensive auditing require diligent effort and expertise.
For organizations finding that the operational overhead of managing these advanced scenarios in Atlantis is becoming substantial, or those requiring more sophisticated, centralized governance, policy enforcement (e.g., with Open Policy Agent), and enterprise-grade RBAC, exploring dedicated Terraform automation and collaboration platforms like Scalr can be a strategic next step. Such platforms often build upon the GitOps principles championed by tools like Atlantis but add layers of control, visibility, and efficiency designed for complex, multi-team environments.
By understanding both the power and the responsibilities that come with advanced Atlantis usage, teams can make informed decisions about how to best scale their Terraform operations securely and effectively.