Terraform Atlantis Best Practices: 10 Essential Tips for Production Use

Discover 10 practical tips for using Terraform Atlantis: streamline CI/CD workflows, enforce security, improve collaboration, and automate infrastructure changes.

Version control your Atlantis configuration for consistency and auditability

Storing your Atlantis configuration in version control ensures consistency across environments and provides an audit trail for configuration changes. This practice is fundamental to infrastructure as code principles.

Implement this by creating an atlantis.yaml file at the root of your repository:

version: 3
automerge: false
parallel_plan: true
parallel_apply: true
projects:
  - name: networking
    dir: infrastructure/networking
    autoplan:
      when_modified: ["*.tf", "*.tfvars", "../modules/network/**/*.tf"]
    terraform_version: 1.5.0
    execution_order_group: 1
  
  - name: database
    dir: infrastructure/database
    autoplan:
      when_modified: ["*.tf", "*.tfvars", "../modules/database/**/*.tf"]
    terraform_version: 1.5.0
    execution_order_group: 2
    depends_on:
      - networking

For advanced configurations, consider:

  • Using server-side repos.yaml for organization-wide configurations
  • Implementing pre-commit hooks to validate atlantis.yaml before pushing
  • Documenting configuration changes with detailed commit messages
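
A pre-commit check along these lines could catch simple mistakes before a push. This is a minimal sketch, not an official Atlantis validator: the function name and the specific checks are assumptions, and it takes the already-parsed file as a dictionary (for example from a YAML parser such as PyYAML, which is not imported here).

```python
from collections import Counter


def validate_atlantis_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a parsed atlantis.yaml."""
    errors = []
    # This sketch expects the version used throughout this article.
    if cfg.get("version") != 3:
        errors.append("expected version: 3")

    projects = cfg.get("projects", [])
    names = [p["name"] for p in projects if "name" in p]
    for name, count in Counter(names).items():
        if count > 1:
            errors.append(f"duplicate project name: {name}")

    known = set(names)
    for p in projects:
        label = p.get("name", p.get("dir", "?"))
        if "dir" not in p:
            errors.append(f"project {label} has no dir")
        # Every depends_on entry should name a project defined in this file.
        for dep in p.get("depends_on", []):
            if dep not in known:
                errors.append(f"project {label} depends on unknown project: {dep}")
    return errors


# Parsed form of a small atlantis.yaml with a typo in depends_on.
config = {
    "version": 3,
    "projects": [
        {"name": "networking", "dir": "infrastructure/networking"},
        {"name": "database", "dir": "infrastructure/database",
         "depends_on": ["network"]},
    ],
}
problems = validate_atlantis_config(config)
for problem in problems:
    print(problem)
```

A hook would exit non-zero when the returned list is not empty, which blocks the commit until the file is fixed.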

By version controlling your Atlantis configuration, you gain the ability to review configuration changes through the same pull request process you use for infrastructure changes, ensuring proper oversight and validation.

Implement robust state locking and backend configuration

Proper state management is critical for preventing concurrent modifications and ensuring consistent infrastructure deployments. Because Atlantis runs Terraform on its own server rather than on an engineer's machine, state must live in a remote backend that every run can reach.

Configure your backend with appropriate locking mechanisms:

# For AWS
terraform {
  backend "s3" {
    bucket         = "terraform-state-bucket"
    key            = "path/to/my/key"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock-table"  # For state locking
  }
}

# For Azure
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-rg"
    storage_account_name = "terraformsa"
    container_name       = "terraformstate"
    key                  = "terraform.tfstate"
    # Azure handles locking automatically through blob leasing
  }
}

# For GCP
terraform {
  backend "gcs" {
    bucket  = "terraform-state-bucket"
    prefix  = "terraform/state"
    # The gcs backend supports state locking natively
  }
}

For complex backend configurations, use custom workflows in atlantis.yaml:

workflows:
  custom_backend:
    plan:
      steps:
      - run: rm -rf .terraform
      - init:
          extra_args: [
            "-backend-config=bucket=terraform-state-bucket",
            "-backend-config=key=${WORKSPACE}/state.tfstate", 
            "-backend-config=region=us-east-1",
            "-backend-config=encrypt=true",
            "-backend-config=dynamodb_table=terraform-lock-table"
          ]
      - plan

Atlantis also implements its own project-level locking to prevent conflicts between pull requests. When locks conflict:

  • Check the Atlantis dashboard to view current locks
  • Use atlantis unlock in PR comments to release locks when necessary
  • Remember that Atlantis locks do not expire on a timer: a lock is held until the pull request is merged or closed, or until it is released manually

Following these state management practices ensures reliable and conflict-free infrastructure deployments across your team.

Use dedicated least-privilege IAM roles for Atlantis operations

Atlantis requires permissions to manage your infrastructure resources, making proper IAM configuration critical for security. Following the principle of least privilege limits what a compromised or misconfigured Atlantis server can change.

For AWS, create dedicated IAM roles with minimal permissions:

# Example Terraform code to create an Atlantis IAM role
resource "aws_iam_role" "atlantis" {
  name = "atlantis-terraform-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"  # Or your specific service
        }
      }
    ]
  })
}

# Attach only required permissions
resource "aws_iam_role_policy_attachment" "atlantis_policy" {
  role       = aws_iam_role.atlantis.name
  policy_arn = "arn:aws:iam::aws:policy/specific-policy"  # Placeholder: attach a narrowly scoped policy, not AdministratorAccess
}

In your Terraform configurations, use assume role patterns:

provider "aws" {
  assume_role {
    role_arn     = "arn:aws:iam::ACCOUNT_ID:role/AtlantisRole"
    session_name = "${var.atlantis_user}-${var.atlantis_repo_owner}-${var.atlantis_repo_name}"
  }
}

For environment-specific roles, use custom workflows:

workflows:
  dev-workflow:
    plan:
      steps:
      - env:
          name: AWS_ROLE_ARN
          value: arn:aws:iam::ACCOUNT_ID:role/DevRole
      - init
      - plan
  
  prod-workflow:
    plan:
      steps:
      - env:
          name: AWS_ROLE_ARN
          value: arn:aws:iam::ACCOUNT_ID:role/ProdRole
      - init
      - plan

Additional security measures:

  • Use short session durations for assumed roles
  • Include explicit deny statements for destructive actions
  • Regularly audit IAM roles and permissions
  • Consider creating separate roles for read, plan, and apply operations
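
As part of a regular audit, a small script can flag the most obvious least-privilege violations. The sketch below is only an illustration: the function name and the sample policy are made up for this example, and it only detects a bare "*" in Action or Resource, not narrower wildcards such as s3:*.

```python
def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements whose Action or Resource is a bare "*"."""

    def as_list(value):
        # IAM accepts either a single string or a list of strings.
        return value if isinstance(value, list) else [value]

    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(stmt)
    return flagged


# Hypothetical policy document attached to the Atlantis role.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "StateBucket", "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::terraform-state-bucket/*"},
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}
flagged = find_wildcard_statements(policy)
print([s["Sid"] for s in flagged])  # ['TooBroad']
```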

By implementing least-privilege IAM roles, you significantly reduce the risk of unauthorized or accidental infrastructure modifications.

Keep Atlantis and Terraform versions updated with a controlled upgrade strategy

Keeping Atlantis and Terraform versions current ensures you benefit from security patches, bug fixes, and new features. However, updates require careful planning to avoid disruption.

For Atlantis version management:

Build custom images for specific requirements:

FROM ghcr.io/runatlantis/atlantis:v0.34.0
USER root
RUN apk add --update python3 py3-pip && \
    pip install ansible
USER atlantis

Deploy Atlantis from a container image pinned to a specific tag, so that every upgrade is an explicit, reversible change:

docker pull ghcr.io/runatlantis/atlantis:v0.34.0

For Terraform version management:

Choose between Terraform and OpenTofu distributions:

projects:
  - name: project
    dir: project
    terraform_distribution: opentofu
    terraform_version: 1.6.0

Override for specific projects in atlantis.yaml:

projects:
  - name: legacy
    dir: legacy
    terraform_version: 0.14.11
  - name: modern
    dir: modern
    terraform_version: 1.5.0

Set default Terraform version:

atlantis server --default-tf-version=1.5.0

Implement a controlled upgrade strategy:

  1. Maintain a test environment for evaluating new versions
  2. Document a compatibility matrix of tested versions
  3. Schedule upgrades during low-activity periods
  4. Communicate changes to all team members
  5. Create a rollback plan for each upgrade
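
The compatibility matrix from step 2 can double as an automated gate. The following is a rough sketch under stated assumptions: the matrix contents are invented for illustration, and the project list mirrors the parsed projects section of an atlantis.yaml.

```python
# Hypothetical matrix: Atlantis release -> Terraform versions tested with it.
TESTED = {
    "v0.34.0": {"1.5.0", "1.6.0"},
}


def untested_projects(atlantis_version: str, projects: list[dict]) -> list[str]:
    """Return names of projects pinned to a Terraform version not yet tested.

    Projects without a terraform_version pin are reported too, since this
    sketch cannot tell which default version they would run with.
    """
    tested = TESTED.get(atlantis_version, set())
    return [
        p["name"]
        for p in projects
        if p.get("terraform_version") not in tested
    ]


projects = [
    {"name": "legacy", "terraform_version": "0.14.11"},
    {"name": "modern", "terraform_version": "1.5.0"},
]
result = untested_projects("v0.34.0", projects)
print(result)  # ['legacy']
```

Running a check like this before rolling out a new Atlantis image makes it obvious which projects still need a test run.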

By maintaining current versions while following a controlled upgrade process, you ensure stability while benefiting from the latest features and security improvements.

Structure repositories for efficient planning and reduced conflicts

Proper repository structure significantly impacts Atlantis's performance and usability. Well-organized repositories reduce unnecessary plans and minimize conflicts between team members.

For monorepos (all infrastructure in one repository):

terraform-repo/
├── atlantis.yaml
├── modules/
│   ├── networking/
│   ├── compute/
│   └── storage/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
└── README.md

For multi-repo approaches (separate repositories for different components):

networking-repo/
├── atlantis.yaml
├── modules/
└── environments/

compute-repo/
├── atlantis.yaml
├── modules/
└── environments/

Configure Atlantis for your repository structure:

# For monorepo with environment directories
version: 3
projects:
  - name: network-dev
    dir: environments/dev/network
  - name: compute-dev
    dir: environments/dev/compute
    depends_on: [network-dev]
  - name: network-prod
    dir: environments/production/network
  - name: compute-prod
    dir: environments/production/compute
    depends_on: [network-prod]

For large repositories with many Terraform projects, use autodiscovery:

version: 3
autodiscover:
  mode: enabled
  ignore_paths:
    - "*.md"
    - "docs/**/*"
    - "legacy/**/*"

Best practices for repository structure:

  • Group related resources that change together
  • Use modules for reusable components
  • Separate environments to limit blast radius
  • Establish consistent directory naming conventions
  • Document structure in README files
  • Use execution order groups and dependencies to manage relationships

Well-structured repositories enhance collaboration, reduce unnecessary operations, and minimize state conflicts, leading to more efficient infrastructure management.

Optimize when_modified patterns in atlantis.yaml for targeted planning

The when_modified setting determines which file changes trigger Atlantis to run terraform plan. Optimizing these patterns reduces unnecessary plans, improving performance and developer experience.

Instead of using overly broad patterns:

# Too broad - triggers a plan for any .tf change anywhere under the project's directory (the whole repository when dir is ".")
when_modified: ["**/*.tf"]

Use targeted patterns that match your repository structure:

projects:
  - name: networking
    dir: networking
    autoplan:
      # Patterns are relative to dir, so no "networking/" prefix is needed
      when_modified:
        - "*.tf"                          # Files in the project directory
        - "*.tfvars"                      # Variables files
        - ".terraform.lock.hcl"           # Dependency lock file
        - "../modules/network/**/*.tf"    # Related modules
  
  - name: database
    dir: database
    autoplan:
      when_modified:
        - "*.tf"
        - "*.tfvars"
        - ".terraform.lock.hcl"
        - "../modules/database/**/*.tf"

For projects that share modules:

projects:
  - name: project1
    dir: project1
    autoplan:
      when_modified:
        - "**/*.tf"
        - "../modules/shared/**/*.tf"  # Shared modules

  - name: project2
    dir: project2
    autoplan:
      when_modified:
        - "**/*.tf"
        - "../modules/shared/**/*.tf"  # Same shared modules

For workspace-based projects:

projects:
  - name: app-dev
    dir: app
    workspace: dev
    autoplan:
      when_modified:
        - "*.tf"
        - "env/dev.tfvars"
  
  - name: app-prod
    dir: app
    workspace: prod
    autoplan:
      when_modified:
        - "*.tf"
        - "env/prod.tfvars"

Implementation tips:

  • Paths are relative to the project's directory
  • Include parent directory references when needed (e.g., ../modules/**/*.tf)
  • Test patterns with sample PRs before finalizing
  • Review Atlantis logs to identify unnecessary plan operations
  • Update patterns as your repository structure evolves
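
Because patterns are resolved relative to the project's directory, it can help to dry-run them against a list of changed files before opening a test PR. The sketch below approximates the matching with a small hand-written glob translator. It is not Atlantis's own matcher: it ignores exclusion patterns that start with "!", and the function names are assumptions made for this example.

```python
import posixpath
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using *, ? and ** into a regex (approximation)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def triggers_plan(project_dir: str, patterns: list[str], changed: list[str]) -> bool:
    """True if any changed file (repo-relative path) matches any pattern."""
    for pattern in patterns:
        # Join with the project dir first; normpath then resolves "../".
        full = posixpath.normpath(posixpath.join(project_dir, pattern))
        regex = glob_to_regex(full)
        if any(regex.match(path) for path in changed):
            return True
    return False


patterns = ["*.tf", "../modules/network/**/*.tf"]
print(triggers_plan("networking", patterns, ["networking/main.tf"]))           # True
print(triggers_plan("networking", patterns, ["modules/network/vpc/main.tf"]))  # True
print(triggers_plan("networking", patterns, ["database/main.tf"]))             # False
# A pattern that repeats the directory name looks for networking/networking/*.tf
print(triggers_plan("networking", ["networking/*.tf"], ["networking/main.tf"]))  # False
```

The last call shows why repeating the project directory inside a pattern silently disables autoplanning for that project.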

Optimized when_modified patterns significantly reduce CI load, decrease PR noise, and speed up the development workflow.

Implement pre-plan validation and security scanning

Integrating validation and security scanning before Terraform operations helps catch issues early, improving code quality and security posture.

Add validation steps to a custom workflow in atlantis.yaml (the server-side configuration must allow custom workflows for the repository):

version: 3
projects:
  - name: example
    dir: example
    workflow: validate-and-plan
    apply_requirements: [approved]

workflows:
  validate-and-plan:
    plan:
      steps:
        - run: terraform fmt -check
        # validate needs an initialized working directory, so init runs first
        - init
        - run: terraform validate
        - run: tfsec --no-color .
        - run: checkov -d . --quiet
        - plan

For organization-wide scanning, use the server-side repos.yaml:

repos:
  - id: /.*/
    pre_workflow_hooks:
      - run: terraform fmt -check
      - run: tflint
      - run: tfsec . --no-color

Integrate with policy-as-code tools:

repos:
  - id: /.*/
    policy_check: true
policies:
  policy_sets:
    - name: security-policies
      path: /path/to/policies
      source: local
workflows:
  custom:
    plan:
      steps:
      - init
      - plan
    policy_check:
      steps:
      - policy_check:
          extra_args: ["--all-namespaces"]

Example OPA/Conftest policy (save as policy/terraform.rego). Atlantis evaluates policies against the JSON form of the plan, so the rules inspect resource_changes:

package terraform

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  rc.change.after.acl == "public-read"
  msg := sprintf("S3 bucket '%v' is publicly readable", [rc.address])
}

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_security_group_rule"
  rc.change.after.type == "ingress"
  rc.change.after.cidr_blocks[_] == "0.0.0.0/0"
  msg := sprintf("Security group rule '%v' allows ingress from internet to port %v", [rc.address, rc.change.after.to_port])
}

Recommended validation and scanning tools:

  • terraform validate: Built-in syntax and configuration check
  • terraform fmt: Code formatting verification
  • tflint: Extended linting beyond Terraform's basic validation
  • tfsec: Security vulnerability scanner for Terraform
  • checkov: Policy-based security scanner
  • conftest/OPA: Custom policy enforcement
  • terrascan: Compliance and security violation scanner

By implementing pre-plan validation and security scanning, you catch issues before they reach infrastructure, significantly reducing security risks and improving code quality.

Monitor Atlantis server health and logs for operational visibility

Comprehensive monitoring ensures Atlantis operates reliably and provides visibility into infrastructure operations.

Enable Prometheus metrics in the server-side repo config (repos.yaml):

# repos.yaml
metrics:
  prometheus:
    endpoint: "/metrics"

Key metrics to monitor:

  • atlantis_cmd_autoplan_builder_execution_success/error: Success/error counts for autoplans
  • atlantis_project_plan_execution_success/error: Success/error counts for project plans
  • atlantis_project_apply_execution_success/error: Success/error counts for applies
  • atlantis_project_plan_execution_time/apply_execution_time: Execution times

Create a Prometheus configuration:

# prometheus.yml
scrape_configs:
  - job_name: 'atlantis'
    scrape_interval: 15s
    static_configs:
      - targets: ['atlantis:4141']

Set up appropriate logging:

# Configure log level
atlantis server --log-level=info

For Kubernetes deployments, configure health probes:

# Kubernetes deployment
livenessProbe:
  httpGet:
    path: /healthz
    port: 4141
  initialDelaySeconds: 30
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /healthz
    port: 4141
  initialDelaySeconds: 30
  periodSeconds: 30

Create a Grafana dashboard visualizing:

  • Command execution success/failure rates
  • Execution times
  • Project plan/apply success rates
  • Lock statistics
  • Server resource utilization

Configure alerts for:

  • High error rates
  • Unusually long execution times
  • Server resource constraints
  • Lock contention
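
For the error-rate alert, the underlying arithmetic is a ratio of counter increases over a time window. In practice this would normally be expressed as a Prometheus alerting rule. The sketch below only illustrates the calculation, with hypothetical snapshot dictionaries standing in for two scrapes of a success/error counter pair and an arbitrary 20% threshold.

```python
def counter_delta(prev: float, curr: float) -> float:
    """Increase of a counter between two samples, tolerating a reset to zero."""
    # After a server restart the counter starts again from zero, so the
    # current value is itself the increase since the reset.
    return curr - prev if curr >= prev else curr


def error_ratio(prev: dict, curr: dict) -> float:
    """Fraction of failed operations between two counter snapshots."""
    errors = counter_delta(prev["error"], curr["error"])
    successes = counter_delta(prev["success"], curr["success"])
    total = errors + successes
    return errors / total if total > 0 else 0.0


def should_alert(prev: dict, curr: dict, threshold: float = 0.2) -> bool:
    return error_ratio(prev, curr) > threshold


# Hypothetical samples of the plan success/error counters, taken one window apart.
earlier = {"success": 100, "error": 5}
later = {"success": 130, "error": 15}
ratio = error_ratio(earlier, later)
print(ratio)                         # 0.25
print(should_alert(earlier, later))  # True
```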

Forward logs to a centralized system (ELK, CloudWatch, etc.) for analysis and retention:

# Filebeat configuration example
filebeat.inputs:
- type: log
  paths:
    - /var/log/atlantis/atlantis.log
output.elasticsearch:
  hosts: ["elasticsearch:9200"]

By implementing comprehensive monitoring, you gain visibility into Atlantis operations, enable proactive issue detection, and create an audit trail of infrastructure changes.

Secure webhooks and the Atlantis endpoint

Properly securing Atlantis communication channels is essential for protecting infrastructure operations from unauthorized access.

Use webhook secrets to validate requests:

# Start Atlantis with webhook secret
atlantis server --gh-webhook-secret="your-secure-webhook-secret"

# Or use environment variables (recommended)
export ATLANTIS_GH_WEBHOOK_SECRET="your-secure-webhook-secret"

Configure webhook in GitHub:

  • URL: https://atlantis.example.com/events
  • Content Type: application/json
  • Secret: your-secure-webhook-secret
  • Events: Pull requests, Pull request reviews, Pushes, Issue comments
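
To see what the webhook secret actually protects, it helps to look at the signature scheme GitHub uses: each delivery carries an X-Hub-Signature-256 header containing an HMAC-SHA256 of the raw request body, keyed with the shared secret. Atlantis performs this validation itself. The sketch below is not Atlantis code, just an illustration of the check, and the function name is an assumption.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a GitHub-style 'sha256=<hex>' signature over the raw body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)


secret = "your-secure-webhook-secret"
body = b'{"action": "opened"}'
# What a sender that knows the secret would put in the header.
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(signature_is_valid(secret, body, header))          # True
print(signature_is_valid(secret, body + b" ", header))   # False: body was altered
print(signature_is_valid("wrong-secret", body, header))  # False: secret mismatch
```

Without a configured secret there is nothing to verify against, so anyone who can reach the /events endpoint could forge events.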

Enable HTTPS for all communications:

# Start Atlantis with TLS
atlantis server \
  --ssl-cert-file=/path/to/cert.pem \
  --ssl-key-file=/path/to/key.pem

Implement authentication for the Atlantis web interface:

# Enable basic authentication
atlantis server \
  --web-basic-auth=true \
  --web-username=admin \
  --web-password=secure-password

Restrict repository access:

# Limit which repositories Atlantis responds to
atlantis server --repo-allowlist="github.com/yourorg/*"

# Exclude specific repositories
atlantis server --repo-allowlist="github.com/yourorg/*,!github.com/yourorg/sensitive-repo"

Deploy Atlantis behind a reverse proxy for additional security:

# Nginx configuration example
server {
    listen 443 ssl;
    server_name atlantis.example.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;
    add_header Content-Security-Policy "default-src 'self'" always;

    location / {
        proxy_pass http://localhost:4141;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

For additional protection:

  • Deploy in a private network with restricted access
  • Use IP allow-listing to restrict connections to known sources
  • Implement a Web Application Firewall (WAF) for additional protection
  • Regularly rotate webhook secrets and credentials
  • Monitor for unauthorized access attempts

By properly securing webhooks and endpoints, you prevent unauthorized access to your infrastructure automation system, reducing the risk of malicious modifications.

Effectively train teams on Atlantis usage and establish clear workflows

Successful Atlantis adoption requires proper team training and clear workflow documentation. This ensures consistent usage patterns and maximizes the benefits of automation.

Develop a staged rollout strategy:

  1. Pilot Phase:
    • Start with a small, low-risk project
    • Include technically proficient team members
    • Run Atlantis alongside existing processes
  2. Controlled Expansion:
    • Gradually include more projects
    • Introduce broader team to the workflow
    • Maintain fallback procedures
  3. Full Adoption:
    • Make Atlantis the standard workflow
    • Retire legacy processes
    • Continuously refine and optimize

Create comprehensive documentation:

  • Basic Atlantis commands and workflow
  • Repository-specific configurations
  • Troubleshooting guides
  • FAQ for common issues
  • Team-specific procedures

Example documentation template:

# Atlantis Workflow Guide

## Basic Commands
- `atlantis plan` - Run plan on affected projects
- `atlantis apply` - Apply planned changes
- `atlantis plan -d dir` - Plan specific directory
- `atlantis apply -p project` - Apply specific project
- `atlantis unlock` - Release locks

## Project-Specific Workflows
1. Network Infrastructure
   - Dependencies: None
   - Approval Requirements: [Team Lead]
   - Special Considerations: [Details]

2. Database Infrastructure
   - Dependencies: Network Infrastructure
   - Approval Requirements: [DBA, Team Lead]
   - Special Considerations: [Details]

## Troubleshooting
1. Lock Issues
   - Check Atlantis dashboard for current locks
   - Use `atlantis unlock` if necessary
   - Contact [Support Person] for assistance

2. Plan/Apply Failures
   - [Common resolution steps]
   - Escalation path: [Details]

Conduct practical training sessions:

  • Hands-on workshops with real examples
  • Role-specific training (developer vs. approver)
  • Record sessions for future reference

Create a mentoring system:

  • Assign Atlantis champions to assist teams
  • Establish support channels (Slack, etc.)
  • Pair experienced users with newcomers

Track adoption metrics:

  • Number of PRs processed through Atlantis
  • Volume of plan/apply operations
  • Time from PR creation to merge
  • Reduction in failed applies
  • Decrease in infrastructure incidents

By investing in comprehensive training and establishing clear workflows, you accelerate adoption, reduce errors, and ensure consistent infrastructure management practices across your organization.

Leverage execution order groups and dependencies for complex infrastructures

For complex infrastructures with interdependencies, properly configuring execution order ensures consistent and reliable deployments.

Configure execution order in atlantis.yaml:

version: 3
parallel_plan: true
parallel_apply: true
projects:
  - name: network
    dir: infrastructure/network
    execution_order_group: 1
  
  - name: security
    dir: infrastructure/security
    execution_order_group: 2
    depends_on: 
      - network
  
  - name: database
    dir: infrastructure/database
    execution_order_group: 3
    depends_on: 
      - security
  
  - name: application
    dir: infrastructure/application
    execution_order_group: 4
    depends_on: 
      - database

Key configuration elements:

  • execution_order_group: Numeric value determining execution priority (lower numbers execute first)
  • depends_on: List of projects that must complete before this project runs
  • parallel_plan and parallel_apply: Enable parallel execution where dependencies allow
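
When a configuration grows, it is easy for depends_on and execution_order_group to drift apart. The sketch below shows one way to sanity-check them with Python's standard graphlib module. The function names are assumptions, the input mirrors the parsed projects list, and a missing execution_order_group is treated as 0.

```python
from graphlib import CycleError, TopologicalSorter


def apply_order(projects: list[dict]) -> list[str]:
    """Return project names ordered so that dependencies come first."""
    graph = {p["name"]: set(p.get("depends_on", [])) for p in projects}
    return list(TopologicalSorter(graph).static_order())


def group_conflicts(projects: list[dict]) -> list[str]:
    """Report projects placed in an earlier group than one of their dependencies."""
    groups = {p["name"]: p.get("execution_order_group", 0) for p in projects}
    problems = []
    for p in projects:
        for dep in p.get("depends_on", []):
            dep_group = groups.get(dep)
            if dep_group is not None and dep_group > groups[p["name"]]:
                problems.append(f"{p['name']} runs before its dependency {dep}")
    return problems


projects = [
    {"name": "application", "execution_order_group": 4, "depends_on": ["database"]},
    {"name": "database", "execution_order_group": 3, "depends_on": ["security"]},
    {"name": "network", "execution_order_group": 1},
    {"name": "security", "execution_order_group": 2, "depends_on": ["network"]},
]
order = apply_order(projects)
print(order)  # ['network', 'security', 'database', 'application']
print(group_conflicts(projects))  # []

# A dependency cycle is reported as an exception instead of an ordering.
try:
    apply_order([{"name": "a", "depends_on": ["b"]},
                 {"name": "b", "depends_on": ["a"]}])
    cycle_found = False
except CycleError:
    cycle_found = True
print(cycle_found)  # True
```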

For more complex patterns with environment separation:

version: 3
projects:
  # Development environment
  - name: network-dev
    dir: environments/dev/network
    execution_order_group: 1
  
  - name: compute-dev
    dir: environments/dev/compute
    execution_order_group: 2
    depends_on: 
      - network-dev
  
  # Production environment
  - name: network-prod
    dir: environments/prod/network
    execution_order_group: 1
  
  - name: compute-prod
    dir: environments/prod/compute
    execution_order_group: 2
    depends_on: 
      - network-prod

For projects using data from other projects:

version: 3
projects:
  - name: shared-resources
    dir: shared
    execution_order_group: 1
  
  - name: application
    dir: application
    execution_order_group: 2
    depends_on: 
      - shared-resources
    workflow: remote-state

With a custom workflow for remote state access:

workflows:
  remote-state:
    plan:
      steps:
        # -input=false keeps Terraform from waiting on interactive prompts
        - run: terraform init -input=false -backend-config=path_to_state.config
        - run: terraform workspace select "$WORKSPACE"
        - run: terraform plan -input=false -out "$PLANFILE"

Benefits of proper execution ordering:

  • Ensures resources are created in the correct sequence
  • Prevents dependency errors during apply operations
  • Enables maximum parallelism where dependencies allow
  • Documents infrastructure relationships explicitly in configuration
  • Simplifies complex deployment processes

By leveraging execution order groups and dependencies, you can manage complex infrastructure deployments reliably while maintaining optimal performance through appropriate parallelization.