Custom Atlantis Workflows: Advanced Terraform Automation Guide 2025

Discover how custom Atlantis workflows unlock powerful Terraform automation—setup steps, best practices, and real-world use cases to streamline DevOps.

Ryan Fee

22 May 2025 • 7 min read

1. Introduction: Beyond Basic Atlantis

Terraform is one of the most widely used languages for Infrastructure as Code (IaC), and tools that smooth its day-to-day use are highly valued. Atlantis stands out by embedding Terraform execution directly in the version control workflow: teams plan and apply changes through pull request comments. This GitOps style improves collaboration and leaves an audit trail.

As infrastructure grows, so does the need for more refined automation. This is where Atlantis's atlantis.yaml configuration file comes in. It enables custom workflows that can manage complex deployment pipelines, integrate third-party tools, and enforce specific operational policies. That flexibility comes with a layer of configuration management and operational overhead that teams must plan for. For organizations that want to manage Terraform at scale without going deep into YAML or operating the underlying automation engine, platforms that offer more structured, opinionated CI/CD for Terraform, such as Scalr, may be a compelling alternative with built-in governance and environment management.

2. The Engine Room: Understanding `atlantis.yaml`

The atlantis.yaml file, placed at the root of your repository, is the core of Atlantis customization. It tells Atlantis how to find projects, which commands to run, and under what conditions.

Core Structure and Top-Level Keys

Every atlantis.yaml file starts with version: 3. The main top-level keys are:

projects: An array of the Terraform projects Atlantis should manage. Once projects are listed, Atlantis by default uses only these and stops auto-discovering others.
workflows: A map of custom command sequences for planning and applying.
automerge: A Boolean that enables automatic PR merging after all applies succeed.
parallel_plan/parallel_apply: Booleans that run plans or applies for multiple projects concurrently.
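As a minimal sketch, the top-level keys above might fit together as follows. The project name, directory, and workflow name are placeholders rather than values from a real repository, and the workflow uses only the built-in steps.

```yaml
version: 3
automerge: true          # merge the PR automatically after successful applies
parallel_plan: true      # plan multiple projects concurrently
parallel_apply: true     # apply multiple projects concurrently
projects:
  - name: example        # placeholder project
    dir: terraform/example
    workflow: example-workflow
workflows:
  example-workflow:      # placeholder workflow name
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
```

Whether a repository is allowed to define its own workflows like this depends on the server-side settings covered later in this guide.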

Defining Projects for Granular Control

Each entry in the projects array describes one Terraform directory and its settings:

dir: Path to the project, relative to the repository root.
workspace: The Terraform workspace to use (defaults to default).
name: A unique identifier, important when several projects share the same dir (e.g., one per workspace).
workflow: The name of the custom workflow to use for this project.
autoplan: Configures automatic planning, typically through when_modified patterns.
apply_requirements: Conditions such as approved or mergeable that must hold before an apply is permitted.

version: 3
projects:
  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    workflow: prod-workflow
    autoplan:
      when_modified:
        - "**/*.tf"
        - "prod.tfvars"
      enabled: true
    apply_requirements: [approved, mergeable, undiverged]

Crafting Custom Workflows: Plan, Apply, and Beyond

Custom workflows let you replace the default plan and apply behavior with your own stages and steps.

Stages: Generally plan and apply.
Steps: Can be internal Atlantis commands (init, plan, apply, show) or bespoke run commands.

The run step is especially powerful: it runs arbitrary shell commands and has access to environment variables such as $PLANFILE, $WORKSPACE, $PROJECT_NAME, and $PULL_NUM.

workflows:
  prod-workflow:
    plan:
      steps:
        - init
        - run: echo "Running pre-production checks..."
        - plan:
            extra_args: ["-var-file=prod.tfvars", "-out=$PLANFILE"]
    apply:
      steps:
        - run: echo "Awaiting final sign-off for $PROJECT_NAME..."
        # Script for sign-off logic
        - run: ./scripts/await_manual_approval.sh $PULL_NUM
        - apply
        - run: echo "Production deployment of $PROJECT_NAME complete."

Managing the dependencies and execution environment for these run scripts (making sure tools are installed in the Atlantis container, handling permissions) becomes a significant concern as workflow complexity grows.
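One way to make a missing tool fail fast with a clear message is a small preflight check that runs before the real steps. The following is only a sketch: the function name require_tools and the tools passed to it are illustrative, not part of Atlantis.

```shell
#!/bin/sh
# require_tools TOOL...
# Returns 0 if every named tool is on PATH; otherwise prints each missing
# tool to stderr and returns 1. (Hypothetical helper, not an Atlantis feature.)
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call inside a script used by a run step:
#   require_tools terraform tfsec || exit 1
```

Because a non-zero exit code from a run step stops the workflow, calling this first reports a missing dependency before any plan is attempted.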

3. Advanced Use Cases in Action

Let's see how these configurations facilitate advanced automation.

Multi-Environment Deployments (Dev, Staging, Prod)

A common setup deploys a single Terraform codebase across multiple environments using different workspaces and variable files. Each environment can be a separate Atlantis project with its own workflow and apply requirements.

atlantis.yaml for Multi-Environment:

version: 3
projects:
  - name: myapp-dev
    dir: terraform/myapp
    workspace: dev
    workflow: dev-workflow
    autoplan:
      when_modified: ["**/*.tf", "dev.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true

  - name: myapp-staging
    dir: terraform/myapp
    workspace: staging
    workflow: staging-workflow
    apply_requirements: [approved]
    autoplan:
      when_modified: ["**/*.tf", "staging.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true

  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    workflow: prod-workflow # Possibly with manual approval steps
    apply_requirements: [approved, mergeable, undiverged]
    autoplan:
      when_modified: ["**/*.tf", "prod.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true

workflows:
  dev-workflow:
    plan:
      steps:
        - init
        # Path is relative to the project dir, matching "dev.tfvars" in when_modified
        - plan: {extra_args: ["-var-file=dev.tfvars", "-out=$PLANFILE"]}
    apply:
      steps: [apply]

  # staging-workflow is omitted here; it would be defined similarly,
  # potentially with more steps (e.g., security scans, notifications).
  prod-workflow:
    plan:
      steps:
        - init
        - run: ./scripts/tfsec_scan.sh .
        # Path is relative to the project dir, matching "prod.tfvars" in when_modified
        - plan: {extra_args: ["-var-file=prod.tfvars", "-out=$PLANFILE"]}
    apply:
      steps:
        - run: ./scripts/prod_approval_gate.sh $PULL_NUM
        - apply
        - run: ./scripts/notify_slack.sh "PROD apply for $PROJECT_NAME complete."

While flexible, defining and maintaining separate but similar workflows leads to boilerplate. Hierarchical configuration models, where environment-specific settings are inherited and overridden more cleanly, can reduce it. Scalr, for example, offers an environment hierarchy that can make managing variables and policies across dev, staging, and prod more straightforward without extensive YAML repetition.

Integrating Custom Scripts: Linters and Security Scanners (tfsec, Checkov)

run steps are well suited to embedding quality gates and security checks. A non-zero exit code from a script stops the workflow.

Example: tfsec Pre-Plan Scan

workflows:
  secure-workflow:
    plan:
      steps:
        - init
        # tfsec security scan
        - run:
            command: |
              echo "Running tfsec scan..."
              # tfsec will exit non-zero if issues are found, halting the workflow
              tfsec .
        - plan: {extra_args: ["-out=$PLANFILE"]}
    apply:
      steps: [apply]

To use tools like tfsec or Checkov, they must be installed in the Atlantis execution environment (e.g., your Atlantis Docker image). Managing these dependencies and keeping the scripts reliable is important.
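A Checkov gate can follow the same pattern. This is a sketch that assumes checkov is installed in the Atlantis image; the workflow name is hypothetical.

```yaml
workflows:
  checkov-workflow:        # hypothetical workflow name
    plan:
      steps:
        - init
        # checkov exits non-zero when checks fail, which halts the workflow
        - run: checkov -d . --quiet
        - plan
    apply:
      steps: [apply]
```

Here -d . scans the project directory and --quiet limits the output to failed checks.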

Conditional Logic: `when_modified` and PR Label Strategies

PR Label Logic (via Custom Scripts): Atlantis doesn't natively respond to PR labels. A run step, however, can call a script that queries your VCS (e.g., the GitHub API) for the PR's labels and exits non-zero when the required label is missing, effectively gating the workflow. This requires securely managing a VCS token with suitable permissions.

Conceptual Script (check_label.sh):

#!/bin/bash
# Needs GITHUB_TOKEN, PULL_NUM, BASE_REPO_OWNER, BASE_REPO_NAME env vars
REQUIRED_LABEL="ready-for-deploy"
# ... (curl GitHub API to fetch labels for $PULL_NUM) ...
if [[ $LABELS == *"$REQUIRED_LABEL"* ]]; then
  echo "Label '$REQUIRED_LABEL' found. Proceeding."
  exit 0
else
  echo "Label '$REQUIRED_LABEL' not found. Halting."
  exit 1
fi
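Note that the substring test in the conceptual script would also accept a label such as not-ready-for-deploy. A stricter check might match the label name exactly. The sketch below assumes the labels arrive as the JSON array returned by GitHub's "list labels for an issue" endpoint and that label names contain no whitespace; has_label is a hypothetical helper, not part of Atlantis.

```shell
#!/bin/sh
# has_label LABEL
# Reads a JSON array of label objects on stdin and succeeds only if one of
# them has exactly the given name. Assumes LABEL contains no whitespace.
has_label() {
  # Strip whitespace so '"name": "x"' and '"name":"x"' look the same,
  # then search for the exact quoted name as a fixed string.
  tr -d '[:space:]' | grep -qF "\"name\":\"$1\""
}

# Possible use inside check_label.sh (needs GITHUB_TOKEN with read access):
#   curl -fsS -H "Authorization: Bearer $GITHUB_TOKEN" \
#     "https://api.github.com/repos/$BASE_REPO_OWNER/$BASE_REPO_NAME/issues/$PULL_NUM/labels" \
#     | has_label "ready-for-deploy" || exit 1
```

Matching on the quoted name keeps the gate from passing on labels that merely contain the required text.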

Workflow Integration:

workflows:
  label-gated-workflow:
    plan:
      steps:
        - run: ./scripts/check_label.sh # Gate based on label
        - init
        - plan: {extra_args: ["-out=$PLANFILE"]}

This method, while powerful, adds considerable scripting complexity and external dependencies. Platforms with integrated policy engines (like OPA support in Scalr) can provide more robust and governable ways to enforce such conditional logic without custom scripts that call VCS APIs.

when_modified: The autoplan.when_modified key uses glob patterns, relative to the project's dir, to trigger plans only when relevant files change. This is very useful in monorepos to avoid unnecessary plans.

autoplan:
  when_modified:
    - "**/*.tf"                     # Files in this project's dir
    - "../../modules/network/**/*.tf" # Files in a shared module
    - ".terraform.lock.hcl"
  enabled: true

4. Navigating the Labyrinth: Debugging Custom Workflows

Troubleshooting custom Atlantis workflows typically involves the following:

Atlantis Server Logs: Enable debug logging (--log-level debug) for detailed output on config parsing, project discovery, and step execution.
atlantis.yaml Syntax: Use a YAML linter. Common problems include indentation (spaces, not tabs) and incorrect nesting.
Custom Script Failures: Check exit codes. Scripts need execute permissions, and all dependencies (linters, CLIs) must be present in the Atlantis environment. Test scripts in an environment that mirrors Atlantis.
when_modified Issues: Confirm that glob patterns are accurate and that relative paths to shared modules are correct.
Server-Side Overrides: Settings in atlantis.yaml may be rejected or ignored if the key is not permitted by allowed_overrides in the server-side repos.yaml.
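When a custom script fails, it helps to know exactly which command ran and what it returned. A small wrapper like the one below can log both; this is a sketch, and run_logged is a hypothetical helper name.

```shell
#!/bin/sh
# run_logged COMMAND [ARGS...]
# Runs the command, logs the command line and its exit code to stderr,
# and returns the same exit code so the calling step still fails correctly.
run_logged() {
  echo "[step] running: $*" >&2
  status=0
  "$@" || status=$?
  echo "[step] exit code: $status" >&2
  return "$status"
}

# Example inside a script called from a run step:
#   run_logged tfsec . || exit 1
```

Capturing the status with `|| status=$?` keeps the wrapper working even when the surrounding script uses `set -e`.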

The feedback cycle for debugging complex run steps inside Atlantis can be slow. An environment that offers better visibility into execution steps, or makes automation scripts easier to test, can be an advantage.

5. Taming Complexity: Best Practices for `atlantis.yaml`

As configurations expand, maintainability becomes a chief concern.

Monorepo Management Strategies

Granular Projects: Define each independently deployable unit as its own project.
Optimized when_modified: Critical for avoiding plan storms. Consider the server-side --autoplan-modules flag for automatic module dependency tracking, but make sure you understand what it triggers.
execution_order_group: Controls the plan/apply order across dependent projects when a single command runs several of them.
Parallelism: Use parallel_plan: true and parallel_apply: true to speed up operations.
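Put together, these settings might look like the following sketch. The project names and paths are placeholders; the example assumes the app project depends on the network project and on a shared module directory at the repository root.

```yaml
version: 3
parallel_plan: true
parallel_apply: true
projects:
  - name: network                # placeholder names and paths
    dir: infra/network
    execution_order_group: 1     # lower groups run first
    autoplan:
      when_modified: ["**/*.tf"]
      enabled: true
  - name: app
    dir: infra/app
    execution_order_group: 2     # runs after group 1
    autoplan:
      when_modified: ["**/*.tf", "../../modules/shared/**/*.tf"]
      enabled: true
```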

The sheer number of project definitions and convoluted when_modified paths in a large monorepo can make the atlantis.yaml file difficult to manage. Some teams choose to generate this file, which adds another layer of tooling.
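As an illustration of that generation approach, the sketch below emits one project per directory that contains .tf files. It is not an existing tool: generate_atlantis_yaml is a hypothetical function, and it assumes that directories under modules/ and .terraform/ are not deployable projects.

```shell
#!/bin/sh
# generate_atlantis_yaml ROOT
# Prints an atlantis.yaml to stdout with one project per directory under
# ROOT (given without a trailing slash) that contains *.tf files.
# Assumption: modules/ and .terraform/ directories are skipped.
generate_atlantis_yaml() {
  root="$1"
  echo "version: 3"
  echo "projects:"
  find "$root" -type f -name '*.tf' \
      ! -path '*/.terraform/*' ! -path '*/modules/*' \
    | sed 's|/[^/]*$||' \
    | sort -u \
    | while IFS= read -r dir; do
        if [ "$dir" = "$root" ]; then
          rel="."; name="root"                      # .tf files directly in ROOT
        else
          rel="${dir#"$root"/}"                     # path relative to ROOT
          name=$(printf '%s' "$rel" | tr '/' '-')   # envs/dev -> envs-dev
        fi
        printf '  - name: %s\n' "$name"
        printf '    dir: %s\n' "$rel"
        printf '    autoplan:\n'
        printf '      when_modified: ["**/*.tf"]\n'
        printf '      enabled: true\n'
      done
}

# Example: generate_atlantis_yaml . > atlantis.yaml
```

A real generator would likely also add workflow names and shared-module paths to when_modified; those are left out to keep the sketch short.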

YAML Anchors for Readability

YAML anchors (&) and aliases (*) can reduce repetition for common step sequences or apply_requirements.

workflows:
  base-workflow:
    plan:
      steps: &common_plan_steps   # Anchor defined at first use
        - init
        - run: ./scripts/lint.sh
        - plan: {extra_args: ["-out=$PLANFILE"]}

  my-workflow:
    plan:
      # Alias the whole list; "- *common_plan_steps" would nest a list inside a list
      steps: *common_plan_steps

While beneficial, excessive use of anchors can sometimes cloud the final configuration, making it tougher to follow a project's precise behavior.

Server-Side Governance with `repos.yaml`

For larger installations, the server-side repos.yaml (passed via --repo-config) is essential for centralized control.

allowed_overrides: Specifies which keys a repo-level atlantis.yaml may override (for example workflow or apply_requirements).
allowed_workflows: Restricts which server-defined workflows repos may select.
allow_custom_workflows: A significant security setting. If true, repos can define their own workflows, including arbitrary run steps. Keep it at false (the default) and manage workflows centrally unless you have very strong trust and review processes.
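A server-side configuration along these lines might look like the sketch below. The workflow names standard and secure are placeholders, and the repo pattern matches every repository; adjust both to your setup.

```yaml
# repos.yaml (server-side), loaded via --repo-config
repos:
  - id: /.*/                          # regex matching every repo
    workflow: standard                # default workflow for matched repos
    apply_requirements: [approved, mergeable]
    allowed_overrides: [workflow]     # repos may only choose a workflow
    allowed_workflows: [standard, secure]
    allow_custom_workflows: false     # repos cannot define their own steps
workflows:
  standard:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
  secure:
    plan:
      steps:
        - init
        - run: tfsec .                # assumes tfsec is in the Atlantis image
        - plan
    apply:
      steps: [apply]
```

With this in place, a repository's atlantis.yaml can set workflow: secure on a project but cannot introduce new run steps of its own.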

Effectively, repos.yaml lets a platform team provide a "paved road" for Terraform automation. Managing this central configuration and the interaction between server-side and repo-side settings, however, takes careful planning. This is another area where a platform like Scalr, with its built-in role-based access control (RBAC) and policy enforcement (e.g., via OPA), can deliver a more integrated governance model without depending on multiple layers of YAML configuration.

6. Summary: Key Atlantis Workflow Capabilities

Feature/Component	Advanced Capability	Key Configuration	Considerations/Complexity
Projects	Granular control per environment/component	`dir`, `workspace`, `name`	Can lead to verbose `atlantis.yaml` in monorepos.
Workflows	Custom plan/apply stages and steps	`workflows.<name>.plan.steps`, `apply.steps`	Managing script dependencies, permissions, and execution environment.
`run` Steps	Integrate any CLI tool (linters, scanners, notifiers)	`run: <command>`	Tooling must be in Atlantis image; script robustness; exit code handling.
`apply_requirements`	Enforce PR approvals, mergeability	`apply_requirements: [approved]`	Relies on VCS integration and PR states.
`autoplan.when_modified`	Conditional planning based on file changes	`when_modified: ["glob/*/pattern"]`	Crafting accurate globs, especially for shared modules; can be complex in large monorepos.
PR Label Logic	Gate workflows based on PR labels (via custom script)	`run: ./check_label.sh`	Requires scripting, VCS API interaction, and secure token management.
YAML Anchors	Reduce duplication in `atlantis.yaml`	`&anchor_name`, `*alias_name`	Can reduce readability if overused or deeply nested.
`repos.yaml`	Centralized server-side governance	`allowed_overrides`, `allow_custom_workflows`	Requires careful policy planning; managing interaction between server and repo configs.

7. Conclusion: Scaling Your Terraform Automation

Atlantis provides a powerful, open-source foundation for Terraform automation within the PR workflow. Its custom workflow features let teams tailor automation pipelines to advanced needs, from security checks to multi-environment deployments.

Nevertheless, as this guide shows, using this advanced functionality brings substantial configuration management, scripting, and operational responsibilities. Maintaining complex atlantis.yaml files, managing the execution environment for custom scripts, debugging elaborate workflows, and ensuring consistent governance across numerous repositories or a large monorepo can become major undertakings. This is no small feat.

For organizations finding themselves investing considerable effort in building and maintaining these advanced Atlantis setups, or those seeking more built-in governance, security, and scalability attributes from the start, it might be wise to assess managed Terraform automation platforms. Solutions like Scalr aim to tackle these issues by offering a more structured method with features like hierarchical environment management, integrated OPA policy enforcement, RBAC, and a focus on the end-to-end Terraform lifecycle, potentially lessening the need for extensive custom YAML and scripting.

The right path depends on your team's scale, expertise, and willingness to manage the underlying automation infrastructure versus adopting a more opinionated platform.