Ultimate Guide to Custom Atlantis Workflows

Extend Atlantis with custom workflows to automate complex Terraform tasks, enforce guardrails, and deliver secure IaC faster.

Atlantis is a widely used open-source tool for automating Terraform through pull requests, enabling GitOps for infrastructure provisioning. Its standard capabilities provide a foundational layer. However, unlocking true operational leverage, and navigating the complexity that comes with it, hinges on mastering atlantis.yaml. This document examines advanced configurations: multi-environment deployments, integration of security and compliance tooling, and strategies for mitigating the operational overhead inherent in scaling such YAML-defined automation. As infrastructure footprints expand, proficiency with these configurations becomes critical to maintaining velocity and control. It's a journey many engineering teams undertake, and one that often leads to evaluating more comprehensive platforms.

1. Beyond Baseline Automation: The Atlantis Proposition

Terraform has become a cornerstone of modern Infrastructure as Code strategy. Consequently, solutions that streamline its operationalization are highly valued. Atlantis offers a compelling entry point, integrating Terraform execution directly within Git-centric workflows. This facilitates collaborative PR-driven plan and apply cycles, aligning with GitOps principles and enhancing team productivity and auditability. These are the standard operational benefits.

Yet, infrastructure rarely remains static in complexity or scale. As organizations mature, the demand for more sophisticated automation escalates. This is the inflection point where atlantis.yaml becomes central, offering the means to construct tailored automation sequences. We're talking sophisticated deployment pipelines, integration with third-party validation tools, and enforcement of organizational policies. The potential is significant. So is the potential for increased management overhead. For organizations confronting the limitations of YAML-centric automation at scale, or those seeking enterprise-grade governance from day one, it becomes prudent to assess platforms like Scalr. These are engineered to provide robust CI/CD for Terraform, often with less direct YAML wrangling and more built-in control mechanisms. A strategic consideration for growing teams.

2. Dissecting atlantis.yaml: Configuration Architecture

The atlantis.yaml file, residing at the repository root, is the control plane for Atlantis customization. It dictates project discovery, command execution, and conditional logic. Understanding its architecture is key.

Core Elements: Structure & Key Parameters

All atlantis.yaml configurations adhere to version: 3. Key top-level parameters include:

  • projects: An array defining specific Terraform projects. Once specified, Atlantis relies exclusively on these definitions, ceasing autodiscovery. This offers precision.
  • workflows: A map defining custom sequences for plan and apply operations.
  • automerge: Boolean. Enables automatic PR merging post-successful apply. Use with appropriate risk assessment.
  • parallel_plan/parallel_apply: Booleans. Facilitate concurrent execution for multiple projects, potentially accelerating throughput.
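
As a rough sketch of how these top-level keys sit together in one file, the skeleton below assumes a single project and a single workflow. The names example-project and example-workflow and the directory path are placeholders introduced for illustration only.

```yaml
# Minimal atlantis.yaml skeleton (illustrative; names and paths are placeholders)
version: 3
automerge: false        # keep off until the apply gates are trusted
parallel_plan: true     # plan independent projects concurrently
parallel_apply: false   # apply one project at a time
projects:
  - name: example-project        # hypothetical project name
    dir: terraform/example       # hypothetical directory
    workflow: example-workflow
workflows:
  example-workflow:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
```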

Project Definitions: Achieving Granular Control

Each entry within the projects array delineates a specific Terraform module or configuration set that Atlantis will manage.

  • dir: The directory path to the project's Terraform files.
  • workspace: Specifies the Terraform workspace. Defaults to default.
  • name: A unique identifier, essential when multiple project definitions target the same dir with different workspaces. Critical for targeted operations.
  • workflow: Assigns a defined custom workflow to the project.
  • autoplan: Configures automatic planning, notably leveraging when_modified criteria for targeted execution.
  • apply_requirements: Stipulates preconditions for atlantis apply execution, such as approved or mergeable.

For example, a production project might be defined as follows:

version: 3
projects:
  - name: myapp-production-deploy
    dir: terraform/application-stack
    workspace: production
    workflow: secure-prod-workflow
    autoplan:
      when_modified:
        - "**/*.tf"
        - "production.tfvars"
      enabled: true
    apply_requirements: [approved, mergeable, undiverged] # Standard for production gates

The undiverged requirement ensures the PR branch is synchronized with the target branch, a common best practice for production changes.

Custom Workflows: Orchestrating Plan, Apply, and Governance Steps

Custom workflows enable organizations to supersede default plan and apply behaviors with bespoke, multi-step processes.

  • Stages: Typically plan and apply, forming the core of the deployment lifecycle.
  • Steps: Can be built-in Atlantis commands (init, plan, apply, show) or, critically, custom run commands for arbitrary script execution.

The run step is the gateway to extensibility. It executes shell scripts, providing access to contextual environment variables like $PLANFILE, $WORKSPACE, $PROJECT_NAME, and $PULL_NUM. This is powerful. It also means the organization assumes responsibility for script maintenance, security, and the underlying execution environment.

workflows:
  secure-prod-workflow:
    plan:
      steps:
        - init
        - run: echo "Executing pre-production validation scripts..."
        - plan:
            extra_args: ["-var-file=production.tfvars", "-out=$PLANFILE"] # The built-in plan step already writes to $PLANFILE; an explicit -out is only mandatory when plan is replaced by a custom run step
    apply:
      steps:
        - run: echo "Initiating final approval sequence for $PROJECT_NAME..."
        # Script to integrate with external change management or approval systems
        - run: ./scripts/gate_production_release.sh $PULL_NUM $PROJECT_NAME
        - apply
        - run: echo "Production deployment for $PROJECT_NAME completed. Post-deployment verifications initiated."
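
One detail worth illustrating: when the built-in plan step is swapped for a custom run command, the plan has to be written to $PLANFILE so that the apply stage can find it. The following is a minimal sketch, assuming the terraform binary is on the PATH of the Atlantis host; the workflow name is hypothetical.

```yaml
workflows:
  custom-command-workflow:   # hypothetical name
    plan:
      steps:
        - init
        # Custom plan command: the plan must be written to $PLANFILE
        - run: terraform plan -input=false -out "$PLANFILE"
    apply:
      steps:
        # Apply exactly the plan file produced in the plan stage
        - run: terraform apply -input=false "$PLANFILE"
```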

3. Strategic Implementation: Advanced Use Case Scenarios

How do these configurations translate into tangible operational improvements for complex scenarios?

Multi-Environment Deployment Orchestration (Dev, Staging, Production)

A prevalent requirement: managing a single Terraform codebase across multiple deployment environments, each with distinct configurations and governance. Atlantis addresses this through discrete project definitions.

atlantis.yaml for Multi-Environment Pipelines:

version: 3
projects:
  - name: myapp-dev-deploy
    dir: terraform/application-stack
    workspace: development
    workflow: dev-ci-workflow # A more streamlined workflow for dev
    autoplan:
      when_modified: ["**/*.tf", "development.tfvars", "../../modules/core-infra/**/*.tf"] # Include shared module paths
      enabled: true

  - name: myapp-staging-deploy
    dir: terraform/application-stack
    workspace: staging
    workflow: staging-qa-workflow
    apply_requirements: [approved] # Introduce approval gates for staging
    autoplan:
      when_modified: ["**/*.tf", "staging.tfvars", "../../modules/core-infra/**/*.tf"]
      enabled: true

  - name: myapp-production-deploy # Same project as in the earlier example
    dir: terraform/application-stack
    workspace: production
    workflow: secure-prod-workflow # Enforce rigorous production controls
    apply_requirements: [approved, mergeable, undiverged]
    autoplan:
      when_modified: ["**/*.tf", "production.tfvars", "../../modules/core-infra/**/*.tf"]
      enabled: true

workflows:
  dev-ci-workflow:
    plan:
      steps:
        - init
        - plan: {extra_args: ["-var-file=development.tfvars", "-out=$PLANFILE"]} # Path is relative to the project dir, matching when_modified
    apply:
      steps: [apply] # Potentially faster apply cycle for dev

  # staging-qa-workflow is omitted here but must also be defined; it would likely include integration tests or more comprehensive validation
  secure-prod-workflow: # Extends the earlier definition with a security scan before plan
    plan:
      steps:
        - init
        - run: ./scripts/security_compliance_scan.sh . # Mandate security scanning
        - plan: {extra_args: ["-var-file=production.tfvars", "-out=$PLANFILE"]} # Path is relative to the project dir, matching when_modified
    apply:
      steps:
        - run: ./scripts/gate_production_release.sh $PULL_NUM $PROJECT_NAME # Formal release gate
        - apply
        - run: ./scripts/audit_notification_service.sh "PROD apply: $PROJECT_NAME by $USER_NAME. PR: $PULL_NUM."
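
The staging-qa-workflow referenced by the staging project is not spelled out above. One possible shape is sketched here, assuming that staging.tfvars lives in the project directory and that a test script exists at the path shown; the script name is hypothetical and not part of the original configuration.

```yaml
workflows:
  staging-qa-workflow:
    plan:
      steps:
        - init
        # Catch syntax and reference errors before spending time on a plan
        - run: terraform validate -no-color
        - plan:
            extra_args: ["-var-file=staging.tfvars"]
    apply:
      steps:
        - apply
        # Hypothetical post-apply check; a non-zero exit marks the apply as failed
        - run: ./scripts/run_integration_tests.sh "$WORKSPACE"
```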

This configuration offers considerable adaptability. However, observe the potential for workflow repetition. If workflows for different environments share substantial commonality, managing them as distinct entities can increase maintenance. This is a typical scaling challenge where platforms like Scalr, with features for hierarchical environment configuration and policy inheritance, can offer a more streamlined approach, reducing YAML duplication and improving manageability of environment-specific overrides.

Integrating Custom Tooling: Linters, Security Scanners, Compliance Checks

run steps are instrumental for embedding quality, security, and compliance gates within the automation pipeline. A non-zero exit code from any script halts the workflow.

Example: Pre-plan Security Scan with tfsec

workflows:
  security-focused-workflow:
    plan:
      steps:
        - init
        # tfsec security vulnerability scan
        - run:
            command: |
              echo "Initiating tfsec security vulnerability scan..."
              # tfsec's non-zero exit on findings will halt the workflow.
              tfsec . --config-file ./tfsec-config.yml
        - plan: {extra_args: ["-out=$PLANFILE"]}
    apply:
      steps: [apply] # Proceeds only if plan and preceding scans are successful.

Effective integration requires that tools like tfsec or Checkov are available within the Atlantis execution environment (e.g., the specified Docker image). Managing these dependencies and ensuring script robustness are key operational responsibilities.
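
The same pattern carries over to Checkov. The sketch below assumes the checkov CLI is installed in the Atlantis image; the workflow name is a placeholder.

```yaml
workflows:
  checkov-workflow:   # hypothetical name
    plan:
      steps:
        - init
        # Checkov exits non-zero when a check fails, which halts the workflow.
        - run: checkov --directory . --quiet
        - plan
    apply:
      steps: [apply]
```

Running the scan before plan keeps feedback fast; scanning the rendered plan instead would catch more but requires an extra conversion step.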

Conditional Execution: when_modified and PR Label-Driven Strategies

PR Label-Driven Logic (via Custom Scripting): Atlantis lacks native support for PR label-based conditions. This functionality can be engineered using a run step that executes a script to query the VCS API (e.g., GitHub/GitLab) for PR labels and then dictates workflow progression via its exit code.

Conceptual Script (validate_pr_label.sh):

#!/bin/bash
# Uses PULL_NUM, BASE_REPO_OWNER, and BASE_REPO_NAME, which Atlantis sets for run steps.
# GITHUB_TOKEN is assumed to be supplied separately through the server's environment.
REQUIRED_DEPLOY_LABEL="ready-for-production"
# ... (curl to GitHub API to fetch labels for $PULL_NUM into API_RESPONSE_LABELS) ...
if [[ "$API_RESPONSE_LABELS" == *"$REQUIRED_DEPLOY_LABEL"* ]]; then
  echo "Required label '$REQUIRED_DEPLOY_LABEL' present. Workflow proceeds."
  exit 0 # Success
else
  echo "Label '$REQUIRED_DEPLOY_LABEL' absent. Halting workflow."
  exit 1 # Failure
fi

Integration in atlantis.yaml:

workflows:
  production-release-workflow:
    plan:
      steps:
        - run: ./scripts/validate_pr_label.sh # Label validation gate
        - init
        - plan: {extra_args: ["-out=$PLANFILE"]}

This approach provides flexibility but introduces dependencies on custom scripts, API interactions, and secure token management. For organizations requiring sophisticated policy enforcement based on PR metadata, platforms like Scalr with integrated OPA (Open Policy Agent) capabilities can offer a more robust and maintainable solution, abstracting away the need for custom API scripting for such conditional logic. This is particularly relevant when governance policies become complex.

when_modified: Atlantis's native mechanism for triggering plans based on file changes using glob patterns. Essential for optimizing monorepo performance by avoiding unnecessary plans.

autoplan:
  when_modified:
    - "**/*.tf"                                 # Project-specific TF files
    - "../../shared-modules/networking/**/*.tf"  # Changes in a shared networking module
    - ".terraform.lock.hcl"                    # Lock file changes
  enabled: true

4. Troubleshooting Complexities: Debugging Custom Workflow Implementations

Effective debugging is vital when custom Atlantis workflows exhibit unexpected behavior.

  • Atlantis Server Logs: Indispensable. Enable debug logging (--log-level debug) for maximum insight into parsing, project discovery, and step-by-step execution.
  • YAML Linting: Proactively validate atlantis.yaml syntax. Common issues include incorrect indentation (use spaces, not tabs) and misplaced colons. Many hours of engineering time can be saved with a linter.
  • Custom Script Failures: Examine script exit codes and output. Ensure scripts have execute permissions and all dependencies (linters, CLIs, SDKs) are present in the Atlantis execution environment. Test scripts in an environment that closely mirrors the Atlantis container.
  • when_modified Issues: Verify glob pattern accuracy and path relativity, especially for shared modules.
  • Server-Side Configuration Conflicts: Settings in atlantis.yaml can be overridden or disallowed by the server-side repos.yaml. If a configuration doesn't seem to apply, check allowed_overrides in repos.yaml.

The iterative cycle of modifying scripts, committing, pushing, and observing Atlantis behavior in a PR can be time-consuming. An environment that offers enhanced visibility or streamlined testing for automation scripts can significantly improve developer productivity.

5. Achieving Scalability & Maintainability with atlantis.yaml

As atlantis.yaml configurations grow, particularly in monorepos, proactive strategies for maintainability are essential.

Monorepo Configuration Strategies

  • Granular Project Definitions: Define each independently deployable service or component as a distinct project.
  • Optimized when_modified Paths: Critical for preventing "plan storms" where numerous unrelated projects are planned. Consider server-side options like --autoplan-modules for automated dependency tracking, but thoroughly vet its behavior in your specific context.
  • execution_order_group: Manage inter-project dependencies by defining the plan/apply sequence for global commands.
  • Parallel Execution: Utilize parallel_plan: true and parallel_apply: true to enhance performance for repositories with many independent projects.
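
To make the ordering and parallelism options concrete, here is a minimal sketch, assuming two projects where the application depends on the network layer. Project names and directories are illustrative.

```yaml
version: 3
parallel_plan: true
parallel_apply: true
projects:
  - name: network                 # hypothetical project
    dir: terraform/network
    execution_order_group: 1      # lower groups run first
    autoplan:
      when_modified: ["**/*.tf"]
  - name: application             # hypothetical project
    dir: terraform/application
    execution_order_group: 2      # runs after group 1
    autoplan:
      when_modified: ["**/*.tf", "../network/outputs.tf"]
```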

The sheer volume of project definitions and when_modified paths in large-scale monorepos can render atlantis.yaml unwieldy. Some organizations resort to programmatic generation of this file, introducing an additional layer of tooling and management. This is a common indicator that the limits of direct YAML management are being approached.

YAML Anchors for Configuration Efficiency

YAML anchors (&) and aliases (*) can mitigate duplication for common configuration blocks, such as step sequences or apply_requirements.

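workflows:
  application-workflow:
    plan:
      # Anchor defined at first use; an extra top-level key may be rejected as unknown
      steps: &standard_plan_sequence
        - init
        - run: ./scripts/code_linting.sh
        - plan: {extra_args: ["-out=$PLANFILE"]}
  second-application-workflow: # Illustrative second workflow
    plan:
      # Alias the whole list; "- *alias" would nest a list inside a list
      steps: *standard_plan_sequence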

While beneficial for DRY (Don't Repeat Yourself) principles, excessive use of anchors can sometimes obscure the effective configuration, making it harder to trace a project's exact operational parameters.

Centralized Governance via Server-Side repos.yaml

For organizations standardizing Atlantis usage, the server-side repos.yaml (specified via --repo-config) is fundamental for centralized policy enforcement and governance.

  • allowed_overrides: Defines which atlantis.yaml keys can be customized at the repository level.
  • allowed_workflows: Restricts repositories to a pre-approved list of server-defined workflows.
  • allow_custom_workflows: true/false: A critical security control. Setting to false (default) prevents repositories from defining arbitrary run steps, which could execute malicious code.
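
A server-side file combining these controls might look like the sketch below. It assumes every repository is governed by the same rule and that two server-defined workflows are offered; the workflow names standard and hardened are placeholders.

```yaml
# Server-side repos.yaml (illustrative sketch)
repos:
  - id: /.*/                       # regex matching every repository
    workflow: standard             # default workflow, defined below
    apply_requirements: [approved, mergeable]
    allowed_overrides: [workflow]  # repos may pick a workflow, nothing else
    allowed_workflows: [standard, hardened]
    allow_custom_workflows: false  # repos cannot define their own run steps
workflows:
  standard:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
  hardened:
    plan:
      steps:
        - init
        - run: ./scripts/security_compliance_scan.sh .
        - plan
    apply:
      steps: [apply]
```

With this arrangement a repository's atlantis.yaml can set workflow: hardened for a project but cannot change apply_requirements or introduce its own scripts.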

Effectively, repos.yaml enables a platform engineering team to establish a "paved road" for Terraform automation. However, managing this central configuration and its interaction with repository-level settings requires careful planning. This is another area where enterprise platforms like Scalr provide more integrated governance models, often incorporating RBAC and policy frameworks like OPA, without relying on multiple layers of YAML configuration for policy definition and enforcement.

6. Capability Summary: Key Atlantis Workflow Functionalities

| Feature/Component | Business Value / Capability | Key Configuration Parameters | Operational Considerations / Potential Challenges |
| --- | --- | --- | --- |
| Projects | Granular control per environment or component | dir, workspace, name | Can lead to verbose atlantis.yaml in large monorepos; requires diligent management. |
| Workflows | Customization of plan/apply lifecycles with bespoke stages/steps | workflows.<name>.plan.steps, apply.steps | Management of script dependencies, permissions, and execution environment; script robustness. |
| run Steps | Integration of third-party tools (security, linting, notification) | run: <command> | Tooling must be available in Atlantis image; script error handling; exit code management. |
| apply_requirements | Enforcement of PR approval gates and mergeability criteria | apply_requirements: [approved] | Dependency on VCS integration and PR states; alignment with change management processes. |
| autoplan.when_modified | Conditional planning based on file modifications, optimizing CI | when_modified: ["glob/**/pattern*"] | Crafting accurate glob patterns, especially for shared modules; potential for misconfiguration. |
| PR Label Logic (Scripted) | Workflow gating based on PR metadata (e.g., labels) | run: ./script_to_check_labels.sh | Requires custom scripting, VCS API interaction, secure token management; maintenance overhead. |
| YAML Anchors | Reduction of configuration duplication in atlantis.yaml | &anchor_name, *alias_name | Can reduce immediate readability if overused or deeply nested; careful application needed. |
| repos.yaml | Centralized server-side governance and policy enforcement | allowed_overrides, allow_custom_workflows | Requires strategic policy planning; managing interplay between server and repository configs. |

7. Strategic Outlook: Scaling Your Terraform Automation Maturity

Atlantis provides a robust, open-source foundation for integrating Terraform automation within PR-centric workflows. Its capabilities for custom workflow definition allow organizations to tailor automation pipelines to sophisticated operational requirements, incorporating everything from automated security validation to multi-environment deployment strategies with complex approval gates.

However, leveraging these advanced functionalities introduces non-trivial responsibilities in configuration management, script development, and operational oversight. Maintaining intricate atlantis.yaml files, managing the execution environment and dependencies for custom scripts, debugging multifaceted workflows, and ensuring consistent governance across a growing number of repositories or within a large-scale monorepo can become significant operational burdens. This is no small feat for any team.

For organizations experiencing friction in scaling their Atlantis deployments, or those proactively seeking more comprehensive, enterprise-grade features for governance, security posture management, and operational scalability, an evaluation of managed Terraform automation platforms is a logical next step. Solutions such as Scalr are specifically designed to address these enterprise challenges, offering features like hierarchical environment management, integrated OPA policy enforcement, fine-grained RBAC, and a holistic view of the Terraform operational lifecycle. Such platforms can potentially abstract away much of the custom YAML and scripting complexity, allowing teams to focus on delivering infrastructure value rather than managing the automation tooling itself.

The optimal path forward depends on an organization's specific scale, in-house expertise, strategic priorities regarding build-versus-buy for tooling, and the operational capacity to manage increasingly sophisticated open-source automation infrastructure. Making an informed decision is key to long-term success. I trust this overview provides a solid foundation for your strategic planning.