Ultimate Guide to Custom Atlantis Workflows
Extend Atlantis with custom workflows to automate complex Terraform tasks, enforce guardrails, and deliver secure IaC faster.
Atlantis is a widely used open-source solution for Terraform automation via pull requests, enabling GitOps for infrastructure provisioning. Its standard capabilities provide a foundational layer. However, unlocking true operational leverage, and navigating the complexities that come with it, hinges on mastering `atlantis.yaml`. This document examines advanced configurations: multi-environment deployments, integration of security and compliance tooling, and strategies for mitigating the operational overhead inherent in scaling such YAML-defined automation. As infrastructure footprints expand, proficiency with these configurations becomes critical to maintaining velocity and control. It's a journey many engineering teams undertake, and one that often leads to evaluating more comprehensive platforms.
1. Beyond Baseline Automation: The Atlantis Proposition
Terraform has become a cornerstone of modern Infrastructure as Code strategy. Consequently, solutions that streamline its operationalization are highly valued. Atlantis offers a compelling entry point, integrating Terraform execution directly within Git-centric workflows. This facilitates collaborative PR-driven `plan` and `apply` cycles, aligning with GitOps principles and enhancing team productivity and auditability. Standard operational benefits.
Yet, infrastructure rarely remains static in complexity or scale. As organizations mature, the demand for more sophisticated automation escalates. This is the inflection point where `atlantis.yaml` becomes central, offering the means to construct tailored automation sequences. We're talking sophisticated deployment pipelines, integration with third-party validation tools, and enforcement of organizational policies. The potential is significant. So is the potential for increased management overhead. For organizations confronting the limitations of YAML-centric automation at scale, or those seeking enterprise-grade governance from day one, it becomes prudent to assess platforms like Scalr. These are engineered to provide robust CI/CD for Terraform, often with less direct YAML wrangling and more built-in control mechanisms. A strategic consideration for growing teams.
2. Dissecting `atlantis.yaml`: Configuration Architecture
The `atlantis.yaml` file, residing at the repository root, is the control plane for Atlantis customization. It dictates project discovery, command execution, and conditional logic. Understanding its architecture is key.
Core Elements: Structure & Key Parameters
All `atlantis.yaml` configurations adhere to `version: 3`. Key top-level parameters include:
- `projects`: An array defining specific Terraform projects. Once specified, Atlantis relies exclusively on these definitions, ceasing autodiscovery. This offers precision.
- `workflows`: A map defining custom sequences for plan and apply operations.
- `automerge`: Boolean. Enables automatic PR merging post-successful apply. Use with appropriate risk assessment.
- `parallel_plan` / `parallel_apply`: Booleans. Facilitate concurrent execution for multiple projects, potentially accelerating throughput.
Project Definitions: Achieving Granular Control
Each entry within the `projects` array delineates a specific Terraform module or configuration set that Atlantis will manage.
- `dir`: The directory path to the project's Terraform files.
- `workspace`: Specifies the Terraform workspace. Defaults to `default`.
- `name`: A unique identifier, essential when multiple project definitions target the same `dir` with different workspaces. Critical for targeted operations.
- `workflow`: Assigns a defined custom workflow to the project.
- `autoplan`: Configures automatic planning, notably leveraging `when_modified` criteria for targeted execution.
- `apply_requirements`: Stipulates preconditions for `atlantis apply` execution, such as `approved` or `mergeable`.
version: 3
projects:
  - name: myapp-production-deploy
    dir: terraform/application-stack
    workspace: production
    workflow: secure-prod-workflow
    autoplan:
      when_modified:
        - "**/*.tf"
        - "production.tfvars"
      enabled: true
    apply_requirements: [approved, mergeable, undiverged] # Standard for production gates
The `undiverged` requirement ensures the PR branch is synchronized with the target branch, a common best practice for production changes.
Custom Workflows: Orchestrating Plan, Apply, and Governance Steps
Custom workflows enable organizations to supersede default `plan` and `apply` behaviors with bespoke, multi-step processes.
- Stages: Typically `plan` and `apply`, forming the core of the deployment lifecycle.
- Steps: Can be built-in Atlantis commands (`init`, `plan`, `apply`, `show`) or, critically, custom `run` commands for arbitrary script execution.
The `run` step is the gateway to extensibility. It executes shell scripts, providing access to contextual environment variables like `$PLANFILE`, `$WORKSPACE`, `$PROJECT_NAME`, and `$PULL_NUM`. This is powerful. It also means the organization assumes responsibility for script maintenance, security, and the underlying execution environment.
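As a minimal sketch of how a script called from a `run` step might consume these variables, the helper below logs the context it was given and fails fast when a variable it depends on is missing. The function names are hypothetical, and the `unset` fallbacks exist only so the script can be exercised outside Atlantis.

```shell
# Hypothetical helpers for a script invoked from a `run` step.
# PLANFILE, WORKSPACE, PROJECT_NAME and PULL_NUM are provided by Atlantis;
# the "unset" fallbacks only matter when testing outside Atlantis.
print_run_context() {
    echo "project=${PROJECT_NAME:-unset}"
    echo "workspace=${WORKSPACE:-unset}"
    echo "pull=${PULL_NUM:-unset}"
    echo "planfile=${PLANFILE:-unset}"
}

# Fail fast when a variable the script depends on is missing or empty.
# Arguments are variable names, e.g. require_env PULL_NUM PROJECT_NAME
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            return 1
        fi
    done
}
```

A script could start with `require_env PULL_NUM PROJECT_NAME || exit 1` so that a misconfigured environment stops the workflow early instead of producing a confusing failure later.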
workflows:
  secure-prod-workflow:
    plan:
      steps:
        - init
        - run: echo "Executing pre-production validation scripts..."
        - plan:
            extra_args: ["-var-file=production.tfvars"] # The built-in plan step already writes its output to $PLANFILE
    apply:
      steps:
        - run: echo "Initiating final approval sequence for $PROJECT_NAME..."
        # Script to integrate with external change management or approval systems
        - run: ./scripts/gate_production_release.sh $PULL_NUM $PROJECT_NAME
        - apply
        - run: echo "Production deployment for $PROJECT_NAME completed. Post-deployment verifications initiated."
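The release gate referenced in the workflow above is left to the reader. One possible shape, offered purely as a sketch, validates its arguments and then delegates the decision to an external check. `APPROVAL_CHECK_CMD` is an assumed hook invented for this example (any command that exits 0 when the change is approved); it is not an Atlantis feature.

```shell
# Hypothetical sketch of scripts/gate_production_release.sh.
# Usage: gate_release <pull_num> <project_name>
# APPROVAL_CHECK_CMD is an assumed hook, not part of Atlantis: a command
# that receives the same two arguments and exits 0 only when approved.
gate_release() {
    pull_num="$1"
    project="$2"
    if [ -z "$pull_num" ] || [ -z "$project" ]; then
        echo "usage: gate_release <pull_num> <project_name>" >&2
        return 2
    fi
    case "$pull_num" in
        *[!0-9]*)
            echo "pull number must be numeric: $pull_num" >&2
            return 2
            ;;
    esac
    # Fail closed: with no check configured, the release is refused.
    if [ -z "${APPROVAL_CHECK_CMD:-}" ]; then
        echo "APPROVAL_CHECK_CMD is not set; refusing to release $project" >&2
        return 1
    fi
    if "$APPROVAL_CHECK_CMD" "$pull_num" "$project"; then
        echo "Release of $project (PR #$pull_num) approved."
        return 0
    fi
    echo "Release of $project (PR #$pull_num) not approved." >&2
    return 1
}
```

Saved as a script, the file would end with `gate_release "$@"`. The design choice worth noting is that every unexpected condition returns non-zero, so Atlantis halts before the `apply` step rather than proceeding on a misconfiguration.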
3. Strategic Implementation: Advanced Use Case Scenarios
How do these configurations translate into tangible operational improvements for complex scenarios?
Multi-Environment Deployment Orchestration (Dev, Staging, Production)
A prevalent requirement: managing a single Terraform codebase across multiple deployment environments, each with distinct configurations and governance. Atlantis addresses this through discrete `project` definitions.
`atlantis.yaml` for Multi-Environment Pipelines:
version: 3
projects:
  - name: myapp-dev-deploy
    dir: terraform/application-stack
    workspace: development
    workflow: dev-ci-workflow # A more streamlined workflow for dev
    autoplan:
      when_modified: ["**/*.tf", "development.tfvars", "../../modules/core-infra/**/*.tf"] # Include shared module paths
      enabled: true
  - name: myapp-staging-deploy
    dir: terraform/application-stack
    workspace: staging
    workflow: staging-qa-workflow
    apply_requirements: [approved] # Introduce approval gates for staging
    autoplan:
      when_modified: ["**/*.tf", "staging.tfvars", "../../modules/core-infra/**/*.tf"]
      enabled: true
  - name: myapp-production-deploy # Same project name as in the earlier example
    dir: terraform/application-stack
    workspace: production
    workflow: secure-prod-workflow # Enforce rigorous production controls
    apply_requirements: [approved, mergeable, undiverged]
    autoplan:
      when_modified: ["**/*.tf", "production.tfvars", "../../modules/core-infra/**/*.tf"]
      enabled: true
workflows:
  dev-ci-workflow:
    plan:
      steps:
        - init
        - plan: {extra_args: ["-var-file=development.tfvars"]}
    apply:
      steps: [apply] # Potentially faster apply cycle for dev
  staging-qa-workflow: # Minimal placeholder; would likely include integration tests or more comprehensive validation
    plan:
      steps:
        - init
        - plan: {extra_args: ["-var-file=staging.tfvars"]}
    apply:
      steps: [apply]
  secure-prod-workflow: # Defined earlier, includes security scans and approval gates
    plan:
      steps:
        - init
        - run: ./scripts/security_compliance_scan.sh . # Mandate security scanning
        - plan: {extra_args: ["-var-file=production.tfvars"]}
    apply:
      steps:
        - run: ./scripts/gate_production_release.sh $PULL_NUM $PROJECT_NAME # Formal release gate
        - apply
        # Quoted as a whole so the ": " inside the message does not break YAML parsing
        - run: './scripts/audit_notification_service.sh "PROD apply: $PROJECT_NAME by $USER_NAME. PR: $PULL_NUM."'
This configuration offers considerable adaptability. However, observe the potential for workflow repetition. If workflows for different environments share substantial commonality, managing them as distinct entities can increase maintenance. This is a typical scaling challenge where platforms like Scalr, with features for hierarchical environment configuration and policy inheritance, can offer a more streamlined approach, reducing YAML duplication and improving manageability of environment-specific overrides.
Integrating Custom Tooling: Linters, Security Scanners, Compliance Checks
`run` steps are instrumental for embedding quality, security, and compliance gates within the automation pipeline. A non-zero exit code from any script halts the workflow.
Example: Pre-`plan` Security Scan with `tfsec`
workflows:
  security-focused-workflow:
    plan:
      steps:
        - init
        # tfsec Security Vulnerability Scan
        - run:
            command: |
              echo "Initiating tfsec security vulnerability scan..."
              # tfsec's non-zero exit on findings will halt the workflow.
              tfsec . --config-file ./tfsec-config.yml
        - plan
    apply:
      steps: [apply] # Proceeds only if plan and preceding scans are successful.
Effective integration requires that tools like `tfsec` or `Checkov` are available within the Atlantis execution environment (e.g., the specified Docker image). Managing these dependencies and ensuring script robustness are key operational responsibilities.
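A small preflight check can make a missing dependency obvious instead of surfacing as a cryptic "command not found" halfway through a scan. The sketch below is one way to do that; the function name is hypothetical and the list of tools is whatever the calling script needs.

```shell
# Preflight check for a `run` step: verify that each named tool is on PATH
# before the real work starts. Reports every missing tool, then fails.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "required tool not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, a scan script could begin with `require_tools tfsec || exit 1`, which turns an incomplete Docker image into a clear, early failure in the PR comment.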
Conditional Execution: `when_modified` and PR Label-Driven Strategies
PR Label-Driven Logic (via Custom Scripting): Atlantis lacks native support for PR label-based conditions. This functionality can be engineered using a `run` step that executes a script to query the VCS API (e.g., GitHub/GitLab) for PR labels and then dictates workflow progression via its exit code.
Conceptual Script (`validate_pr_label.sh`):
#!/bin/bash
# PULL_NUM, BASE_REPO_OWNER and BASE_REPO_NAME are set by Atlantis for run steps.
# GITHUB_TOKEN is not; it must be supplied through the server's environment.
REQUIRED_DEPLOY_LABEL="ready-for-production"
# ... (curl to GitHub API to fetch labels for $PULL_NUM) ...
if [[ $API_RESPONSE_LABELS == *"$REQUIRED_DEPLOY_LABEL"* ]]; then
  echo "Required label '$REQUIRED_DEPLOY_LABEL' present. Workflow proceeds."
  exit 0 # Success
else
  echo "Label '$REQUIRED_DEPLOY_LABEL' absent. Halting workflow."
  exit 1 # Failure
fi
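The conceptual script leaves the API lookup open. As a hedged sketch of one way to fill it in for GitHub, the two functions below fetch the labels through the REST API's issue-labels endpoint and test for an exact label name. The function names are hypothetical; `curl` is assumed to be installed, and `GITHUB_TOKEN` is assumed to be provided by the operator. The match is deliberately a plain-text check to avoid a `jq` dependency, so it assumes label names without regex metacharacters.

```shell
# Print the labels JSON for the current PR. Pull requests are issues in the
# GitHub REST API, so their labels are served by the issues endpoint.
# Assumes GITHUB_TOKEN (operator-supplied) plus the Atlantis-provided
# BASE_REPO_OWNER, BASE_REPO_NAME and PULL_NUM.
fetch_pr_labels() {
    curl -fsS \
        -H "Authorization: Bearer ${GITHUB_TOKEN}" \
        -H "Accept: application/vnd.github+json" \
        "https://api.github.com/repos/${BASE_REPO_OWNER}/${BASE_REPO_NAME}/issues/${PULL_NUM}/labels"
}

# Read labels JSON on stdin; succeed only if a label with exactly this name
# is present. Matches the "name": "<label>" pair textually.
has_label() {
    grep -Eq "\"name\":[[:space:]]*\"$1\""
}
```

Combined, the gate becomes `fetch_pr_labels | has_label "ready-for-production"`. Unlike a substring test on the raw response, this exact-name match does not accept a label such as `not-ready-for-production`.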
Integration in `atlantis.yaml`:
workflows:
  production-release-workflow:
    plan:
      steps:
        - run: ./scripts/validate_pr_label.sh # Label validation gate
        - init
        - plan
This approach provides flexibility but introduces dependencies on custom scripts, API interactions, and secure token management. For organizations requiring sophisticated policy enforcement based on PR metadata, platforms like Scalr with integrated OPA (Open Policy Agent) capabilities can offer a more robust and maintainable solution, abstracting away the need for custom API scripting for such conditional logic. This is particularly relevant when governance policies become complex.
`when_modified`: Atlantis's native mechanism for triggering plans based on file changes using glob patterns. Essential for optimizing monorepo performance by avoiding unnecessary plans.
autoplan:
  when_modified:
    - "**/*.tf" # Project-specific TF files
    - "../../shared-modules/networking/**/*.tf" # Changes in a shared networking module
    - ".terraform.lock.hcl" # Lock file changes
  enabled: true
4. Troubleshooting Complexities: Debugging Custom Workflow Implementations
Effective debugging is vital when custom Atlantis workflows exhibit unexpected behavior.
- Atlantis Server Logs: Indispensable. Enable debug logging (`--log-level debug`) for maximum insight into parsing, project discovery, and step-by-step execution.
- YAML Linting: Proactively validate `atlantis.yaml` syntax. Common issues include incorrect indentation (use spaces, not tabs) and misplaced colons. Many hours of engineering time can be saved with a linter.
- Custom Script Failures: Examine script exit codes and output. Ensure scripts have execute permissions and all dependencies (linters, CLIs, SDKs) are present in the Atlantis execution environment. Test scripts in an environment that closely mirrors the Atlantis container.
- `when_modified` Issues: Verify glob pattern accuracy and path relativity, especially for shared modules.
- Server-Side Configuration Conflicts: Settings in `atlantis.yaml` can be overridden or disallowed by the server-side `repos.yaml`. If a configuration doesn't seem to apply, check `allowed_overrides` in `repos.yaml`.
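Two of the checks above, tabs in YAML and missing execute permissions, are cheap to automate before pushing. The following is a minimal sketch with hypothetical function names; it is not a substitute for a full YAML linter, and the tab check flags any tab character in the file, not only indentation.

```shell
# Fail if the given YAML file is unreadable or contains tab characters,
# printing the offending line numbers to stderr.
check_no_tabs() {
    if [ ! -r "$1" ]; then
        echo "cannot read file: $1" >&2
        return 1
    fi
    tab="$(printf '\t')"
    if grep -n "$tab" "$1" >&2; then
        echo "tab characters found in $1" >&2
        return 1
    fi
}

# Fail if any of the given scripts lacks execute permission.
check_executable() {
    status=0
    for script in "$@"; do
        if [ ! -x "$script" ]; then
            echo "not executable: $script" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from the repository root, something like `check_no_tabs atlantis.yaml && check_executable scripts/*.sh` could serve as a pre-commit hook, catching two common failure modes without a round trip through a pull request.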
The iterative cycle of modifying scripts, committing, pushing, and observing Atlantis behavior in a PR can be time-consuming. An environment that offers enhanced visibility or streamlined testing for automation scripts can significantly improve developer productivity.
5. Achieving Scalability & Maintainability with `atlantis.yaml`
As `atlantis.yaml` configurations grow, particularly in monorepos, proactive strategies for maintainability are essential.
Monorepo Configuration Strategies
- Granular Project Definitions: Define each independently deployable service or component as a distinct project.
- Optimized `when_modified` Paths: Critical for preventing "plan storms" where numerous unrelated projects are planned. Consider server-side options like `--autoplan-modules` for automated dependency tracking, but thoroughly vet its behavior in your specific context.
- `execution_order_group`: Manage inter-project dependencies by defining the plan/apply sequence for global commands.
- Parallel Execution: Utilize `parallel_plan: true` and `parallel_apply: true` to enhance performance for repositories with many independent projects.
The sheer volume of project definitions and `when_modified` paths in large-scale monorepos can render `atlantis.yaml` unwieldy. Some organizations resort to programmatic generation of this file, introducing an additional layer of tooling and management. This is a common indicator that the limits of direct YAML management are being approached.
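To make the idea of programmatic generation concrete, here is a deliberately small sketch that emits one project entry per environment. The naming scheme (`<app>-<env>-deploy`, `<env>.tfvars`) mirrors the examples in this guide; the `<env>-workflow` naming and the function itself are assumptions of the sketch, and the referenced workflows would still need to be defined in the repository or server-side.

```shell
# Emit an atlantis.yaml projects section with one entry per environment.
# Usage: generate_projects <app> <dir> <env>...
generate_projects() {
    app="$1"; dir="$2"; shift 2
    echo "version: 3"
    echo "projects:"
    for env in "$@"; do
        cat <<EOF
  - name: ${app}-${env}-deploy
    dir: ${dir}
    workspace: ${env}
    workflow: ${env}-workflow
    autoplan:
      when_modified: ["**/*.tf", "${env}.tfvars"]
      enabled: true
EOF
    done
}
```

Calling `generate_projects myapp terraform/application-stack development staging production > atlantis.yaml` would produce three project entries. Even at this size the trade-off is visible: the YAML gets shorter to maintain, but the generator becomes one more piece of tooling to own and review.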
YAML Anchors for Configuration Efficiency
YAML anchors (`&`) and aliases (`*`) can mitigate duplication for common configuration blocks, such as step sequences or `apply_requirements`.
workflows:
  application-workflow:
    plan:
      steps: &standard_plan_sequence # Anchor the shared sequence where it is first used
        - init
        - run: ./scripts/code_linting.sh
        - plan
  secondary-workflow: # Hypothetical second workflow, shown only to illustrate reuse
    plan:
      steps: *standard_plan_sequence # Reuse the anchored sequence as the whole steps list
While beneficial for DRY (Don't Repeat Yourself) principles, excessive use of anchors can sometimes obscure the effective configuration, making it harder to trace a project's exact operational parameters.
Centralized Governance via Server-Side repos.yaml
For organizations standardizing Atlantis usage, the server-side `repos.yaml` (specified via `--repo-config`) is fundamental for centralized policy enforcement and governance.
- `allowed_overrides`: Defines which `atlantis.yaml` keys can be customized at the repository level.
- `allowed_workflows`: Restricts repositories to a pre-approved list of server-defined workflows.
- `allow_custom_workflows: true/false`: A critical security control. Setting it to `false` (the default) prevents repositories from defining their own workflows, and with them arbitrary `run` steps that could execute malicious code.
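Put together, a restrictive server-side configuration might look like the file written by the sketch below. The repository pattern, the `standard` workflow name, and the chosen requirements are illustrative assumptions; only the keys themselves come from the settings described above.

```shell
# Write an example restrictive server-side repo config to the given path.
# All values (repo pattern, workflow name, requirements) are illustrative.
write_repos_yaml() {
    cat > "$1" <<'EOF'
repos:
  - id: /.*/                           # regex matching every repository
    apply_requirements: [approved, mergeable]
    allowed_overrides: [workflow]      # repos may only choose a workflow...
    allowed_workflows: [standard]      # ...from this pre-approved list
    allow_custom_workflows: false      # no repo-defined workflows or run steps
workflows:
  standard:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
EOF
}
```

After `write_repos_yaml /path/to/repos.yaml` (path hypothetical), the server would be started with `--repo-config=/path/to/repos.yaml`. Because `apply_requirements` is absent from `allowed_overrides`, individual repositories cannot relax the approval gate.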
Effectively, `repos.yaml` enables a platform engineering team to establish a "paved road" for Terraform automation. However, managing this central configuration and its interaction with repository-level settings requires careful planning. This is another area where enterprise platforms like Scalr provide more integrated governance models, often incorporating RBAC and policy frameworks like OPA, without relying on multiple layers of YAML configuration for policy definition and enforcement.
6. Capability Summary: Key Atlantis Workflow Functionalities
Feature/Component | Business Value / Capability | Key Configuration Parameters | Operational Considerations / Potential Challenges |
---|---|---|---|
Projects | Granular control per environment or component | `name`, `dir`, `workspace`, `workflow`, `autoplan`, `apply_requirements` | Can lead to a verbose `atlantis.yaml`. |
Workflows | Customization of plan/apply lifecycles with bespoke stages/steps | `workflows`, `plan`, `apply`, `steps` | Management of script dependencies, permissions, and execution environment; script robustness. |
`run` steps | Integration of third-party tools (security, linting, notification) | `run`, `command` | Tooling must be available in Atlantis image; script error handling; exit code management. |
`apply_requirements` | Enforcement of PR approval gates and mergeability criteria | `approved`, `mergeable`, `undiverged` | Dependency on VCS integration and PR states; alignment with change management processes. |
`when_modified` | Conditional planning based on file modifications, optimizing CI | `autoplan`, `when_modified`, `enabled` | Crafting accurate glob patterns, especially for shared modules; potential for misconfiguration. |
PR Label Logic (Scripted) | Workflow gating based on PR metadata (e.g., labels) | `run` step calling a custom script | Requires custom scripting, VCS API interaction, secure token management; maintenance overhead. |
YAML Anchors | Reduction of configuration duplication in `atlantis.yaml` | `&` (anchor), `*` (alias) | Can reduce immediate readability if overused or deeply nested; careful application needed. |
`repos.yaml` | Centralized server-side governance and policy enforcement | `allowed_overrides`, `allowed_workflows`, `allow_custom_workflows` | Requires strategic policy planning; managing interplay between server and repository configs. |
7. Strategic Outlook: Scaling Your Terraform Automation Maturity
Atlantis provides a robust, open-source foundation for integrating Terraform automation within PR-centric workflows. Its capabilities for custom workflow definition allow organizations to tailor automation pipelines to sophisticated operational requirements, incorporating everything from automated security validation to multi-environment deployment strategies with complex approval gates.
However, leveraging these advanced functionalities introduces non-trivial responsibilities in configuration management, script development, and operational oversight. Maintaining intricate `atlantis.yaml` files, managing the execution environment and dependencies for custom scripts, debugging multifaceted workflows, and ensuring consistent governance across a growing number of repositories or within a large-scale monorepo can become significant operational burdens. This is no small feat for any team.
For organizations experiencing friction in scaling their Atlantis deployments, or those proactively seeking more comprehensive, enterprise-grade features for governance, security posture management, and operational scalability, an evaluation of managed Terraform automation platforms is a logical next step. Solutions such as Scalr are specifically designed to address these enterprise challenges, offering features like hierarchical environment management, integrated OPA policy enforcement, fine-grained RBAC, and a holistic view of the Terraform operational lifecycle. Such platforms can potentially abstract away much of the custom YAML and scripting complexity, allowing teams to focus on delivering infrastructure value rather than managing the automation tooling itself.
The optimal path forward depends on an organization's specific scale, in-house expertise, strategic priorities regarding build-versus-buy for tooling, and the operational capacity to manage increasingly sophisticated open-source automation infrastructure. Making an informed decision is key to long-term success. I trust this overview provides a solid foundation for your strategic planning.