Understanding Atlantis: A 101 Guide

Learn how Atlantis automates Terraform workflows—from pull-request plans to secure applies. Get setup steps, best practices, and tips in this 101 guide.

1. Introduction: The Drive for Terraform Automation

Terraform has become a cornerstone of modern infrastructure as code (IaC) practices, enabling teams to define and manage infrastructure consistently and under version control. But as infrastructure complexity grows and more people collaborate on the same code, running Terraform commands by hand becomes a bottleneck that is prone to errors and inconsistencies. This is where automation tools step in, promising to streamline workflows, enhance governance, and improve operational efficiency. One popular open-source option is Atlantis, which brings Terraform automation directly into your version control system's pull request (PR) process.

2. What is Atlantis? Core Functionality

Atlantis is an automation tool specifically built to integrate Terraform operations into pull request workflows. It acts as a server that listens for webhook events from your chosen Version Control System (VCS) – like GitHub, GitLab, or Bitbucket. When a developer opens or updates a PR with changes to Terraform code, Atlantis can automatically run terraform plan and post the output as a comment in the PR. This provides immediate visibility into the proposed changes. Once reviewed and approved, a simple PR comment like atlantis apply can trigger Atlantis to execute terraform apply, implementing the infrastructure modifications. Being self-hosted, Atlantis runs on infrastructure you provision and manage, offering a high degree of control over its environment.
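To make the event flow concrete, here is a minimal Python sketch of how a webhook receiver in the spirit of Atlantis might route GitHub events. The function name route_event and the action labels ("autoplan", "cleanup", "comment:<name>") are hypothetical. The event and action names (pull_request with opened/reopened/synchronize/closed, issue_comment with created) are GitHub's own. The real Atlantis server handles many more cases than this.

```python
from typing import Optional


def route_event(event_type: str, payload: dict) -> Optional[str]:
    """Return a hypothetical action label for a GitHub webhook, or None to ignore it.

    event_type is the value of GitHub's X-GitHub-Event header; payload is the
    parsed JSON body. Simplified illustration only, not Atlantis's own code.
    """
    action = payload.get("action")
    if event_type == "pull_request":
        if action in ("opened", "reopened", "synchronize"):
            return "autoplan"  # new or updated PR: plan the modified projects
        if action == "closed":
            return "cleanup"  # merged or closed PR: release locks, discard plans
        return None
    if event_type == "issue_comment" and action == "created":
        # GitHub delivers PR comments as issue comments; issues that are
        # really pull requests carry a "pull_request" key.
        if "pull_request" not in payload.get("issue", {}):
            return None
        words = payload.get("comment", {}).get("body", "").split()
        if len(words) >= 2 and words[0] == "atlantis":
            return f"comment:{words[1]}"
    return None


if __name__ == "__main__":
    print(route_event("pull_request", {"action": "synchronize"}))
    print(route_event("issue_comment", {
        "action": "created",
        "issue": {"pull_request": {}},
        "comment": {"body": "atlantis apply -p my-app-staging"},
    }))
```

Keeping routing separate from execution mirrors the split described above: webhooks only decide what should happen, and the plan or apply itself runs afterwards on the server.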

3. Problems Atlantis Aims to Solve

Without a dedicated automation layer, Terraform collaboration can encounter several common pain points:

  • Decentralized Execution: Running Terraform from individual machines means differing Terraform versions, provider versions, and credentials, which can produce inconsistent results and environment drift.
  • Manual Processes: Running plans, sharing outputs, and applying changes manually is time-consuming and error-prone.
  • Limited Visibility: Tracking who planned what, and what the expected outcome was, can be challenging.
  • State Management Conflicts: Concurrent operations without proper locking can jeopardize Terraform state integrity.
  • Onboarding Overhead: Requiring every contributor to have a fully configured local Terraform setup can be a barrier.

Atlantis seeks to address these by centralizing Terraform execution within a consistent, PR-driven workflow.

4. Key Benefits of an Atlantis-Driven Workflow

Integrating Atlantis can bring several advantages to a Terraform practice:

  • Enhanced Collaboration: Plan outputs directly in PRs facilitate focused discussions.
  • Increased Efficiency: Automation of plan and apply speeds up deployment cycles.
  • Improved Consistency: Centralized execution minimizes "works on my machine" issues.
  • Better Governance & Auditability: PRs provide a natural audit trail for all infrastructure changes.
  • Effective State Locking: Atlantis implements its own locking to prevent conflicting operations on the same project/workspace.
  • VCS Integration: Leverages familiar developer workflows without needing a separate UI.

5. The Self-Managed Journey: Setting Up Your Atlantis Server

Implementing Atlantis involves setting up and maintaining the server yourself. This provides flexibility but also means taking on the operational responsibility.

Prerequisites: Server, Git, Terraform, VCS

  • Server Infrastructure: You'll need a server (VM or container host) to run Atlantis. Plan for the OS (typically Linux), CPU and memory (e.g., 1-2 vCPUs and 2-8 GB RAM as a starting point), and disk space (e.g., 5-50 GB for Git clones and plan files). Network configuration is also key: the server must be reachable from your VCS to receive webhooks, and it must be able to call out to the VCS API.
  • Git and Terraform: Git must be installed on the Atlantis server, along with a default Terraform version. Atlantis can download other Terraform versions on demand when a project pins one, so you don't need to preinstall every version. The official Docker image already bundles Git and Terraform.
  • Version Control System (VCS): A configured Git repository (e.g., on GitHub) is essential. Atlantis needs credentials (a Personal Access Token or, preferably, a GitHub App) to interact with your repositories.
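  • Terraform State Backend: Atlantis requires a remote state backend (such as S3, Azure Blob Storage, or GCS). Local state is not supported. Atlantis runs Terraform in a per-PR clone of the repository and deletes that clone when the PR is closed, so a local state file would be lost.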

Deployment: Docker and Kubernetes Options

Atlantis is commonly deployed using Docker or Kubernetes.

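  • Kubernetes: A Helm chart is available for Kubernetes deployments, which are often preferred for resilience (automatic restarts, managed secrets, persistent volumes). Atlantis typically runs as a single replica, because by default it keeps its lock database as a local file in its data directory. This approach involves managing Kubernetes manifests, secrets, services, persistent storage, and potentially ingresses.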

  • Docker: The official ghcr.io/runatlantis/atlantis image simplifies deployment. A typical docker run command sets several environment variables for configuration:

# PAT authentication shown. For a GitHub App, replace GH_USER/GH_TOKEN with
# ATLANTIS_GH_APP_ID and ATLANTIS_GH_APP_KEY_FILE (mounting the key into the container).
# To persist plans and locks across restarts, add -v atlantis-data:/atlantis-data
# and -e ATLANTIS_DATA_DIR=/atlantis-data to the command below.
docker run --name atlantis -d -p 4141:4141 \
  -e ATLANTIS_ATLANTIS_URL="<YOUR_ATLANTIS_PUBLIC_URL>" \
  -e ATLANTIS_GH_USER="<YOUR_GITHUB_USERNAME>" \
  -e ATLANTIS_GH_TOKEN="<YOUR_GITHUB_PAT>" \
  -e ATLANTIS_GH_WEBHOOK_SECRET="<YOUR_WEBHOOK_SECRET>" \
  -e ATLANTIS_REPO_ALLOWLIST="github.com/your-org/*" \
  ghcr.io/runatlantis/atlantis:latest server

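Managing this container, its updates, and its persistent data falls to the operations team. The data directory holds repository clones, plan files, and the lock database. It therefore needs a persistent volume if plans and locks must survive restarts.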

VCS Integration: GitHub Authentication and Webhooks

Connecting Atlantis to your VCS (e.g., GitHub) has two parts, and both need careful attention:
  • Credentials that Atlantis uses to call the VCS API.
  • Webhooks that the VCS uses to notify Atlantis.

  • Authentication:
    • GitHub Personal Access Token (PAT): Simpler to generate but typically grants broad permissions (e.g., repo scope).
    • GitHub App (Recommended): Offers more granular permissions and enhanced security. Atlantis can even guide you through creating one via its /github-app/setup endpoint. This process involves defining specific repository permissions (Contents, Pull Requests, Commit Statuses, etc.) and handling an App ID and private key. Ensuring these permissions are correct and the private key is securely managed is paramount.
  • Webhook Configuration: GitHub (or your VCS) notifies Atlantis of PR events via webhooks. Manual configuration involves:
    1. Payload URL: Must be the public URL of your Atlantis server, crucially ending in /events (e.g., https://atlantis.yourdomain.com/events). A missing /events is a common setup error.
    2. Content Type: application/json.
    3. Secret: A shared secret to verify webhook authenticity, configured in both GitHub and Atlantis.
    4. Events: Subscribe to "Pull requests," "Issue comments," "Pushes," and "Pull request reviews."

Correctly configuring and securing this channel is vital. If webhooks are not delivered or fail verification, Atlantis never sees PR events or comments.
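The shared secret is what lets Atlantis reject forged deliveries. GitHub computes an HMAC-SHA256 of the raw request body using the secret and sends it in the X-Hub-Signature-256 header as sha256=<hex digest>. The Python sketch below reproduces that check to illustrate the mechanism; it is not Atlantis's own code. The helper names sign_payload and is_valid_delivery are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the value GitHub sends in X-Hub-Signature-256 for this raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_delivery(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare the received header with the expected signature in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


if __name__ == "__main__":
    body = b'{"action": "opened"}'
    header = sign_payload("my-webhook-secret", body)
    print(is_valid_delivery("my-webhook-secret", body, header))  # True
    print(is_valid_delivery("other-secret", body, header))  # False
```

A secret that differs by even one character, including a trailing newline pasted into one side, produces a different digest. That is why mismatched secrets show up as rejected deliveries in GitHub's webhook log.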

Cloud Credentials and Server Configuration

Atlantis itself doesn't handle cloud provider credentials directly. It relies on the execution environment of the terraform binary (i.e., the Atlantis server/container) having the necessary credentials.

  • Methods: IAM Roles (for cloud VMs/pods), environment variables (e.g., AWS_ACCESS_KEY_ID), or shared credential files are common. The security and management of these credentials on the Atlantis host are your responsibility.
  • Core Server Settings: Atlantis is configured via CLI flags or environment variables (e.g., ATLANTIS_ATLANTIS_URL, ATLANTIS_REPO_ALLOWLIST, ATLANTIS_DATA_DIR, ATLANTIS_DEFAULT_TF_VERSION).
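Atlantis documents a mechanical rule for turning any server flag into an environment variable:
  1. Drop the leading dashes.
  2. Replace - with _.
  3. Upper-case the name.
  4. Prefix ATLANTIS_.
That is why the URL setting ends up as the doubled-looking ATLANTIS_ATLANTIS_URL. The Python sketch below applies the rule; the function names are illustrative.

```python
def flag_to_env(flag: str) -> str:
    """Convert an Atlantis server flag such as --repo-allowlist to its env var name."""
    name = flag.lstrip("-")
    return "ATLANTIS_" + name.upper().replace("-", "_")


def flags_to_env(flags: dict[str, str]) -> dict[str, str]:
    """Map {flag: value} to {ENV_VAR: value}, e.g. when writing a container spec."""
    return {flag_to_env(flag): value for flag, value in flags.items()}


if __name__ == "__main__":
    for flag in ("--atlantis-url", "--repo-allowlist", "--data-dir", "--default-tf-version"):
        print(f"{flag:22} -> {flag_to_env(flag)}")
```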

The setup process, while well-documented, involves multiple components that need to be correctly configured and maintained by the user.

6. Defining Your Infrastructure: The atlantis.yaml File

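To tell Atlantis how to handle Terraform projects within a repository, you add an atlantis.yaml file at the repo's root. The file is optional. Without it, Atlantis detects projects from the directories containing modified Terraform files and plans them in the default workspace. Its key functions are: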

  • Project Definition: Specifying directories (dir) and Terraform workspaces (workspace) for distinct projects.
  • Autoplan Control: Defining when terraform plan runs automatically (when_modified file patterns).
  • Terraform Versioning: Pinning projects to specific Terraform versions.
  • Workflow Customization: Optionally defining custom plan/apply steps (often restricted by server-side config for security).
  • Apply Requirements: Enforcing conditions like PR approval (apply_requirements: [approved]).

A simple atlantis.yaml might look like this:

version: 3
projects:
- name: my-app-staging
  dir: infra/staging
  workspace: staging
  autoplan:
    when_modified: ["**/*.tf", "**/*.tfvars", ".terraform.lock.hcl"]
    enabled: true
  terraform_version: v1.5.0
  # apply_requirements: [approved] # only takes effect if the server-side repo config lists apply_requirements in allowed_overrides

While atlantis.yaml offers project-level flexibility, managing these files across many repositories and coordinating them with server-side configurations (if used for central governance) adds another layer of configuration management.
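when_modified patterns are evaluated relative to the project's dir. A pattern like .terraform.lock.hcl therefore matches only the lock file directly in that directory, while **/*.tf also reaches subdirectories. The Python sketch below approximates that decision with a small glob translator, in which ** spans directories while * and ? stay within one path segment. It rests on stated assumptions and is meant for reasoning about patterns; it is not Atlantis's actual matcher, whose edge cases may differ. In particular, the rule that files outside the project dir match only patterns starting with ../ is an assumption of this sketch.

```python
import posixpath
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob into a regex (an approximation, not Atlantis's matcher)."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def should_autoplan(project_dir: str, patterns: list[str], changed_files: list[str]) -> bool:
    """True if any changed file (repo-relative path) matches a pattern relative to project_dir."""
    compiled = [(p, glob_to_regex(p)) for p in patterns]
    for path in changed_files:
        rel = posixpath.relpath(path, project_dir or ".")
        # Assumption: files outside the project dir only match patterns
        # that explicitly start with "../" (e.g. "../modules/**/*.tf").
        outside = rel == ".." or rel.startswith("../")
        for pattern, rx in compiled:
            if outside and not pattern.startswith("../"):
                continue
            if rx.fullmatch(rel):
                return True
    return False


if __name__ == "__main__":
    patterns = ["**/*.tf", "**/*.tfvars", ".terraform.lock.hcl"]
    print(should_autoplan("infra/staging", patterns, ["infra/staging/main.tf"]))
    print(should_autoplan("infra/staging", patterns, ["infra/prod/main.tf"]))
```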

7. The Pull Request Lifecycle with Atlantis

Once set up, the Atlantis workflow is quite intuitive:

  1. PR Creation/Update: A developer pushes Terraform changes and opens a PR.
  2. Automated terraform plan: Atlantis detects changes (based on atlantis.yaml) and runs terraform plan, posting the output as a PR comment.
  3. Review and Collaboration: The team reviews the plan directly in the PR. If changes are needed, new commits trigger a new plan.
  4. Executing terraform apply via Comments: An authorized user comments atlantis apply (optionally with flags like -p project-name or -d dir -w workspace) to apply the approved plan. Atlantis posts the apply output.
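  5. State Locking: When a project/workspace is planned, Atlantis locks it to that PR. The lock is held until the PR is merged or closed, or until someone runs atlantis unlock. Other PRs cannot plan or apply that project in the meantime. This complements Terraform's own backend state locking.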
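The locking behavior can be pictured as a table keyed by repository, project directory, and workspace, where each entry belongs to exactly one PR. The Python sketch below is a hypothetical in-memory simplification; LockTable, try_lock, and release_pr are illustrative names. The real server persists its locks (by default in a BoltDB file in its data directory) and ties them to stored plans.

```python
from dataclasses import dataclass, field

LockKey = tuple[str, str, str]  # (repo full name, project dir, workspace)


@dataclass
class LockTable:
    """In-memory model of Atlantis-style project locks (hypothetical simplification)."""

    owners: dict[LockKey, int] = field(default_factory=dict)  # key -> PR number

    def try_lock(self, repo: str, path: str, workspace: str, pr: int) -> bool:
        """Acquire (or re-acquire) the lock for pr; fail if another PR holds it."""
        key = (repo, path, workspace)
        holder = self.owners.get(key)
        if holder is not None and holder != pr:
            return False
        self.owners[key] = pr
        return True

    def release_pr(self, pr: int) -> int:
        """Drop every lock held by pr (PR merged/closed, or `atlantis unlock`)."""
        keys = [key for key, holder in self.owners.items() if holder == pr]
        for key in keys:
            del self.owners[key]
        return len(keys)


if __name__ == "__main__":
    locks = LockTable()
    print(locks.try_lock("your-org/infra", "infra/staging", "staging", pr=12))  # True
    print(locks.try_lock("your-org/infra", "infra/staging", "staging", pr=34))  # False
```

The same directory in a different workspace gets its own key. Two PRs can therefore work on staging and production workspaces of one directory without blocking each other.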

Essential PR Commands:

  • atlantis plan [-d dir] [-w workspace] [-p project_name] [-- <tf_flags>]: Manually trigger a plan.
  • atlantis apply [-d dir] [-w workspace] [-p project_name]: Apply a plan.
  • atlantis unlock: Release all Atlantis locks held by this PR and discard its plans (useful when a lock is stuck).
  • atlantis help: Show available commands.
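The comment syntax above is regular enough to sketch as a parser. It consists of a command name, the -d/-w/-p options, and an optional -- separator whose remainder is passed to Terraform. The Python below is a hypothetical illustration of how such comments break down. It is not Atlantis's parser, which supports more options and validates more strictly.

```python
import shlex
from dataclasses import dataclass, field
from typing import Optional

COMMANDS = {"plan", "apply", "unlock", "help"}
FLAGS = {"-d": "directory", "-w": "workspace", "-p": "project"}


@dataclass
class Command:
    name: str
    directory: str = ""
    workspace: str = ""
    project: str = ""
    extra_args: list[str] = field(default_factory=list)  # passed through after "--"


def parse_comment(body: str) -> Optional[Command]:
    """Parse 'atlantis plan -d dir -- -var x=1'; return None if not an Atlantis command."""
    tokens = shlex.split(body.strip())
    if len(tokens) < 2 or tokens[0] != "atlantis" or tokens[1] not in COMMANDS:
        return None
    cmd = Command(name=tokens[1])
    args = tokens[2:]
    if "--" in args:  # everything after -- is handed to terraform unchanged
        split = args.index("--")
        args, cmd.extra_args = args[:split], args[split + 1:]
    i = 0
    while i < len(args):
        attr = FLAGS.get(args[i])
        if attr is None or i + 1 >= len(args):
            raise ValueError(f"unsupported or incomplete flag: {args[i]}")
        setattr(cmd, attr, args[i + 1])
        i += 2
    return cmd


if __name__ == "__main__":
    print(parse_comment("atlantis plan -d infra/staging -w staging -- -var 'env=stg'"))
```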

This PR-centric flow is a significant strength, keeping infrastructure operations tied to version control and review processes.

8. Navigating Initial Hurdles: Common Atlantis Troubleshooting

As with any self-hosted system, initial setup can present challenges:

  • Webhook Issues: Incorrect Payload URL (especially the /events suffix), mismatched webhook secrets, or network connectivity blocking VCS calls to Atlantis. VCS webhook delivery logs are the first place to check.
  • Authentication/Permission Errors: Invalid or expired VCS tokens/GitHub App credentials, insufficient scopes/permissions for the token/App, or the target repository not being in Atlantis's --repo-allowlist.
  • Plan/Apply Failures: These are often Terraform-related (code errors, incorrect provider credentials on the Atlantis server, state lock issues) rather than Atlantis issues per se. Atlantis server logs (ideally at debug level) become crucial here.
  • atlantis.yaml Misconfigurations: YAML syntax errors, incorrect dir paths (they are relative to repo root), or when_modified patterns not matching files (patterns are relative to the project dir). Restricted features like custom workflows or apply_requirements might be disabled by server-side policy.
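Two of the most common setup errors above can be checked mechanically before digging through logs. The Python sketch below does two things:
  • It tests a repository identifier against a --repo-allowlist value. It follows Atlantis's documented convention that entries are comma-separated hostname/owner/repo strings in which * matches any characters.
  • It flags a webhook Payload URL that does not end in /events.
The helper names repo_allowed and webhook_url_problems are hypothetical, and the sketch does not cover every allowlist feature.

```python
import re
from urllib.parse import urlparse


def repo_allowed(allowlist: str, repo_id: str) -> bool:
    """Check repo_id (e.g. github.com/your-org/infra) against a comma-separated allowlist."""
    for entry in allowlist.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Treat * as "any characters"; escape everything else literally.
        regex = ".*".join(re.escape(part) for part in entry.split("*"))
        if re.fullmatch(regex, repo_id):
            return True
    return False


def webhook_url_problems(url: str) -> list[str]:
    """Return human-readable problems with a webhook Payload URL (empty if none found)."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme not in ("http", "https"):
        problems.append("scheme should be http or https")
    if not parsed.netloc:
        problems.append("missing host")
    if not parsed.path.endswith("/events"):
        problems.append("path should end in /events")
    return problems


if __name__ == "__main__":
    print(repo_allowed("github.com/your-org/*", "github.com/your-org/infra"))
    print(webhook_url_problems("https://atlantis.yourdomain.com"))
```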

Troubleshooting often involves checking configurations across multiple systems: the VCS, the Atlantis server, network firewalls, and the Terraform code itself.

9. Atlantis at a Glance: Summary

| Feature | Description | Key Consideration / Management Aspect |
|---|---|---|
| Core Function | Terraform PR automation | Open-source, self-hosted |
| Workflow Trigger | VCS webhooks (PR events, comments) | Requires careful webhook setup and public accessibility |
| terraform plan | Automated on PR, output posted as a comment | atlantis.yaml controls behavior per project |
| terraform apply | Triggered by PR comment (atlantis apply) | Ensures intentional application after review |
| State Locking | PR-level project locking | Complements Terraform backend locking |
| Configuration | Server flags/env vars; repo-level atlantis.yaml | User manages all configuration layers |
| Deployment | Docker, Kubernetes, binary | User responsible for provisioning and server maintenance |
| VCS Authentication | PAT or GitHub App | Secure credential management is crucial |
| Cloud Credentials | Relies on server's environment (IAM roles, env vars) | User responsible for securely provisioning credentials to the host |
| Customization | Custom workflows, apply requirements (may be server-restricted) | Balances flexibility with potential complexity |
| Scalability | Depends on server resources and deployment strategy (e.g., Kubernetes) | User manages scaling |
| Support | Community-driven (GitHub issues, Slack) | No dedicated enterprise support |

10. Conclusion: Weighing Control vs. Operational Overhead

Atlantis offers a powerful, community-supported solution for teams looking to automate their Terraform workflows within the familiar confines of their version control system. Its PR-centric approach enhances collaboration, consistency, and auditability for infrastructure changes. The level of control afforded by its self-hosted nature and extensive configuration options is a significant draw for many.

However, this control comes with the inherent operational responsibilities of deploying, managing, securing, and troubleshooting a critical piece of infrastructure automation tooling. Teams adopting Atlantis should be prepared for the initial setup effort and ongoing maintenance. For organizations that value deep control and have the resources to manage such a system, Atlantis is a very capable choice. For those who might prioritize a more managed experience, reduced setup and operational burden, or integrated enterprise features like advanced policy enforcement, role-based access control, and dedicated support, exploring specialized commercial IaC platforms could present a compelling alternative to the self-managed path. The decision often hinges on balancing the desire for granular control with the total cost of ownership and operational capacity.