Atlantis and OpenTofu: Building the Future of Open-Source Infrastructure Automation
This blog post is based on the talk 'Atlantis and OpenTofu: The Future of Open-Source IaC' by PePe Amengual (Slalom) and Dylan-Daniel Page (Lambda Labs) at OpenTofu Day Europe 2024 on March 19, 2024. In this presentation, the Atlantis project maintainers discuss the evolution of one of the most popular open-source infrastructure automation tools, its journey toward vendor neutrality, and how it's embracing the OpenTofu ecosystem.
Understanding Atlantis: The IaC Orchestration Pioneer
Atlantis represents a foundational shift in how engineering teams approach Infrastructure as Code (IaC) workflows. Created in 2017 by Luke Kysow and Anubhav Mishra at Hootsuite, Atlantis pioneered the concept of GitOps-style interactions with Terraform through pull request automation. The tool has since evolved into a critical piece of infrastructure for organizations ranging from small startups to enterprises managing hundreds of repositories and thousands of engineers.
At its core, Atlantis serves as an orchestrator that bridges version control systems with infrastructure provisioning tools. Rather than forcing developers to context-switch between multiple interfaces or run commands locally from their laptops, Atlantis brings the entire workflow into the familiar territory of pull requests. This design philosophy—keeping developers in their native VCS interface—has been instrumental to Atlantis's widespread adoption.
The Atlantis Workflow Model
The fundamental Atlantis workflow follows a simple but powerful pattern:
- Create a Pull Request: A developer opens a PR with infrastructure changes
- Autodiscovery: Atlantis automatically detects Terraform/OpenTofu files
- Plan Generation: Atlantis runs a plan and posts the output as a PR comment
- Review and Approval: Team members review the plan within the PR interface
- Apply Infrastructure: Once approved, a simple comment like `atlantis apply` provisions the infrastructure
- Merge and Close: The PR is merged, and optionally the source branch is deleted
This workflow solves several critical problems that plague teams scaling their infrastructure automation. When multiple engineers run Terraform commands locally, coordination becomes a nightmare. Questions like "Who applied what?" and "Why is my state locked?" become daily frustrations. Atlantis eliminates these issues by providing centralized orchestration with full audit trails embedded directly in pull request history.
Atlantis Architecture and Components
Atlantis operates as a self-contained binary that can be deployed as a container on Kubernetes, AWS Fargate, ECS, or any other container platform. The application listens for webhooks from your version control system and responds to events like pull request creation, updates, and comments.
Core Components
The Atlantis architecture consists of three primary configuration layers:
1. Server-Side Configuration (repos.yaml): This file defines the global rules that apply across all repositories. Platform administrators use it to set up:
- Repository patterns and allowed repos
- Pre-workflow and post-workflow hooks
- Policy enforcement rules
- Available workflows that teams can use
- Security and approval requirements
2. Repository Configuration (atlantis.yaml): Individual teams can customize their workflows at the repo level, defining:
- Project definitions and directory structure
- Terraform/OpenTofu versions per environment
- Custom workflows (if allowed by server config)
- Variable injection and environment setup
3. Workflow Customization: Atlantis supports extensive workflow customization through pre and post hooks:
- Pre-workflows: Execute before planning (e.g., dynamic file generation, credential injection, terragrunt discovery)
- Post-workflows: Execute after planning (e.g., cost estimation with Infracost, policy checks with Conftest/OPA, security scanning with tfsec)
VCS Provider Support and Integration
One of Atlantis's strengths is its broad version control system support. The project currently integrates with:
- GitHub (most extensively supported)
- GitLab
- Bitbucket
- Azure DevOps (ADO)
- Gitea (newly added support)
However, Dylan-Daniel Page noted that VCS provider parity remains one of the project's biggest challenges. GitHub, being the most widely used platform among Atlantis adopters, receives the most community contributions and bug fixes. Features like GitHub team-based permissions or organization-level settings may not have equivalents in other providers, leading to documentation and feature gaps.
The maintainers acknowledged that integration testing across all VCS providers is difficult when the core team primarily uses one platform. The majority of Atlantis bugs are VCS-specific, and community contributions are essential for maintaining parity across providers.
Advanced Features and Extensibility
Policy as Code Integration
Atlantis includes native support for policy checks through Conftest, allowing teams to define Open Policy Agent (OPA) rules that validate infrastructure changes before they're applied. This capability enables organizations to enforce standards such as:
- Required tags on all resources
- Approved instance types and sizes
- Network security group rules compliance
- Cost thresholds and budget controls
- Regional restrictions
Policy failures appear directly in the pull request, providing immediate feedback to developers about what needs to be corrected.
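To make the idea concrete, here is a minimal Python sketch of the logic behind a "required tags" rule. Real Atlantis policy checks are written in Rego and run through Conftest, so this only illustrates what such a rule evaluates. It assumes the plan has been exported as JSON in the `resource_changes` / `change.after` layout produced by `terraform show -json`; the tag names and sample resources are hypothetical.

```python
# Illustrative only: real Atlantis policies are Rego rules run by Conftest.
REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical organization standard


def missing_tags(plan: dict) -> dict:
    """Map each created/updated resource address to the required tags it lacks."""
    failures = {}
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        if after is None:  # resource is being deleted; nothing to validate
            continue
        if "tags" not in after:  # resource type exposes no tags attribute
            continue
        tags = after.get("tags") or {}  # tags may be null when unset
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            failures[rc["address"]] = sorted(missing)
    return failures


# Hypothetical plan fragment with one compliant and one non-compliant resource.
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.web",
         "change": {"actions": ["create"],
                    "after": {"tags": {"owner": "platform"}}}},
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"],
                    "after": {"tags": {"owner": "data", "cost-center": "42"}}}},
        {"address": "aws_instance.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

print(missing_tags(sample_plan))  # {'aws_instance.web': ['cost-center']}
```

A non-empty result corresponds to the kind of failure that would be surfaced in the pull request.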
API-Driven Automation
In recent years, Atlantis has evolved beyond pure PR-driven workflows with the addition of API endpoints. This opens new automation possibilities:
- Drift Detection: Scheduled jobs can call the API to run plans across all projects, creating PRs automatically when drift is detected
- External Orchestration: Integration with CI/CD pipelines and other automation tools
- Programmatic Infrastructure Management: Scripts and tools can trigger infrastructure operations without manual PR interaction
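As a rough illustration of the drift-detection idea, a scheduled job could build a plan request like the one below. This is a sketch under assumptions: the `/api/plan` path, the `X-Atlantis-Token` header, and the body field names reflect the Atlantis API documentation as best understood here and should be checked against the version you run. The host, token, and repository values are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_plan_request(base_url, token, repository, ref,
                       directory=".", workspace="default", vcs_type="Github"):
    """Build (but do not send) a POST request asking Atlantis to plan one project."""
    body = {
        "Repository": repository,   # assumed field names, see lead-in
        "Ref": ref,
        "Type": vcs_type,
        "Paths": [{"Directory": directory, "Workspace": workspace}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/plan",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Atlantis-Token": token,
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration.
req = build_plan_request("https://atlantis.example.com", "example-secret",
                         "example-org/infra", "main")
print(req.get_method(), req.full_url)
```

A scheduler would send this with `urllib.request.urlopen(req)` for each project and inspect the response for pending changes.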
Extended Commands
Beyond the basic plan and apply commands, Atlantis now supports advanced Terraform operations:
- `atlantis state rm` - Remove resources from state
- `atlantis import` - Import existing infrastructure into state
- `atlantis unlock` - Release locks
- `atlantis version` - Display version information
These commands bring much of the Terraform/OpenTofu CLI experience into the pull request interface.
The Journey to CNCF and Open Governance
A significant milestone discussed in the talk was Atlantis's application to join the Cloud Native Computing Foundation (CNCF) as a sandbox project. This represents more than just a badge—it's about establishing proper open-source governance and community stewardship.
PePe Amengual and Dylan-Daniel Page shared their journey to becoming maintainers. PePe got involved three years ago when he needed a specific feature for GitHub groups. Despite having no Go experience, he learned the language, submitted his first pull request, and "made too much noise" in the community, which led to an invitation to become a maintainer. Dylan joined later, having recognized Atlantis's value from his experience at DigitalOcean and Autodesk, where infrastructure management at scale demanded centralized orchestration.
After Luke Kysow joined HashiCorp, there was considerable ambiguity about the project's direction and what changes could be made, especially given Terraform's licensing changes. The donation of Atlantis to the CNCF resolves these concerns, ensuring the project remains truly open-source and vendor-neutral. The maintainers are hopeful for approval in April 2024 when the CNCF sandbox committee meets.
New Governance and Release Process
The new maintainers have implemented several governance improvements:
- Formalized Decision-Making: Architecture Decision Records (ADRs) for major changes
- Improved Release Process: Trunk-based development with minor release branches
- Patch Management: Patches stay on their release branch, preventing feature creep in bug fixes
- Regular Maintainer Meetings: Coordinated planning and community engagement
- Clear Documentation: Standards that follow CNCF and open-source best practices
Dylan emphasized that maintaining an open-source project involves far more than just code. Documentation, governance, community coordination, issue triage, and strategic planning consume significant time. Both maintainers balance this volunteer work with their full-time jobs, highlighting the need for more community contributors and maintainers.
OpenTofu Support: Progress and Roadmap
The relationship between Atlantis and OpenTofu represents a natural evolution toward vendor neutrality. Historically, Atlantis was tightly coupled to Terraform, with hardcoded references throughout the codebase. The OpenTofu fork created both a challenge and an opportunity to rethink this approach.
Current OpenTofu Support
Today, teams can use OpenTofu with Atlantis through custom workflows. By overriding the default commands in the Atlantis configuration, users can specify tofu instead of terraform as the binary. This is the same approach used for tools like Terragrunt, demonstrating Atlantis's flexibility.
However, this workaround isn't ideal—it requires manual configuration and doesn't provide the seamless experience that Terraform users enjoy out of the box.
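A custom workflow of this kind might look roughly like the following. The sketch builds the configuration as a Python dictionary and prints it as JSON, which YAML parsers also accept, so the output could be adapted into an Atlantis config file. The workflow name `opentofu` and the exact `tofu` flags are illustrative assumptions; `$PLANFILE` is the environment variable Atlantis exposes to custom run steps.

```python
import json

# Hypothetical workflow name; each run step calls tofu instead of terraform.
opentofu_workflow = {
    "workflows": {
        "opentofu": {
            "plan": {"steps": [
                {"run": "tofu init -input=false"},
                {"run": "tofu plan -input=false -out=$PLANFILE"},
            ]},
            "apply": {"steps": [
                {"run": "tofu apply -input=false $PLANFILE"},
            ]},
        }
    }
}

# JSON is a subset of YAML, so this output is also valid YAML.
rendered = json.dumps(opentofu_workflow, indent=2)
print(rendered)
```

Every command has to be spelled out by hand here, which is exactly the manual overhead that first-class support would remove.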
Official OpenTofu Integration Roadmap
The maintainers outlined their plan for first-class OpenTofu support:
- Remove Hardcoded References: Systematic removal of Terraform-specific references in the codebase
- Binary Abstraction: Make the IaC tool binary a configurable option
- Container Image Updates: Ship OpenTofu alongside Terraform in official Atlantis containers
- Auto-Download Support: Implement automatic OpenTofu binary downloading (similar to existing Terraform support)
- Documentation: Clear guides for using OpenTofu with Atlantis
- Testing: Integration tests covering OpenTofu workflows
This work is tracked in a dedicated GitHub issue epic, allowing the community to follow progress and contribute.
Scaling Challenges and Solutions
During the Q&A session, attendees raised important questions about Atlantis at scale. One particularly thorny issue is file locking when multiple engineers work on the same repository simultaneously.
The File Locking Problem
Atlantis currently locks on three dimensions: repository name, pull request number, and workspace. For teams that don't use Terraform workspaces (running everything in the default workspace), this creates bottlenecks. If multiple projects exist in a single PR, they can't run in parallel because they're all locked to the same workspace identifier.
The maintainers are addressing this through ADR #2, which proposes adding project names to the locking cardinality. Once project names are properly tracked in Atlantis's state database, locks can be more granular, allowing parallel operations on different projects within the same PR.
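The effect of adding the project name to the lock key can be sketched with a toy model. This is not Atlantis's actual locking code; the key types, repository name, and project names are hypothetical and only mirror the dimensions described above.

```python
from collections import namedtuple

# Hypothetical lock keys modelling the current and proposed schemes.
CurrentKey = namedtuple("CurrentKey", "repo pull workspace")
ProposedKey = namedtuple("ProposedKey", "repo pull workspace project")


def max_parallel(keys):
    """Operations that share a lock key must run one at a time, so the number
    of distinct keys bounds how many operations can run concurrently."""
    return len(set(keys))


projects = ["network", "database", "app"]  # three projects changed in one PR

current = [CurrentKey("example-org/infra", 42, "default") for _ in projects]
proposed = [ProposedKey("example-org/infra", 42, "default", p) for p in projects]

print(max_parallel(current))   # 1
print(max_parallel(proposed))  # 3
```

With everything in the default workspace, the three-part key collapses all projects onto one lock, while the four-part key gives each project its own.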
Disable Auto-Plan for High-Volume Teams
Organizations running hundreds of PRs daily often disable Atlantis's auto-plan feature. Instead of automatically running plans on every PR update, developers explicitly trigger plans with commands like atlantis plan -p project-name. This reduces lock contention and gives teams more control over when expensive plan operations execute.
Additionally, some teams disable Atlantis locking entirely, relying instead on the native state locking provided by remote backends like AWS S3 + DynamoDB or Terraform Cloud. This approach requires more team coordination but can be necessary at certain scales.
Architectural Evolution
Looking forward, the maintainers are working to make Atlantis more cloud-native and horizontally scalable:
- Stateless Design: Moving away from the internal BoltDB to external state stores
- Redis Support: Externalized state storage for multi-instance deployments
- Worker/Server Separation: Plans to support distinct server and worker roles using the same binary with different flags
- API-First Architecture: Less dependence on VCS webhooks, more programmatic control
Plugin Architecture and Vendor Neutrality
One of the most significant long-term goals discussed was decoupling VCS providers into a plugin-based architecture. Currently, all VCS integrations live in the core Atlantis codebase, making maintenance difficult and creating a bias toward certain platforms.
The vision is for Atlantis Core to handle orchestration logic while VCS-specific integrations become plugins. This would allow:
- Community-maintained VCS plugins without requiring core maintainer approval
- Faster iteration on provider-specific features
- Easier integration testing
- Support for emerging VCS platforms without core changes
Similarly, the maintainers want to remove bias toward specific IaC tools and policy frameworks. Dylan emphasized that Atlantis should be an orchestrator first—agnostic about whether you use Terraform, OpenTofu, Terragrunt, CDK for Terraform, or any other tool. The current container ships with Terraform and Conftest, but future versions should let users bring their own tools.
Technical Debt and Community Contributions
An audience member asked how the project handles technical debt. Dylan candidly acknowledged this as a major challenge. Over the years, many contributors have added features for their specific use cases and then moved on. When these features break or interact poorly with new changes, there's often no one around who understands the code.
The maintainers are addressing this through:
- Code Quality Tools: Resolving CodeQL issues before the 1.0 release
- Deprecation of Unused Features: Removing configuration flags and features that are no longer necessary
- Better Documentation: Ensuring features are well-documented so others can maintain them
- More Maintainers: Actively seeking additional core maintainers to share the load
PePe noted that they now spend considerable time reviewing every PR for potential regressions, which has slowed the release cadence but improved stability. This careful approach is necessary given how many organizations depend on Atlantis for production infrastructure.
Deployment Patterns: The Chicken and Egg Problem
An interesting question concerned how to deploy Atlantis onto Kubernetes clusters that Atlantis itself is meant to manage, a classic bootstrap problem. The maintainers shared several patterns they've seen:
- Management Cluster First: Deploy a management Kubernetes cluster manually, install Atlantis there, then use it to manage all other clusters
- Standalone Deployment: Run Atlantis on AWS Fargate, ECS, or EC2 outside of Kubernetes
- Initial Manual Setup: Deploy Atlantis manually first, then import it into Atlantis-managed infrastructure for ongoing updates
This is analogous to the state file bootstrap problem—you need to create your first state bucket manually before you can manage infrastructure with Terraform. The same principle applies to Atlantis itself.
The Path Forward
Before reaching the 1.0 milestone, the Atlantis project has several key objectives:
- Clean up and remove deprecated configuration flags
- Resolve remaining CodeQL quality issues
- Fix the file locking regression for parallel plans and applies
- Implement the new release process (completed)
- Achieve official OpenTofu support
Beyond 1.0, the roadmap includes:
- Decoupling VCS providers into plugins
- More cloud-native architecture with multi-server support
- Complete vendor neutrality (no hardcoded tools in the container)
- Improved policy tooling abstraction
- Better horizontal scaling capabilities
Conclusion: Community-Driven Infrastructure Automation
The Atlantis project stands at an exciting inflection point. After years of community-driven development under somewhat uncertain governance, it's moving toward proper open-source stewardship under the CNCF. The embrace of OpenTofu represents a broader commitment to vendor neutrality and ensuring that teams can choose the best tools for their needs without lock-in.
For organizations already using Atlantis, the roadmap promises better scalability, improved VCS parity, and first-class OpenTofu support. For those considering adopting Atlantis, the project offers a mature, battle-tested approach to infrastructure automation that keeps developers in their preferred workflows while providing the centralized orchestration that scale demands.
The maintainers' call for additional contributors and maintainers highlights a universal truth about open-source infrastructure: these tools are built and maintained by volunteers who balance this work with their day jobs. If your organization depends on Atlantis, contributing back—whether through code, documentation, bug reports, or financial support—helps ensure the project's long-term sustainability.
As PePe and Dylan demonstrated in their talk, Atlantis is more than a tool—it's a community building the future of open-source infrastructure automation together.