Terraform Pull Request Automation for Beginners

Getting started with Terraform? Learn how pull-request automation adds tests, policy checks and previews so beginners merge infrastructure code safely.

The Reality of Terraform at Scale

Here's what actually happens: Your team starts with local Terraform runs. Everything's fine until someone overwrites production. You add basic CI checks. Then state conflicts emerge. You implement Atlantis. It works great... until it doesn't.

The pattern is predictable because coordination overhead grows faster than headcount: every new team adds state files, environments, and people who can collide. What works for 10 engineers breaks at 50. What works at 50 becomes a nightmare at 200.

Stage 1: Manual Coordination (1-10 Engineers)

Small teams operate on trust. You've got a shared AWS account, maybe two environments, and everyone knows what everyone else is doing. Sort of.

The Setup

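# backend.tf - The classic S3 backend everyone starts with
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    # State locking needs a lock table (the table name is a placeholder)
    dynamodb_table = "terraform-locks"
  }
}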

What Actually Happens

# Developer A at 2:47 PM
$ terraform apply
Acquiring state lock...

# Developer B at 2:48 PM
$ terraform apply
Error: Error acquiring the state lock
Another process is already holding a lock on the state.

You implement basic PR checks:

# .github/workflows/terraform.yml
name: Terraform CI
on:
  pull_request:
    paths:
      - '**.tf'

jobs:
  validate:
    runs-on: ubuntu-latest
    # Credentials live at job level: init needs them for the S3 backend, not just plan
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -out=tfplan

This works until someone comments "LGTM" on a PR that creates 47 expensive EC2 instances. Time for actual automation.
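One way to make a plan harder to rubber-stamp is to summarize it before anyone reviews it. The sketch below is a minimal example, assuming the plan was exported with `terraform show -json tfplan > plan.json`. The function names and the threshold of 10 are illustrative choices, not part of Terraform or any CI tool.

```python
from collections import Counter


def summarize_creates(plan: dict) -> Counter:
    """Count resources the plan would create, grouped by resource type."""
    counts = Counter()
    for rc in plan.get("resource_changes", []):
        # "actions" is a list such as ["create"] or ["delete", "create"]
        if "create" in rc.get("change", {}).get("actions", []):
            counts[rc["type"]] += 1
    return counts


def review_warnings(plan: dict, max_new_per_type: int = 10) -> list[str]:
    """Return one warning per resource type created more often than the threshold."""
    return [
        f"{rtype}: {n} new resources (threshold {max_new_per_type})"
        for rtype, n in sorted(summarize_creates(plan).items())
        if n > max_new_per_type
    ]


if __name__ == "__main__":
    # Tiny inline plan in the shape of `terraform show -json` output
    sample = {
        "resource_changes": [
            {"type": "aws_instance", "change": {"actions": ["create"]}}
            for _ in range(47)
        ]
    }
    for line in review_warnings(sample):
        print(line)
```

Posting these warnings as a PR comment puts the number 47 in front of the reviewer before the "LGTM" happens.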

Stage 2: Basic Automation with Atlantis (10-50 Engineers)

Atlantis changes the game. No more local applies. Everything happens in pull requests. It feels like magic.

Setting Up Atlantis

# atlantis.yaml
version: 3
projects:
- name: production
  dir: environments/prod
  terraform_version: v1.5.0
  autoplan:
    when_modified: ["*.tf", "*.tfvars"]
    enabled: true
  apply_requirements: ["approved", "mergeable"]
  
- name: staging
  dir: environments/staging
  terraform_version: v1.5.0
  autoplan:
    when_modified: ["*.tf", "*.tfvars"]
    enabled: true

The Workflow That Actually Works (For Now)

# Developer creates PR
# Atlantis automatically comments:

# Ran Plan for 2 projects:
# 
# 1. project: `production` dir: `environments/prod` workspace: `default`
# 2. project: `staging` dir: `environments/staging` workspace: `default`
# 
# ### 1. project: `production`
# ```
# Terraform will perform the following actions:
# 
#   + aws_instance.app_server
#       ami:           "ami-0c55b159cbfafe1f0"
#       instance_type: "t3.medium"
# ```

Teams love it. PR reviews include infrastructure changes. The audit trail exists. But then growth happens.

Stage 3: Governance Requirements Emerge (50-200 Engineers)

At 50 engineers, you've got multiple teams. The platform team wants governance. The security team wants compliance. The finance team wants to know why the AWS bill doubled.

Policy as Code Becomes Non-Negotiable

# policies/cost_control.rego
package terraform.policies.cost_control

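# Assumes the plan JSON is exposed as input.tfplan; when conftest reads
# `terraform show -json` output directly, the plan is the input root instead.
import input.tfplan as tfplan

# Placeholder monthly prices in USD; swap in real pricing data
monthly_price := {"t3.medium": 30, "m5.xlarge": 140, "m5.24xlarge": 3400}

# Monthly cost of an EC2 instance; undefined for anything not in the table
instance_cost(r) = cost {
    r.type == "aws_instance"
    cost := monthly_price[r.change.after.instance_type]
}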

deny[msg] {
    r := tfplan.resource_changes[_]
    r.type == "aws_instance"
    r.change.after.instance_type == "m5.24xlarge"
    
    msg := sprintf(
        "Instance type %s requires VP approval. Use m5.xlarge or smaller.",
        [r.change.after.instance_type]
    )
}

deny[msg] {
    cost := sum([cost | 
        r := tfplan.resource_changes[_]
        cost := instance_cost(r)
    ])
    
    cost > 1000
    
    msg := sprintf(
        "Total monthly cost increase $%d exceeds $1000 limit",
        [cost]
    )
}
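If Rego is new to you, it can help to see the same two rules in a general-purpose language. The following is a rough Python mirror of the policy above, not a replacement for it. The price table is made up for illustration, and like the Rego version it sums every instance in the plan rather than computing a true before/after delta.

```python
# Hypothetical monthly prices in USD; real numbers would come from a pricing source
MONTHLY_PRICE = {"t3.medium": 30, "m5.xlarge": 140, "m5.24xlarge": 3400}
BLOCKED_TYPES = {"m5.24xlarge"}
MONTHLY_LIMIT = 1000


def deny(plan: dict) -> list[str]:
    """Return denial messages for a plan shaped like `terraform show -json` output."""
    messages = []
    total = 0
    for rc in plan.get("resource_changes", []):
        # "after" is null for resources being destroyed
        after = rc.get("change", {}).get("after") or {}
        itype = after.get("instance_type")
        if rc.get("type") != "aws_instance" or itype is None:
            continue
        if itype in BLOCKED_TYPES:
            messages.append(
                f"Instance type {itype} requires VP approval. Use m5.xlarge or smaller."
            )
        total += MONTHLY_PRICE.get(itype, 0)
    if total > MONTHLY_LIMIT:
        messages.append(
            f"Total monthly cost increase ${total} exceeds ${MONTHLY_LIMIT} limit"
        )
    return messages
```

An empty list means the plan passes; anything else is the set of messages a policy engine would surface on the pull request.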

The Atlantis Integration That Starts to Crack

# atlantis.yaml with custom workflows
workflows:
  policy-check:
    plan:
      steps:
      - init
      - plan
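      - run: |
          # This gets messy fast
          # conftest needs the JSON form of the plan, and `test` (not `verify`) evaluates it
          terraform show -json $PLANFILE > plan.json
          conftest test --policy policies/ plan.json
    apply:
      steps:
      - run: |
          # Hope the policies still pass?
          terraform show -json $PLANFILE > plan.json
          conftest test --policy policies/ plan.json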
      - apply

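You start hitting walls. Atlantis locks a project while a pull request holds its plan, so changes to the same project wait on each other, and one busy server can only work through so many runs. Your 15-minute deploys become 2-hour queues. The single Atlantis server becomes a single point of failure.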

Stage 4: Enterprise Scale Operations (200+ Engineers)

Large organizations need more than automation—they need a platform. Multiple cloud accounts. Regulatory compliance. Self-service for developers. Governance for security.

What Enterprise Teams Actually Need

# modules/governed-vpc/main.tf
# This module enforces company standards

variable "environment" {
  type = string
  validation {
    condition     = contains(["prod", "staging", "dev"], var.environment)
    # Older Terraform versions require a full sentence ending in a period
    error_message = "Environment must be prod, staging, or dev."
  }
}

variable "cost_center" {
  type = string
  validation {
    condition     = can(regex("^CC-[0-9]{4}$", var.cost_center))
    error_message = "Cost center must match pattern CC-XXXX."
  }
}

locals {
  required_tags = {
    Environment = var.environment
    CostCenter  = var.cost_center
    ManagedBy   = "Terraform"
    Team        = data.scalr_identity.current.email
    # This doesn't work in Atlantis without custom tooling
  }
}

Self-Service That Actually Scales

# scalr-module-registry.yaml
# Teams consume approved modules without knowing the details

modules:
  - name: eks-cluster
    source: terraform-aws-modules/eks/aws
    version_constraint: "~> 19.0"
    
    variable_overrides:
      cluster_endpoint_public_access: false  # Security requirement
      enable_irsa: true                       # Always enabled
      
    policy_sets:
      - eks-security-baseline
      - cost-management
      - tagging-standards

When Atlantis Hits the Wall

Let me be direct about when Atlantis stops being the answer. It's not about Atlantis being bad—it revolutionized Terraform workflows. But architecture decisions made for simplicity become limitations at scale.

The Performance Cliff

# What actually happens in your deployment queue
# (Not actual Atlantis code, but the effect)

queue = [
  {"team": "platform", "duration": 15, "started": "14:00"},
  {"team": "backend", "duration": 20, "waiting": True},
  {"team": "frontend", "duration": 5, "waiting": True},
  {"team": "data", "duration": 45, "waiting": True},
  {"team": "platform", "duration": 10, "waiting": True},  # Yes, same team waiting on itself
]

# Total time: 95 minutes for work that could parallelize to 45
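The 95-versus-45 figure is easy to check. Here is a small sketch of the arithmetic, assuming unlimited parallel workers and that two runs from the same team touch the same state and therefore stay serialized. Both assumptions are simplifications for illustration.

```python
from collections import defaultdict


def serial_minutes(queue):
    """Wall-clock time when every run waits for the one before it."""
    return sum(job["duration"] for job in queue)


def parallel_minutes(queue):
    """Wall-clock time when teams run side by side but each team's own runs queue."""
    per_team = defaultdict(int)
    for job in queue:
        per_team[job["team"]] += job["duration"]
    return max(per_team.values(), default=0)


queue = [
    {"team": "platform", "duration": 15},
    {"team": "backend", "duration": 20},
    {"team": "frontend", "duration": 5},
    {"team": "data", "duration": 45},
    {"team": "platform", "duration": 10},
]

print(serial_minutes(queue), parallel_minutes(queue))  # prints: 95 45
```

The parallel figure is set by the slowest team (data, at 45 minutes), so adding more teams to the queue hurts the serial case far more than the parallel one.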

The Hidden Costs

| Cost Factor | Atlantis | Enterprise Platform (e.g., Scalr) |
| --- | --- | --- |
| Licensing | $0 | $500-2000/month |
| Engineering (maintenance) | 40-60 hours/month | 0 hours |
| Engineering (features) | 80+ hours for RBAC/policies | Built-in |
| Downtime risk | High (single server) | SLA guaranteed |
| Compliance features | DIY everything | SOC2, HIPAA ready |
| True monthly cost | ~$10,000 | $500-2000 |
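The table does not break down how the roughly $10,000 figure is reached, so here is one plausible reconstruction rather than the original math. It uses the 40-60 maintenance hours from the table and the ~$500 hosting cost quoted for Atlantis later in this article. The $150 hourly rate is purely an assumption for a fully loaded engineer, and the one-off 80+ hours of feature work is left out.

```python
def atlantis_monthly_cost(maintenance_hours, hourly_rate, hosting):
    """Monthly cost of self-hosting: engineering time plus infrastructure."""
    return maintenance_hours * hourly_rate + hosting


HOURLY_RATE = 150  # assumption: fully loaded engineer cost in USD
HOSTING = 500      # the article's hosting estimate for Atlantis

low = atlantis_monthly_cost(40, HOURLY_RATE, HOSTING)
high = atlantis_monthly_cost(60, HOURLY_RATE, HOSTING)
print(f"${low:,} to ${high:,} per month")  # prints: $6,500 to $9,500 per month
```

Plug in your own rate and hours. The point is that engineering time, not licensing, dominates the total.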

Comparing Enterprise Solutions

When you outgrow Atlantis, the market offers several paths. Each has its sweet spot.

Terraform Cloud

  • Pros: Native HashiCorp integration, strong brand recognition
  • Cons: Unpredictable pricing (billed per resource), recent BSL licensing concerns
  • Best for: Teams committed to HashiCorp ecosystem despite IBM acquisition

Spacelift

  • Pros: Multi-IaC support, powerful policy engine
  • Cons: Complexity can overwhelm smaller teams, premium pricing for features you might not need
  • Best for: Organizations using multiple IaC tools who need maximum flexibility

env0

  • Pros: Great UX, strong cost management features, responsive support
  • Cons: Newer platform, some enterprise features still maturing
  • Best for: Teams prioritizing developer experience and cost visibility

Scalr

  • Pros: Purpose-built for enterprise Terraform, managed service model, hierarchical organizations
  • Cons: Terraform/OpenTofu focus (not multi-IaC), requires commitment to structured workflows
  • Best for: Enterprises wanting Terraform done right without operational overhead

Making the Migration Decision

Here's the framework that actually works:

Immediate Migration Triggers

migration_required:
  - deployment_delays > 30 minutes
  - availability_requirements > 99%
  - compliance_audit == "failed"
  - on_call_incidents.tool_related > 2/month

Strategic Migration Indicators

consider_migration:
  - terraform_developers > 5
  - environments > 3
  - teams.count > 2
  - monthly_maintenance_hours > 20
  - custom_rbac_needed == true
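The two checklists above are pseudo-config, not something a tool will evaluate for you. If you want to run them against your own numbers, a sketch like the following works. The `TeamMetrics` fields are hypothetical names chosen to match the checklist items, and the thresholds are copied from the lists.

```python
from dataclasses import dataclass


@dataclass
class TeamMetrics:
    deployment_delay_minutes: float
    availability_requirement_pct: float
    compliance_audit: str  # e.g. "passed" or "failed"
    tool_incidents_per_month: int
    terraform_developers: int
    environments: int
    teams: int
    monthly_maintenance_hours: float
    custom_rbac_needed: bool


def _fired(checks):
    """Names of the checks whose condition is true, in declaration order."""
    return [name for name, hit in checks.items() if hit]


def migration_required(m):
    """Immediate triggers: any hit means migrate now."""
    return _fired({
        "deployment_delays": m.deployment_delay_minutes > 30,
        "availability_requirements": m.availability_requirement_pct > 99,
        "compliance_audit": m.compliance_audit == "failed",
        "tool_related_incidents": m.tool_incidents_per_month > 2,
    })


def consider_migration(m):
    """Strategic indicators: hits suggest planning a migration."""
    return _fired({
        "terraform_developers": m.terraform_developers > 5,
        "environments": m.environments > 3,
        "teams": m.teams > 2,
        "monthly_maintenance_hours": m.monthly_maintenance_hours > 20,
        "custom_rbac_needed": m.custom_rbac_needed,
    })


example = TeamMetrics(45, 99.9, "passed", 1, 12, 4, 3, 10, False)
print("required:", migration_required(example))
print("consider:", consider_migration(example))
```

Returning the names of the triggers, rather than a single boolean, gives you something concrete to put in the migration proposal.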

A Real Migration Timeline

gantt
    title Atlantis to Scalr Migration
    dateFormat  YYYY-MM-DD
    
    section Assessment
    Current state audit           :done, 2024-01-01, 7d
    Requirements gathering        :done, 2024-01-08, 7d
    
    section Pilot
    Dev environment migration     :active, 2024-01-15, 14d
    Policy configuration         :active, 2024-01-22, 7d
    Team training               :2024-01-29, 7d
    
    section Production
    Staging migration           :2024-02-05, 14d
    Production migration        :2024-02-19, 14d
    Atlantis decommission      :2024-03-05, 7d

Summary: Right Tool, Right Time

The evolution from manual Terraform to enterprise platforms isn't about good tools versus bad tools. It's about matching capabilities to requirements.

| Stage | Team Size | Right Tool | Monthly Cost | Key Trigger for Next Stage |
| --- | --- | --- | --- | --- |
| Manual | 1-10 | GitHub Actions + S3 | ~$0 | State conflicts, deployment inconsistency |
| Basic Automation | 10-50 | Atlantis | ~$500 (hosting) | Queueing delays, governance needs |
| Governance Required | 50-200 | Scalr/env0 | $500-1500 | Compliance, multi-cloud, enterprise features |
| Enterprise Scale | 200+ | Scalr/Spacelift | $1500+ | Complex hierarchies, self-service platform |

The pattern is clear: start simple, adopt Atlantis when coordination becomes painful, then migrate to an enterprise platform when governance and scale demand it.

For most organizations hitting the 50+ engineer mark, Scalr represents the sweet spot—enterprise capabilities without enterprise complexity. It's purpose-built for Terraform, eliminates operational overhead, and provides the governance features that become non-negotiable as you grow.

The question isn't whether you'll need enterprise Terraform management. It's whether you'll recognize the need before it becomes a crisis.