Terraform Pull Request Automation for Beginners

Getting started with Terraform? Learn how pull-request automation adds tests, policy checks and previews so beginners merge infrastructure code safely.

Sebastian Stadil

23 May 2025 • 5 min read

The Reality of Terraform at Scale

Here's what actually happens: Your team starts with local Terraform runs. Everything's fine until someone overwrites production. You add basic CI checks. Then state conflicts emerge. You implement Atlantis. It works great... until it doesn't.

The pattern is predictable because infrastructure complexity grows faster than headcount: every engineer you add can collide with everyone already touching the same state. What works for 10 engineers breaks at 50. What works at 50 becomes a nightmare at 200.

Stage 1: Manual Coordination (1-10 Engineers)

Small teams operate on trust. You've got a shared AWS account, maybe two environments, and everyone knows what everyone else is doing. Sort of.

The Setup

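# backend.tf - The classic S3 backend everyone starts with
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"

    # State locking. Without it there is no lock to fail on, and concurrent
    # applies silently overwrite each other. The table name is a placeholder.
    dynamodb_table = "terraform-locks"
  }
}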

What Actually Happens

# Developer A at 2:47 PM
$ terraform apply
Acquiring state lock...

# Developer B at 2:48 PM
$ terraform apply
Error: Error acquiring the state lock
Another process is already holding a lock on the state.

You implement basic PR checks:

# .github/workflows/terraform.yml
name: Terraform CI
on:
  pull_request:
    paths:
      - '**.tf'

jobs:
  validate:
    runs-on: ubuntu-latest
    # Credentials are set at job level: `terraform init` needs them to reach
    # the S3 backend, not just `terraform plan`.
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan

This works until someone comments "LGTM" on a PR that creates 47 expensive EC2 instances. Time for actual automation.
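One way to make that kind of surprise visible before anyone types "LGTM" is to summarize the plan in machine-readable form. The sketch below assumes the plan has been exported with `terraform show -json tfplan`, which produces a `resource_changes` list. The function names and the threshold of 10 instances are illustrative choices, not part of any Terraform tooling.

```python
from collections import Counter


def count_creates(plan: dict) -> Counter:
    """Count resources the plan would create, grouped by resource type.

    Replacements (delete + create) are counted too, since they also create.
    """
    counts = Counter()
    for rc in plan.get("resource_changes", []):
        if "create" in rc["change"]["actions"]:
            counts[rc["type"]] += 1
    return counts


def review_warnings(plan: dict, max_new_instances: int = 10) -> list[str]:
    """Return human-readable warnings a reviewer should see before approving."""
    created = count_creates(plan)
    warnings = []
    if created["aws_instance"] > max_new_instances:
        warnings.append(
            f"Plan creates {created['aws_instance']} aws_instance resources "
            f"(threshold: {max_new_instances})"
        )
    return warnings


# Hand-built stand-in for real plan JSON: 47 new instances plus one untouched bucket.
sample_plan = {
    "resource_changes": [
        {"type": "aws_instance", "change": {"actions": ["create"]}}
        for _ in range(47)
    ]
    + [{"type": "aws_s3_bucket", "change": {"actions": ["no-op"]}}]
}

print(review_warnings(sample_plan))
```

In CI you would load the real file with `json.load` and fail the job, or post the warnings as a PR comment, when the list is non-empty.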

Stage 2: Basic Automation with Atlantis (10-50 Engineers)

Atlantis changes the game. No more local applies. Everything happens in pull requests. It feels like magic.

Setting Up Atlantis

# atlantis.yaml
version: 3
projects:
- name: production
  dir: environments/prod
  terraform_version: v1.5.0
  autoplan:
    when_modified: ["*.tf", "*.tfvars"]
    enabled: true
  apply_requirements: ["approved", "mergeable"]
  
- name: staging
  dir: environments/staging
  terraform_version: v1.5.0
  autoplan:
    when_modified: ["*.tf", "*.tfvars"]
    enabled: true

The Workflow That Actually Works (For Now)

# Developer creates PR
# Atlantis automatically comments:

# Ran Plan for 2 projects:
# 
# 1. project: `production` dir: `environments/prod` workspace: `default`
# 2. project: `staging` dir: `environments/staging` workspace: `default`
# 
# ### 1. project: `production`
# ```
# Terraform will perform the following actions:
# 
#   + aws_instance.app_server
#       ami:           "ami-0c55b159cbfafe1f0"
#       instance_type: "t3.medium"
# ```

Teams love it. PR reviews include infrastructure changes. The audit trail exists. But then growth happens.

Stage 3: Governance Requirements Emerge (50-200 Engineers)

At 50 engineers, you've got multiple teams. The platform team wants governance. The security team wants compliance. The finance team wants to know why the AWS bill doubled.

Policy as Code Becomes Non-Negotiable

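# policies/cost_control.rego
package terraform.policies.cost_control

import future.keywords.contains
import future.keywords.if

# Assumes the plan JSON is nested under input.tfplan. With plain conftest on
# `terraform show -json` output, the plan is `input` itself.
import input.tfplan as tfplan

# Placeholder monthly prices in USD (assumed values, not real AWS pricing)
monthly_cost := {"t3.medium": 30, "m5.xlarge": 140, "m5.24xlarge": 3400}

# Monthly cost of one resource change. Undefined for anything that is not a
# priced aws_instance, so those entries drop out of the sum below.
instance_cost(r) := monthly_cost[r.change.after.instance_type] if {
    r.type == "aws_instance"
}

deny contains msg if {
    r := tfplan.resource_changes[_]
    r.type == "aws_instance"
    r.change.after.instance_type == "m5.24xlarge"

    msg := sprintf(
        "Instance type %s requires VP approval. Use m5.xlarge or smaller.",
        [r.change.after.instance_type]
    )
}

deny contains msg if {
    total := sum([cost |
        r := tfplan.resource_changes[_]
        cost := instance_cost(r)
    ])

    total > 1000

    msg := sprintf(
        "Total monthly cost increase $%d exceeds $1000 limit",
        [total]
    )
}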
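To sanity-check what those two rules are meant to catch without installing OPA, the same logic can be mirrored in a few lines of Python. This is a rough sketch, not a replacement for the policy: the price table holds placeholder values (assumed for illustration, not AWS pricing), and the function takes the plan JSON dictionary directly.

```python
# Placeholder monthly prices in USD: assumed values for illustration only.
MONTHLY_COST = {"t3.medium": 30, "m5.xlarge": 140, "m5.24xlarge": 3400}
COST_LIMIT = 1000  # same threshold as the second deny rule


def deny(tfplan: dict) -> list[str]:
    """Collect denial messages, mirroring the two Rego rules."""
    messages = []
    total = 0
    for r in tfplan.get("resource_changes", []):
        if r["type"] != "aws_instance":
            continue
        # "after" is null in plan JSON for deleted resources.
        after = r["change"].get("after") or {}
        instance_type = after.get("instance_type")
        if instance_type == "m5.24xlarge":
            messages.append(
                f"Instance type {instance_type} requires VP approval. "
                "Use m5.xlarge or smaller."
            )
        # Unknown instance types contribute nothing to the total.
        total += MONTHLY_COST.get(instance_type, 0)
    if total > COST_LIMIT:
        messages.append(
            f"Total monthly cost increase ${total} exceeds ${COST_LIMIT} limit"
        )
    return messages


def instance(instance_type: str) -> dict:
    """Build a minimal resource_changes entry for testing."""
    return {"type": "aws_instance", "change": {"after": {"instance_type": instance_type}}}


small_change = {"resource_changes": [instance("t3.medium")]}
big_fleet = {"resource_changes": [instance("m5.xlarge") for _ in range(8)]}

print(deny(small_change))
print(deny(big_fleet))
```

Unlike the Rego `deny` set, the Python list does not collapse duplicate messages, so two oversized instances produce two identical entries.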

The Atlantis Integration That Starts to Crack

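# atlantis.yaml with custom workflows
workflows:
  policy-check:
    plan:
      steps:
      - init
      - plan
      - run: |
          # This gets messy fast: conftest needs JSON, not the binary plan file.
          # `conftest test` evaluates policies against input; `conftest verify`
          # only runs the policies' own unit tests.
          terraform show -json $PLANFILE > plan.json
          conftest test --policy policies/ --all-namespaces plan.json
    apply:
      steps:
      - run: |
          # Hope the policies still pass?
          terraform show -json $PLANFILE > plan.json
          conftest test --policy policies/ --all-namespaces plan.json
      - apply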

You start hitting walls. Atlantis locks each project to one pull request at a time, so changes to a busy project queue up behind each other. Your 15-minute deploys become 2-hour queues. The single Atlantis server becomes a single point of failure.

Stage 4: Enterprise Scale Operations (200+ Engineers)

Large organizations need more than automation—they need a platform. Multiple cloud accounts. Regulatory compliance. Self-service for developers. Governance for security.

What Enterprise Teams Actually Need

# modules/governed-vpc/main.tf
# This module enforces company standards

variable "environment" {
  type = string
  validation {
    condition     = contains(["prod", "staging", "dev"], var.environment)
    error_message = "Environment must be prod, staging, or dev."
  }
}

variable "cost_center" {
  type = string
  validation {
    condition     = can(regex("^CC-[0-9]{4}$", var.cost_center))
    error_message = "Cost center must match pattern CC-XXXX."
  }
}

locals {
  required_tags = {
    Environment = var.environment
    CostCenter  = var.cost_center
    ManagedBy   = "Terraform"
    Team        = data.scalr_identity.current.email
    # This doesn't work in Atlantis without custom tooling
  }
}

Self-Service That Actually Scales

# scalr-module-registry.yaml
# Teams consume approved modules without knowing the details

modules:
  - name: eks-cluster
    source: terraform-aws-modules/eks/aws
    version_constraint: "~> 19.0"
    
    variable_overrides:
      cluster_endpoint_public_access: false  # Security requirement
      enable_irsa: true                       # Always enabled
      
    policy_sets:
      - eks-security-baseline
      - cost-management
      - tagging-standards

When Atlantis Hits the Wall

Let me be direct about when Atlantis stops being the answer. It's not about Atlantis being bad—it revolutionized Terraform workflows. But architecture decisions made for simplicity become limitations at scale.

The Performance Cliff

# What actually happens in your deployment queue
# (Not actual Atlantis code, but the effect)

queue = [
  {"team": "platform", "duration": 15, "started": "14:00"},
  {"team": "backend", "duration": 20, "waiting": True},
  {"team": "frontend", "duration": 5, "waiting": True},
  {"team": "data", "duration": 45, "waiting": True},
  {"team": "platform", "duration": 10, "waiting": True},  # Yes, same team waiting on itself
]

# Total time: 95 minutes for work that could parallelize to 45
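The 95-versus-45 figure can be reproduced with a short calculation. The sketch below makes one assumption that the queue itself does not state: runs from different teams are independent and could execute concurrently, while runs from the same team still have to happen one after another.

```python
from collections import defaultdict

# (team, duration in minutes), copied from the queue above.
jobs = [
    ("platform", 15),
    ("backend", 20),
    ("frontend", 5),
    ("data", 45),
    ("platform", 10),
]


def serial_minutes(jobs):
    """One run at a time: total time is the sum of all durations."""
    return sum(duration for _, duration in jobs)


def parallel_minutes(jobs):
    """Teams run concurrently; each team's own runs stay sequential."""
    per_team = defaultdict(int)
    for team, duration in jobs:
        per_team[team] += duration
    # The slowest team determines when everything is finished.
    return max(per_team.values())


print(serial_minutes(jobs))    # 95
print(parallel_minutes(jobs))  # 45
```

The `data` team's 45-minute run sets the floor. The `platform` team's two runs add up to 25 minutes and finish well inside that window.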

The Hidden Costs

Cost Factor	Atlantis	Enterprise Platform (e.g., Scalr)
Licensing	$0	$500-2000/month
Engineering (maintenance)	40-60 hours/month	0 hours
Engineering (features)	80+ hours for RBAC/policies	Built-in
Downtime risk	High (single server)	SLA guaranteed
Compliance features	DIY everything	SOC2, HIPAA ready
True monthly cost	~$10,000	$500-2000
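The table does not say what hourly rate turns 40-60 maintenance hours into roughly $10,000. A quick back-calculation shows the rate the Atlantis column implies, under the simplifying assumption that the true monthly cost is licensing plus maintenance hours only (hosting and the one-off feature work are left out). The function name is illustrative.

```python
def implied_hourly_rate(true_monthly_cost: float, licensing: float, hours: float) -> float:
    """Hourly rate at which licensing plus maintenance hours reaches the stated cost."""
    return (true_monthly_cost - licensing) / hours


# Figures from the Atlantis column: $0 licensing, 40-60 hours, ~$10,000 total.
rate_at_60_hours = implied_hourly_rate(10_000, 0, 60)
rate_at_40_hours = implied_hourly_rate(10_000, 0, 40)

print(f"${rate_at_60_hours:.0f} to ${rate_at_40_hours:.0f} per hour")  # $167 to $250 per hour
```

If your fully loaded engineering cost sits below that range, the Atlantis figure shrinks proportionally, so it is worth rerunning the comparison with your own numbers.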

Comparing Enterprise Solutions

When you outgrow Atlantis, the market offers several paths. Each has its sweet spot.

Terraform Cloud (now HCP Terraform)

Pros: Native HashiCorp integration, strong brand recognition
Cons: Unpredictable pricing (billed per resource), recent BSL licensing concerns
Best for: Teams committed to HashiCorp ecosystem despite IBM acquisition

Spacelift

Pros: Multi-IaC support, powerful policy engine
Cons: Complexity can overwhelm smaller teams, premium pricing for features you might not need
Best for: Organizations using multiple IaC tools who need maximum flexibility

env0

Pros: Great UX, strong cost management features, responsive support
Cons: Newer platform, some enterprise features still maturing
Best for: Teams prioritizing developer experience and cost visibility

Scalr

Pros: Purpose-built for enterprise Terraform, managed service model, hierarchical organizations
Cons: Terraform/OpenTofu focus (not multi-IaC), requires commitment to structured workflows
Best for: Enterprises wanting Terraform done right without operational overhead

Making the Migration Decision

Here's the framework that actually works:

Immediate Migration Triggers

migration_required:
  - deployment_delays > 30 minutes
  - availability_requirements > 99%
  - compliance_audit == "failed"
  - on_call_incidents.tool_related > 2/month

Strategic Migration Indicators

consider_migration:
  - terraform_developers > 5
  - environments > 3
  - teams.count > 2
  - monthly_maintenance_hours > 20
  - custom_rbac_needed == true
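Both checklists are written as pseudo-config, so they can be turned into an executable check fairly directly. The sketch below is one possible encoding: the thresholds come from the lists above, while the `TeamMetrics` field names are hypothetical and would need to be mapped to whatever your team actually measures.

```python
from dataclasses import dataclass


@dataclass
class TeamMetrics:
    # Field names are hypothetical, chosen to match the checklist entries.
    deployment_delay_minutes: float
    availability_requirement_pct: float
    compliance_audit_failed: bool
    tool_incidents_per_month: int
    terraform_developers: int
    environments: int
    teams: int
    monthly_maintenance_hours: float
    custom_rbac_needed: bool


def immediate_triggers(m: TeamMetrics) -> list[str]:
    """Names of the 'migration_required' conditions that currently hold."""
    checks = {
        "deployment_delays": m.deployment_delay_minutes > 30,
        "availability_requirements": m.availability_requirement_pct > 99,
        "compliance_audit": m.compliance_audit_failed,
        "on_call_incidents": m.tool_incidents_per_month > 2,
    }
    return [name for name, hit in checks.items() if hit]


def strategic_indicators(m: TeamMetrics) -> list[str]:
    """Names of the 'consider_migration' conditions that currently hold."""
    checks = {
        "terraform_developers": m.terraform_developers > 5,
        "environments": m.environments > 3,
        "teams": m.teams > 2,
        "monthly_maintenance_hours": m.monthly_maintenance_hours > 20,
        "custom_rbac_needed": m.custom_rbac_needed,
    }
    return [name for name, hit in checks.items() if hit]


# Made-up example values, not data from a real team.
example = TeamMetrics(
    deployment_delay_minutes=45,
    availability_requirement_pct=99.9,
    compliance_audit_failed=False,
    tool_incidents_per_month=1,
    terraform_developers=12,
    environments=4,
    teams=3,
    monthly_maintenance_hours=10,
    custom_rbac_needed=False,
)

print(immediate_triggers(example))
print(strategic_indicators(example))
```

Any non-empty result from `immediate_triggers` corresponds to the "migrate now" case, while `strategic_indicators` is better read as a running tally to revisit each quarter.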

A Real Migration Timeline

gantt
    title Atlantis to Scalr Migration
    dateFormat  YYYY-MM-DD
    
    section Assessment
    Current state audit           :done, 2024-01-01, 7d
    Requirements gathering        :done, 2024-01-08, 7d
    
    section Pilot
    Dev environment migration     :active, 2024-01-15, 14d
    Policy configuration         :active, 2024-01-22, 7d
    Team training               :2024-01-29, 7d
    
    section Production
    Staging migration           :2024-02-05, 14d
    Production migration        :2024-02-19, 14d
    Atlantis decommission      :2024-03-05, 7d

Summary: Right Tool, Right Time

The evolution from manual Terraform to enterprise platforms isn't about good tools versus bad tools. It's about matching capabilities to requirements.

Stage	Team Size	Right Tool	Monthly Cost	Key Trigger for Next Stage
Manual	1-10	GitHub Actions + S3	~$0	State conflicts, deployment inconsistency
Basic Automation	10-50	Atlantis	~$500 (hosting)	Queueing delays, governance needs
Governance Required	50-200	Scalr/env0	$500-1500	Compliance, multi-cloud, enterprise features
Enterprise Scale	200+	Scalr/Spacelift	$1500+	Complex hierarchies, self-service platform

The pattern is clear: start simple, adopt Atlantis when coordination becomes painful, then migrate to an enterprise platform when governance and scale demand it.

For most organizations hitting the 50+ engineer mark, Scalr represents the sweet spot—enterprise capabilities without enterprise complexity. It's purpose-built for Terraform, eliminates operational overhead, and provides the governance features that become non-negotiable as you grow.

The question isn't whether you'll need enterprise Terraform management. It's whether you'll recognize the need before it becomes a crisis.