Deep Dive into Custom Atlantis Workflows

Get a step-by-step guide to custom Atlantis workflows that supercharge Terraform automation with policy guardrails, parallel plans and faster approvals.

Alright, so Atlantis. Yeah, it's that open-source thingy a lot of folks use to wrangle Terraform with PRs. GitOps for your infra, yada yada. The out-of-the-box stuff? Sure, it gets you started. But the real juice, the stuff that makes you either a hero or pulls your hair out, that's all buried in atlantis.yaml. We're gonna poke at the gnarly bits: think different envs (dev, stage, prod – oh my!), cramming in security scanners, and just generally trying not to make a total hash of things. 'Cause when things get big, knowing this YAML voodoo is make-or-break. Seriously. It's not always a walk in the park, let me tell you.

1. So, You Think You Know Atlantis? (Intro)

Terraform. It's everywhere for IaC, right? And tools that make it less painful? Gold. Atlantis steps up, sticking Terraform runs right into your Git workflow. plan this, apply that, all from PR comments. It's GitOps, it helps the team, makes auditors happy-ish. Standard stuff.

But. And it's a big but. Your infrastructure isn't going to stay simple, is it? Nope. That's when you need more than the basics. That's when atlantis.yaml slithers into your life. This bad boy lets you build custom workflows. Complex deployment pipelines? Check. Shoving in third-party tools? Check. Making up your own rules? Double-check. Sounds great, and it can be. It also means you're now a YAML janitor, and there's a whole lot more to keep an eye on. Honestly, if you're at a point where your atlantis.yaml looks like a phonebook and you're dreaming in YAML, maybe, just maybe, it's time to look at something more... structured? Platforms like Scalr, for instance, they try to give you the CI/CD for Terraform without you having to become a full-time Atlantis whisperer. Something to chew on.

2. The Belly of the Beast: atlantis.yaml

This file, atlantis.yaml, sitting there at the root of your repo? That's where the magic, or the madness, happens. It tells Atlantis what to look for, what to run, when to run it. Simple, eh?

The Basics: Structure, Top-Level Keys. You know the drill.

Every atlantis.yaml kicks off with version: 3. Don't ask me about versions 1 or 2; 3 is where it's at now. Some main things you'll see at the top:

  • projects: A list. This is where you tell Atlantis, "Hey, these are the Terraform bits I care about." Once you do this, Atlantis stops its little autodiscovery game. You're in charge now.
  • workflows: This is a map. Think of it as a recipe book for custom plan and apply sequences.
  • automerge: True or false. Tells Atlantis to merge the PR on its own once every plan in it has been applied successfully. Risky? Maybe. Convenient? Sometimes.
  • parallel_plan/parallel_apply: True or false. Lets Atlantis plan (or apply) several projects at once instead of one after another. Speeds things up, if your setup can handle it.
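Put together, the top of a file using those keys might look something like this. It's a sketch only; the two project dirs are hypothetical.

```yaml
version: 3
automerge: true        # merge the PR once every project's plan has been applied
parallel_plan: true    # plan independent projects at the same time
parallel_apply: true   # ...and apply them at the same time
projects:              # listing projects switches autodiscovery off
  - dir: terraform/network   # hypothetical path
  - dir: terraform/myapp     # hypothetical path
```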

Projects: Getting Specific.

Each thing in that projects list is basically a Terraform setup Atlantis needs to know about.

  • dir: Where's the code? Path to it.
  • workspace: Which Terraform workspace? default if you don't say otherwise.
  • name: Give it a unique name. It's required when two projects share the same dir and workspace, and handy everywhere else, since it's what you target with atlantis plan -p <name>. Without it, how will you tell 'em apart?
  • workflow: Got a custom workflow? Point to it here.
  • autoplan: Whether Atlantis runs plan on its own when a PR is opened or gets new commits. The when_modified bit here is key: it decides which file changes count.
  • apply_requirements: What needs to be true before Atlantis will even think about applying? approved, mergeable, undiverged: pick your combination.
version: 3
projects:
  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    workflow: prod-workflow
    autoplan:
      when_modified:
        - "**/*.tf"
        - "prod.tfvars"
      enabled: true
    apply_requirements: [approved, mergeable, undiverged]

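That undiverged one? Means your PR branch better be up-to-date with the base branch (usually main), or no dice. One catch: Atlantis only enforces it when the server runs with --checkout-strategy=merge.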

Custom Workflows: Making Atlantis Dance to Your Tune (Plan, Apply, Mayhem)

This is where you get to boss Atlantis around. Default plan and apply not cutting it? Make your own.

  • Stages: Usually plan and apply. That's the bread and butter.
  • Steps: These can be the built-in Atlantis commands (init, plan, apply, show) or, and this is where it gets fun, custom run commands.

That run step? It's your escape hatch. Run any shell script you want. Atlantis even gives you a bunch of environment variables like $PLANFILE, $WORKSPACE, $PROJECT_NAME, $PULL_NUM. Super handy. But here's the kicker: you're now responsible for those scripts. And making sure all the tools they need are in the Atlantis container. And permissions. It can get hairy.

workflows:
  prod-workflow:
    plan:
      steps:
        - init
        - run: echo "Running pre-production checks... fingers crossed!"
        - plan:
            extra_args: ["-var-file=prod.tfvars"] # No -out here: the built-in plan step already saves to $PLANFILE.
    apply:
      steps:
        - run: echo "Hold your breath! About to ask for final sign-off for $PROJECT_NAME..."
        # This script better be solid.
        - run: ./scripts/await_manual_approval.sh $PULL_NUM
        - apply
        - run: echo "Phew! $PROJECT_NAME is live. Go check it. Now."
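One thing that trips people up: the built-in plan step saves its output to $PLANFILE for you, but a hand-rolled run step doesn't. If you swap plan out for a raw terraform call, the -out is on you, and apply has to pick that same file back up. A sketch, assuming the default workspace (a run step won't necessarily do the workspace switching the built-in step handles) and a made-up workflow name:

```yaml
workflows:
  raw-terraform-workflow:
    plan:
      steps:
        - init
        # Replacing the built-in plan step: saving the plan is now your job,
        # and Atlantis only applies what ends up in $PLANFILE.
        - run: terraform plan -input=false -out $PLANFILE
    apply:
      steps:
        # Applying a saved plan file doesn't prompt for approval.
        - run: terraform apply $PLANFILE
```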

3. Real-World Shenanigans: Advanced Use Cases

So, how does this all play out when things get... complicated?

Juggling Environments: Dev, Staging, Prod. Good luck.

Classic setup: one pile of Terraform code, but you need to send it to dev, then staging, then the big scary prod. Different workspaces, different .tfvars files. Each one of those is its own Atlantis project. And you probably want different rules for each.

atlantis.yaml for the Multi-Env Circus:

version: 3
projects:
  - name: myapp-dev
    dir: terraform/myapp
    workspace: dev
    workflow: dev-workflow # Maybe more relaxed here
    autoplan:
      when_modified: ["**/*.tf", "dev.tfvars", "../../modules/shared/**/*.tf"] # Watch those shared modules!
      enabled: true

  - name: myapp-staging
    dir: terraform/myapp
    workspace: staging
    workflow: staging-workflow
    apply_requirements: [approved] # Getting stricter
    autoplan:
      when_modified: ["**/*.tf", "staging.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true

  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    workflow: prod-workflow # Lock this down!
    apply_requirements: [approved, mergeable, undiverged]
    autoplan:
      when_modified: ["**/*.tf", "prod.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true

workflows:
  dev-workflow:
    plan:
      steps:
        - init
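        # .tfvars paths resolve from the project's dir, just like when_modified.
        - plan: {extra_args: ["-var-file=dev.tfvars"]}
    apply:
      steps: [apply] # YOLO for dev? Maybe.

  # staging-workflow adds a linter or two. tflint has to be in the Atlantis image.
  staging-workflow:
    plan:
      steps:
        - init
        - run: tflint
        - plan: {extra_args: ["-var-file=staging.tfvars"]}
    apply:
      steps: [apply]

  # prod-workflow is where the real fun is.
  prod-workflow:
    plan:
      steps:
        - init
        - run: ./scripts/tfsec_scan.sh . # Security first!
        - plan: {extra_args: ["-var-file=prod.tfvars"]}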
    apply:
      steps:
        - run: ./scripts/prod_approval_gate.sh $PULL_NUM # Someone important needs to say yes.
        - apply
        - run: ./scripts/notify_slack.sh "PROD apply for $PROJECT_NAME done. If it's broken, it's not my PULL_NUM."

Look, this is flexible. Super flexible. But you see all that repetition in the workflows? It can get out of hand. If you're defining nearly identical workflows just to change one variable file or one script, it starts to feel a bit... much. This is one of those spots where I find myself thinking about platforms like Scalr. They often have ways to manage environment progression and variable overrides without making you copy-paste YAML all day. Just a thought.

Plug It In: Custom Scripts, Linters, Security Tools (tfsec, Checkov, the whole gang)

Those run steps are your best friend for shoving in linters and security scanners. If your script exits with anything other than 0, Atlantis slams the brakes. Hard.

Example: tfsec Before You plan

workflows:
  secure-workflow:
    plan:
      steps:
        - init
        - run:
            command: |
              echo "Running tfsec scan... Let's see what horrors await."
              # If tfsec finds something, it'll scream (exit non-zero). Workflow stops.
              tfsec .
        - plan # The built-in step already writes to $PLANFILE; no -out needed.
    apply:
      steps: [apply] # Only if the plan (and tfsec) was happy.

Simple enough, but remember: tfsec, Checkov, whatever you're using, has to be there. In the Atlantis Docker image, or wherever Atlantis is running. Managing these tool dependencies is now your job too. Fun!
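tfsec reads your .tf files. If you'd rather scan what Terraform actually intends to do, you can point a scanner at the plan instead. A sketch using the built-in show step, which writes the plan as JSON to $SHOWFILE, and Checkov (which, again, has to be in the image; plan-scan-workflow is a made-up name):

```yaml
workflows:
  plan-scan-workflow:
    plan:
      steps:
        - init
        - plan
        # Render the saved plan as JSON into $SHOWFILE...
        - show
        # ...then scan it. A non-zero exit from checkov fails the plan stage.
        - run: checkov -f $SHOWFILE --framework terraform_plan
    apply:
      steps: [apply]
```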

Conditional Who-Ha: when_modified and those PR Labels

PR Label Games (with Scripts): Atlantis itself doesn't care about your PR labels. But you can make it care. How? A run step with a script that calls the GitHub/GitLab API, checks for labels, and then exits 0 or 1. Your check_label.sh might look a bit like this (conceptually):

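#!/usr/bin/env bash
# Sketch: assumes GITHUB_TOKEN is exported in the Atlantis server's environment
# (Atlantis doesn't set it) and that curl and jq are in the image.
# BASE_REPO_OWNER, BASE_REPO_NAME and PULL_NUM do come from Atlantis.
set -euo pipefail
REQUIRED_LABEL="ship-it"
# GitHub treats a PR as an issue, so the issue-labels endpoint lists the PR's labels.
LABELS_FROM_API=$(curl -fsS \
  -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/${BASE_REPO_OWNER}/${BASE_REPO_NAME}/issues/${PULL_NUM}/labels" \
  | jq -r '.[].name')
# Whole-line match, so "ship-it-later" doesn't sneak through as "ship-it".
if grep -Fxq "$REQUIRED_LABEL" <<< "$LABELS_FROM_API"; then
  echo "Label '$REQUIRED_LABEL' found. Full steam ahead!"
  exit 0 # Go, go, go!
else
  echo "Nope. Label '$REQUIRED_LABEL' is MIA. Stopping this train."
  exit 1 # Halt!
fi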

And in your atlantis.yaml:

workflows:
  label-gated-workflow:
    plan:
      steps:
        - run: ./scripts/check_label.sh # The bouncer
        - init
        - plan # Saves to $PLANFILE on its own.

This is clever, right? But also, it's a custom script, talking to an external API, needing a token. More things to manage, more things that can break. Some bigger platforms, again thinking of Scalr here, might have policy engines (like OPA) that can do this kind of conditional logic more natively, without you writing bash scripts to parse JSON from API calls. Food for thought when your check_label.sh becomes check_twenty_labels_and_also_the_weather.sh.

when_modified: This is Atlantis's built-in way to say "only run plan if these specific files changed." It takes glob patterns, relative to the project's dir, which is why the shared-module path below climbs out with ../../. Super important in monorepos so you're not planning the entire universe for a typo fix in a README.

autoplan:
  when_modified:
    - "**/*.tf"                     # My stuff
    - "../../modules/network/**/*.tf" # That shared thing everyone touches
    - ".terraform.lock.hcl"        # Always a good idea
  enabled: true
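when_modified narrows what triggers an autoplan; enabled: false takes the trigger away altogether. A sketch reusing the myapp-prod project from earlier, which would then only plan when someone comments atlantis plan -p myapp-prod on the PR:

```yaml
version: 3
projects:
  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    autoplan:
      enabled: false   # no autoplan at all; someone has to ask for a plan
```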

4. Lost in the YAML Woods? Debugging Tips from the Trenches

So, it broke. Of course, it broke. Custom Atlantis workflows are fun like that.

  • Atlantis Server Logs are Your Friend (Mostly): Crank up the logging (--log-level debug). It's noisy, but the clues are usually in there. Somewhere.
  • YAML Linting. Please. Before you even commit atlantis.yaml, lint it. Spaces, not tabs. Indentation. Colons. The usual YAML nightmares. I've lost hours to a single misplaced space. Hours.
  • Custom Script Woes: Check those exit codes. Did your script even run? Is it executable, and committed that way (chmod +x)? Run steps execute in the project's dir, so a relative path like ./scripts/ resolves against that dir, not the repo root. Are all its tools (tfsec, jq, whatever) actually in the Atlantis container? Test your scripts in an environment as close to the Atlantis one as possible. "Works on my machine" is the road to sadness here.
  • when_modified Not Firing? Double-check your glob patterns. Case sensitivity? Path typos? Are you sure that shared module path is right from this project's dir?
  • Server-Side Said "No!": Remember repos.yaml on the Atlantis server? It decides what your atlantis.yaml is allowed to set in the first place. If Atlantis rejects your apply_requirements or your custom workflows, check allowed_overrides and allow_custom_workflows there.
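For the YAML linting bit, yamllint is one option (my pick, not something Atlantis ships with). A minimal .yamllint like this is a starting point; run yamllint atlantis.yaml locally or in CI before you push.

```yaml
# .yamllint at the repo root: a starting point, tune to taste.
extends: default
rules:
  indentation:
    spaces: 2         # catches the single misplaced space
  line-length: disable
```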

The whole feedback loop for debugging these run steps can be a bit slow. Change script, commit, push, wait for PR, run Atlantis, see it fail, repeat. Sometimes you just want a better way to see what the heck is going on inside that execution step.
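One cheap trick for that: a throwaway probe step that prints where the step runs, which Atlantis variables it sees and whether your tool is on PATH. Run-step output should show up in the PR comment, so you can see it without digging through server logs. debug-workflow is a made-up name, and tfsec stands in for whatever your script needs.

```yaml
workflows:
  debug-workflow:
    plan:
      steps:
        # Throwaway probe: remove once you've found the problem.
        - run: |
            pwd
            ls -la
            env | grep -E '^(PROJECT_NAME|WORKSPACE|DIR|REPO_REL_DIR|PULL_NUM)=' || true
            command -v tfsec || echo "tfsec is NOT on PATH"
        - init
        - plan
```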

5. Keeping Sane: How to Not Drown in atlantis.yaml

If this file gets too big, it's a monster. Here's how to try and tame it.

Monorepos: A Special Kind of Fun

  • Get Granular with Projects: Each thing that can be deployed on its own? Probably its own project entry.
  • when_modified is Your Lifesaver: Get these patterns tight, or you'll be planning everything, all the time. There's also server-side stuff like --autoplan-modules that tries to be smart about dependencies. Sometimes it is.
  • execution_order_group: If project A needs to be applied before project B, give A the lower group number. That's how you tell Atlantis.
  • Go Parallel: parallel_plan: true and parallel_apply: true can speed things up if you have lots of independent projects.
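Here's how the ordering and parallel flags might fit together. The project names and dirs are hypothetical: network goes first, then myapp and dns, which share a group and can run side by side when parallel_apply is on. As far as I know, the ordering only matters when Atlantis runs several projects in one go (autoplan, or a bare atlantis apply), not when you target a single project with -p.

```yaml
version: 3
parallel_plan: true
parallel_apply: true
projects:
  - name: network
    dir: terraform/network     # hypothetical path
    execution_order_group: 1   # lower groups run first
  - name: myapp
    dir: terraform/myapp
    execution_order_group: 2   # waits for group 1
  - name: dns
    dir: terraform/dns         # hypothetical path
    execution_order_group: 2   # same group as myapp, so they can run in parallel
```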

But yeah, in a big monorepo, atlantis.yaml can still become a beast. I've seen teams write scripts to generate their atlantis.yaml. Think about that for a second. You're writing code to write config for your automation tool. Meta.

YAML Anchors: Your (Sometimes) Friend

YAML has this thing called anchors (&) and aliases (*). Lets you define a chunk of YAML once and reuse it. Good for cutting down on copy-paste.

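# Anchor on first use: a made-up top-level key may be rejected by Atlantis's strict config parser.
workflows:
  my-workflow:
    plan:
      steps: &common_plan_steps # Define it once, right where it's first used
        - init
        - run: ./scripts/lint.sh
        - plan
  my-other-workflow:
    plan:
      steps: *common_plan_steps # Slap the whole list in here (an alias can't splice into another list)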

Helpful, yeah. But go too wild with these, and it can make the YAML harder to read. You end up chasing aliases all over the file. Use with caution.

Server-Side Rules: repos.yaml to the Rescue?

For big shops, repos.yaml on the Atlantis server is non-negotiable. It's how the platform team keeps some control.

  • allowed_overrides: What can the repo-level atlantis.yaml actually change?
  • allowed_workflows: Which server-side workflows a repo may pick. Not just any old workflow defined on the server; only the blessed ones.
  • allow_custom_workflows: true/false: This is a big one for security. If it's true, repos can define their own workflows with run steps that do... well, anything. It defaults to false; keep it that way unless you really trust everyone and have amazing review processes.
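A server-side sketch of that guardrail setup, assuming a single blessed workflow called standard (a made-up name). Worth knowing the flip side too: for the repo-level workflows earlier in this piece to load at all, the server needs allow_custom_workflows: true and workflow listed in allowed_overrides (plus apply_requirements, if repos set those).

```yaml
# repos.yaml on the Atlantis server (loaded with --repo-config).
repos:
  - id: /.*/                        # every repo this server manages
    apply_requirements: [approved, mergeable]
    allowed_overrides: [workflow]   # repos may pick a workflow...
    allowed_workflows: [standard]   # ...but only this blessed one
    allow_custom_workflows: false   # no repo-defined run steps
workflows:
  standard:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
```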

Basically, repos.yaml is how you offer "Atlantis-as-a-service" with guardrails. But managing that central config and how it interacts with all the repo configs? That's another layer of fun. And again, this is where I find myself thinking about systems like Scalr, where RBAC and policy (like OPA) are often baked in, not layered on with more YAML.

6. Quick Recap: What Atlantis Can Actually Do (The Good Parts)

| Feature/Component | What It Lets You Do (The Gist) | Key Config Bits | Stuff to Watch Out For / Why It's Tricky |
|---|---|---|---|
| Projects | Tell Atlantis about each bit of your infra | dir, workspace, name | Your atlantis.yaml can get HUGE in monorepos. Like, really big. |
| Workflows | Make up your own plan/apply steps | workflows.<name>.plan.steps, apply.steps | Now you own the scripts, their tools, their permissions. It's a whole new world of pain/joy. |
| run Steps | Jam in any CLI tool you want (linters, scanners, etc.) | run: <your_command_here> | Tool has to be in the Atlantis image. Script better be solid. Exit codes matter. A lot. |
| apply_requirements | Gate apply with PR approvals, merge status, etc. | apply_requirements: [approved] | Depends on your VCS playing nice and PRs being in the right state. |
| autoplan.when_modified | Only plan if relevant files changed. Magic! | when_modified: ["glob/**/pattern*"] | Getting those glob patterns just right, especially for shared modules, is an art form. |
| PR Label Logic | (With scripts) Make workflows react to PR labels. | run: ./check_my_labels.sh | More scripting, talking to APIs, managing tokens. Can get complex fast. |
| YAML Anchors | Don't Repeat Yourself (DRY) for YAML bits. | &myAnchor, *myAnchor | Can make things cleaner, or a confusing mess if you go overboard. Balance, young Padawan. |
| repos.yaml | Central control from the Atlantis server. The boss. | allowed_overrides, allow_custom_workflows | Needs careful thought about who can do what. And another YAML file to manage. |

7. The Big Finish: Scaling Your Terraform Automation (Or Trying To)

So, Atlantis. It's a solid open-source tool for getting Terraform into your PRs. And with custom workflows, you can build some pretty fancy automation pipelines. Security scans, multi-stage deploys, the works.

But, and you knew there was another 'but', all this cool advanced stuff? It means you're signing up for a lot of work. Keeping that atlantis.yaml from becoming a nightmare, making sure your custom scripts don't explode, debugging weird workflow issues, and trying to keep some sort of order across a bunch of repos or a giant monorepo... it's a job. A real job. This isn't a small feat, believe me.

If you're spending more time fighting with Atlantis configuration than actually shipping infrastructure, or if you're looking for something with more batteries-included for governance, security, and just general grown-up operational stuff, maybe it's time to look around. Tools like Scalr, they're built to tackle these kinds of scaling headaches. They offer things like environment management that isn't just more YAML, OPA for policy that isn't just more shell scripts, and RBAC that's, well, actual RBAC. Might save you from writing that script that generates your atlantis.yaml.

End of the day, it's about what fits your team, how much you want to build versus buy, and how much YAML you can stomach. Choose wisely. My coffee's cold.