Deep Dive into Custom Atlantis Workflows
Get a step-by-step guide to custom Atlantis workflows that supercharge Terraform automation with policy guardrails, parallel plans and faster approvals.
Alright, so Atlantis. Yeah, it's that open-source thingy a lot of folks use to wrangle Terraform with PRs. GitOps for your infra, yada yada. The out-of-the-box stuff? Sure, it gets you started. But the real juice, the stuff that either makes you a hero or has you pulling your hair out, that's all buried in `atlantis.yaml`. We're gonna poke at the gnarly bits: think different envs (dev, stage, prod, oh my!), cramming in security scanners, and just generally trying not to make a total hash of things. 'Cause when things get big, knowing this YAML voodoo is make-or-break. Seriously. It's not always a walk in the park, let me tell you.
1. So, You Think You Know Atlantis? (Intro)
Terraform. It's everywhere for IaC, right? And tools that make it less painful? Gold. Atlantis steps up, sticking Terraform runs right into your Git workflow. `plan` this, `apply` that, all from PR comments. It's GitOps, it helps the team, makes auditors happy-ish. Standard stuff.
But. And it's a big but. Your infrastructure isn't going to stay simple, is it? Nope. That's when you need more than the basics. That's when `atlantis.yaml` slithers into your life. This bad boy lets you build custom workflows. Complex deployment pipelines? Check. Shoving in third-party tools? Check. Making up your own rules? Double-check. Sounds great, and it can be. It also means you're now a YAML janitor, and there's a whole lot more to keep an eye on. Honestly, if you're at a point where your `atlantis.yaml` looks like a phonebook and you're dreaming in YAML, maybe, just maybe, it's time to look at something more... structured? Platforms like Scalr, for instance, they try to give you the CI/CD for Terraform without you having to become a full-time Atlantis whisperer. Something to chew on.
2. The Belly of the Beast: atlantis.yaml
This file, `atlantis.yaml`, sitting there at the root of your repo? That's where the magic, or the madness, happens. It tells Atlantis what to look for, what to run, when to run it. Simple, eh?
The Basics: Structure, Top-Level Keys. You know the drill.
Every `atlantis.yaml` kicks off with `version: 3`. Don't ask me about versions 1 or 2; 3 is where it's at now. Some main things you'll see at the top:
- `projects`: A list. This is where you tell Atlantis, "Hey, these are the Terraform bits I care about." Once you do this, Atlantis stops its little autodiscovery game. You're in charge now.
- `workflows`: This is a map. Think of it as a recipe book for custom plan and apply sequences.
- `automerge`: True or false. Tells Atlantis to merge the PR if everything goes swimmingly. Risky? Maybe. Convenient? Sometimes.
- `parallel_plan` / `parallel_apply`: True or false. Lets Atlantis try to do many things at once. Speeds things up, if your setup can handle it.
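To see how those top-level keys sit next to each other, here's a bare skeleton. Treat it as a sketch: the project name, path and workflow name are placeholders I made up, and the true/false values are just one possible choice, not a recommendation.

```yaml
# atlantis.yaml at the repo root. Skeleton only; names and paths are hypothetical.
version: 3
automerge: false        # flip to true to have Atlantis merge the PR after a clean apply
parallel_plan: true     # plan independent projects at the same time
parallel_apply: false   # keep applies one at a time until you trust the setup
projects:
  - name: example-project      # hypothetical name
    dir: terraform/example     # hypothetical path
    workflow: example-workflow
workflows:
  example-workflow:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
```

One catch: a workflow defined in the repo like this only works if the server-side config allows custom workflows and lets repos override `workflow`.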
Projects: Getting Specific.
Each thing in that `projects` list is basically a Terraform setup Atlantis needs to know about.
- `dir`: Where's the code? Path to it.
- `workspace`: Which Terraform workspace? `default` if you don't say otherwise.
- `name`: Give it a unique name. Super important if you've got multiple projects in the same `dir` but different workspaces. Otherwise, how will you tell 'em apart?
- `workflow`: Got a custom workflow? Point to it here.
- `autoplan`: This is about automatically running `plan`. The `when_modified` bit here is key.
- `apply_requirements`: What needs to be true before Atlantis will even think about applying? `approved`? `mergeable`? The list goes on.
version: 3
projects:
  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    workflow: prod-workflow
    autoplan:
      when_modified:
        - "**/*.tf"
        - "prod.tfvars"
      enabled: true
    apply_requirements: [approved, mergeable, undiverged]
That `undiverged` one? Means your PR branch better be up-to-date with main, or no dice.
Custom Workflows: Making Atlantis Dance to Your Tune (Plan, Apply, Mayhem)
This is where you get to boss Atlantis around. Default `plan` and `apply` not cutting it? Make your own.
- Stages: Usually `plan` and `apply`. That's the bread and butter.
- Steps: These can be the built-in Atlantis commands (`init`, `plan`, `apply`, `show`) or, and this is where it gets fun, custom `run` commands.
That `run` step? It's your escape hatch. Run any shell script you want. Atlantis even gives you a bunch of environment variables like `$PLANFILE`, `$WORKSPACE`, `$PROJECT_NAME`, `$PULL_NUM`. Super handy. But here's the kicker: you're now responsible for those scripts. And making sure all the tools they need are in the Atlantis container. And permissions. It can get hairy.
workflows:
  prod-workflow:
    plan:
      steps:
        - init
        - run: echo "Running pre-production checks... fingers crossed!"
        - plan:
            # -out=$PLANFILE is redundant here (the built-in plan step already writes there);
            # it becomes mandatory once you swap plan for a custom run step.
            extra_args: ["-var-file=prod.tfvars", "-out=$PLANFILE"]
    apply:
      steps:
        - run: echo "Hold your breath! About to ask for final sign-off for $PROJECT_NAME..."
        # This script better be solid.
        - run: ./scripts/await_manual_approval.sh $PULL_NUM
        - apply
        - run: echo "Phew! $PROJECT_NAME is live. Go check it. Now."
3. Real-World Shenanigans: Advanced Use Cases
So, how does this all play out when things get... complicated?
Juggling Environments: Dev, Staging, Prod. Good luck.
Classic setup: one pile of Terraform code, but you need to send it to dev, then staging, then the big scary prod. Different workspaces, different `.tfvars` files. Each one of those is its own Atlantis `project`. And you probably want different rules for each.
`atlantis.yaml` for the Multi-Env Circus:
version: 3
projects:
  - name: myapp-dev
    dir: terraform/myapp
    workspace: dev
    workflow: dev-workflow # Maybe more relaxed here
    autoplan:
      when_modified: ["**/*.tf", "dev.tfvars", "../../modules/shared/**/*.tf"] # Watch those shared modules!
      enabled: true
  - name: myapp-staging
    dir: terraform/myapp
    workspace: staging
    workflow: staging-workflow
    apply_requirements: [approved] # Getting stricter
    autoplan:
      when_modified: ["**/*.tf", "staging.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true
  - name: myapp-prod
    dir: terraform/myapp
    workspace: prod
    workflow: prod-workflow # Lock this down!
    apply_requirements: [approved, mergeable, undiverged]
    autoplan:
      when_modified: ["**/*.tf", "prod.tfvars", "../../modules/shared/**/*.tf"]
      enabled: true
workflows:
  dev-workflow:
    plan:
      steps:
        - init
        # var-file paths resolve from the project dir, same as when_modified
        - plan: {extra_args: ["-var-file=dev.tfvars", "-out=$PLANFILE"]}
    apply:
      steps: [apply] # YOLO for dev? Maybe.
  # staging-workflow might add a linter or two.
  # prod-workflow is where the real fun is.
  prod-workflow:
    plan:
      steps:
        - init
        - run: ./scripts/tfsec_scan.sh . # Security first!
        - plan: {extra_args: ["-var-file=prod.tfvars", "-out=$PLANFILE"]}
    apply:
      steps:
        - run: ./scripts/prod_approval_gate.sh $PULL_NUM # Someone important needs to say yes.
        - apply
        - run: ./scripts/notify_slack.sh "PROD apply for $PROJECT_NAME done. If it's broken, it's not my PULL_NUM."
Look, this is flexible. Super flexible. But you see all that repetition in the workflows? It can get out of hand. If you're defining nearly identical workflows just to change one variable file or one script, it starts to feel a bit... much. This is one of those spots where I find myself thinking about platforms like Scalr. They often have ways to manage environment progression and variable overrides without making you copy-paste YAML all day. Just a thought.
Plug It In: Custom Scripts, Linters, Security Tools (tfsec, Checkov, the whole gang)
Those `run` steps are your best friend for shoving in linters and security scanners. If your script exits with anything other than 0, Atlantis slams the brakes. Hard.
Example: `tfsec` Before You `plan`
workflows:
  secure-workflow:
    plan:
      steps:
        - init
        # tfsec security scan
        - run:
            command: |
              echo "Running tfsec scan... Let's see what horrors await."
              # If tfsec finds something, it'll scream (exit non-zero). Workflow stops.
              tfsec .
        - plan: {extra_args: ["-out=$PLANFILE"]}
    apply:
      steps: [apply] # Only if the plan (and tfsec) was happy.
Simple enough, but remember: `tfsec`, `Checkov`, whatever you're using, has to be there. In the Atlantis Docker image, or wherever Atlantis is running. Managing these tool dependencies is now your job too. Fun!
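Same idea with Checkov, plus a guard for the "is the tool even installed?" problem. This is a sketch, not gospel: the workflow name is made up, and it assumes `checkov` is on the PATH of whatever image you run Atlantis in.

```yaml
workflows:
  checkov-workflow:            # hypothetical workflow name
    plan:
      steps:
        - init
        - run:
            command: |
              # Fail fast with a readable message if the scanner isn't in the image.
              command -v checkov >/dev/null || { echo "checkov not found in the Atlantis image"; exit 1; }
              # Scan the project directory; failed checks make checkov exit non-zero,
              # which stops the workflow before plan runs.
              checkov -d . --quiet
        - plan
    apply:
      steps: [apply]
```

The `command -v` guard costs one line and saves you from squinting at a cryptic "not found" buried in a PR comment.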
Conditional Who-Ha: `when_modified` and those PR Labels
PR Label Games (with Scripts): Atlantis itself doesn't care about your PR labels. But you can make it care. How? A `run` step with a script that calls the GitHub/GitLab API, checks for labels, and then exits 0 or 1. Your `check_label.sh` might look a bit like this (conceptually):
#!/bin/bash
# Needs GITHUB_TOKEN, PULL_NUM, all that jazz from Atlantis env vars
REQUIRED_LABEL="ship-it"
# ... magic curl commands to GitHub API ...
if [[ $LABELS_FROM_API == *"$REQUIRED_LABEL"* ]]; then
  echo "Label '$REQUIRED_LABEL' found. Full steam ahead!"
  exit 0 # Go, go, go!
else
  echo "Nope. Label '$REQUIRED_LABEL' is MIA. Stopping this train."
  exit 1 # Halt!
fi
And in your `atlantis.yaml`:
workflows:
  label-gated-workflow:
    plan:
      steps:
        - run: ./scripts/check_label.sh # The bouncer
        - init
        - plan: {extra_args: ["-out=$PLANFILE"]}
This is clever, right? But also, it's a custom script, talking to an external API, needing a token. More things to manage, more things that can break. Some bigger platforms, again thinking of Scalr here, might have policy engines (like OPA) that can do this kind of conditional logic more natively, without you writing bash scripts to parse JSON from API calls. Food for thought when your `check_label.sh` becomes `check_twenty_labels_and_also_the_weather.sh`.
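If a separate script feels like overkill for one label, the same gate can live inline in the `run` step. A rough sketch under some loud assumptions: you're on GitHub, `curl` and `jq` are in the Atlantis image, and a `GITHUB_TOKEN` with read access is exported in the server's environment (Atlantis doesn't hand you that one; `$BASE_REPO_OWNER`, `$BASE_REPO_NAME` and `$PULL_NUM` it does).

```yaml
workflows:
  label-gated-workflow:
    plan:
      steps:
        - run:
            command: |
              # Ask the GitHub REST API for the labels on this PR (PRs use the issues
              # endpoint for labels). jq -e sets the exit status: zero only when the
              # filter yields true, so a missing label stops the workflow.
              curl -sf \
                -H "Authorization: Bearer $GITHUB_TOKEN" \
                -H "Accept: application/vnd.github+json" \
                "https://api.github.com/repos/$BASE_REPO_OWNER/$BASE_REPO_NAME/issues/$PULL_NUM/labels" \
                | jq -e 'any(.[]; .name == "ship-it")' > /dev/null
        - init
        - plan
```

If the API call itself fails, `jq` gets no input and still exits non-zero, so the gate fails closed. Still a token and an external API to babysit, though.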
`when_modified`: This is Atlantis's built-in way to say "only run `plan` if these specific files changed." Uses glob patterns. Super important in monorepos so you're not planning the entire universe for a typo fix in a README.
autoplan:
  when_modified:
    - "**/*.tf" # My stuff
    - "../../modules/network/**/*.tf" # That shared thing everyone touches
    - ".terraform.lock.hcl" # Always a good idea
  enabled: true
4. Lost in the YAML Woods? Debugging Tips from the Trenches
So, it broke. Of course, it broke. Custom Atlantis workflows are fun like that.
- Atlantis Server Logs are Your Friend (Mostly): Crank up the logging (`--log-level debug`). It's noisy, but the clues are usually in there. Somewhere.
- YAML Linting. Please. Before you even commit `atlantis.yaml`, lint it. Spaces, not tabs. Indentation. Colons. The usual YAML nightmares. I've lost hours to a single misplaced space. Hours.
- Custom Script Woes: Check those exit codes. Did your script even run? Does it have permissions? Are all its tools (tfsec, jq, whatever) actually in the Atlantis container? Test your scripts in an environment as close to the Atlantis one as possible. "Works on my machine" is the road to sadness here.
- `when_modified` Not Firing? Double-check your glob patterns. Case sensitivity? Path typos? Are you sure that shared module path is right from this project's `dir`?
- Server-Side Said "No!": Remember `repos.yaml` on the Atlantis server? It can override your beautiful `atlantis.yaml`. If your `apply_requirements` aren't sticking, check there.
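When a script is misbehaving, one cheap trick is a throwaway step that just prints what Atlantis handed it. A sketch only: the workflow name is made up and the tool list is whatever your own scripts depend on.

```yaml
workflows:
  debug-workflow:              # hypothetical name, for troubleshooting only
    plan:
      steps:
        - run:
            command: |
              # Print the context Atlantis passes to every run step.
              echo "project=$PROJECT_NAME workspace=$WORKSPACE dir=$DIR pull=$PULL_NUM"
              # Report which of the tools your scripts need are actually installed here.
              for tool in terraform tfsec jq; do
                command -v "$tool" || echo "MISSING: $tool"
              done
        - init
        - plan
```

The output lands in the PR comment, so you can see the container's view of the world without shelling into it. Rip it out once you've found the culprit.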
The whole feedback loop for debugging these `run` steps can be a bit slow. Change script, commit, push, wait for PR, run Atlantis, see it fail, repeat. Sometimes you just want a better way to see what the heck is going on inside that execution step.
5. Keeping Sane: How to Not Drown in atlantis.yaml
If this file gets too big, it's a monster. Here's how to try and tame it.
Monorepos: A Special Kind of Fun
- Get Granular with Projects: Each thing that can be deployed on its own? Probably its own `project` entry.
- `when_modified` is Your Lifesaver: Get these patterns tight, or you'll be planning everything, all the time. There's also server-side stuff like `--autoplan-modules` that tries to be smart about dependencies. Sometimes it is.
- `execution_order_group`: If project A needs to be applied before project B, this is how you tell Atlantis.
- Go Parallel: `parallel_plan: true` and `parallel_apply: true` can speed things up if you have lots of independent projects.
But yeah, in a big monorepo, `atlantis.yaml` can still become a beast. I've seen teams write scripts to generate their `atlantis.yaml`. Think about that for a second. You're writing code to write config for your automation tool. Meta.
YAML Anchors: Your (Sometimes) Friend
YAML has this thing called anchors (`&`) and aliases (`*`). Lets you define a chunk of YAML once and reuse it. Good for cutting down on copy-paste.
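workflows:
  base-workflow:
    plan:
      steps: &common_plan_steps # Define it once, right where it's first used
        - init
        - run: ./scripts/lint.sh
        - plan: {extra_args: ["-out=$PLANFILE"]}
  my-workflow:
    plan:
      # Slap it in here as the whole list; "- *common_plan_steps" would nest a list inside a list
      steps: *common_plan_steps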
Helpful, yeah. But go too wild with these, and it can make the YAML harder to read. You end up chasing aliases all over the file. Use with caution.
Server-Side Rules: `repos.yaml` to the Rescue?
For big shops, `repos.yaml` on the Atlantis server is non-negotiable. It's how the platform team keeps some control.
- `allowed_overrides`: What can the repo-level `atlantis.yaml` actually change?
- `allowed_workflows`: Can't just use any old workflow defined on the server. Only the blessed ones.
- `allow_custom_workflows: true/false`: This is a big one for security. If it's `true`, repos can define `run` steps that do... well, anything. Default to `false` unless you really trust everyone and have amazing review processes.
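For a sense of how those three knobs sit together, here's a minimal server-side sketch. The catch-all repo pattern, the workflow name and the requirement list are placeholders I picked, not a recommended policy.

```yaml
# repos.yaml on the Atlantis server (handed to it via --repo-config)
repos:
  - id: /.*/                               # regex: applies to every repo
    apply_requirements: [approved, mergeable]
    workflow: blessed-default              # hypothetical server-defined workflow
    allowed_overrides: [workflow]          # repos may pick a workflow...
    allowed_workflows: [blessed-default]   # ...but only from this list
    allow_custom_workflows: false          # repos can't define their own run steps
workflows:
  blessed-default:
    plan:
      steps: [init, plan]
    apply:
      steps: [apply]
```

Because `apply_requirements` isn't in `allowed_overrides` here, a repo-level `atlantis.yaml` can't loosen it.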
Basically, `repos.yaml` is how you offer "Atlantis-as-a-service" with guardrails. But managing that central config and how it interacts with all the repo configs? That's another layer of fun. And again, this is where I find myself thinking about systems like Scalr, where RBAC and policy (like OPA) are often baked in, not layered on with more YAML.
6. Quick Recap: What Atlantis Can Actually Do (The Good Parts)
Feature/Component | What It Lets You Do (The Gist) | Key Config Bits | Stuff to Watch Out For / Why It's Tricky |
---|---|---|---|
Projects | Tell Atlantis about each bit of your infra | `dir`, `workspace`, `name`, `workflow` | Your … |
Workflows | Make up your own plan/apply steps | `workflows`, `plan`, `apply`, `steps` | Now you own the scripts, their tools, their permissions. It's a whole new world of pain/joy. |
`run` steps | Jam in any CLI tool you want (linters, scanners, etc.) | `run`, `command` | Tool has to be in the Atlantis image. Script better be solid. Exit codes matter. A lot. |
`apply_requirements` | Gate `apply` | `approved`, `mergeable`, `undiverged` | Depends on your VCS playing nice and PRs being in the right state. |
`when_modified` | Only plan if relevant files changed. Magic! | `autoplan`, `when_modified` | Getting those glob patterns just right, especially for shared modules, is an art form. |
PR Label Logic | (With scripts) Make workflows react to PR labels. | `run` + a script | More scripting, talking to APIs, managing tokens. Can get complex fast. |
YAML Anchors | Don't Repeat Yourself (DRY) for YAML bits. | `&`, `*` | Can make things cleaner, or a confusing mess if you go overboard. Balance, young Padawan. |
`repos.yaml` | Central control from the Atlantis server. The boss. | `allowed_overrides`, `allowed_workflows`, `allow_custom_workflows` | Needs careful thought about who can do what. And another YAML file to manage. |
7. The Big Finish: Scaling Your Terraform Automation (Or Trying To)
So, Atlantis. It's a solid open-source tool for getting Terraform into your PRs. And with custom workflows, you can build some pretty fancy automation pipelines. Security scans, multi-stage deploys, the works.
But, and you knew there was another 'but', all this cool advanced stuff? It means you're signing up for a lot of work. Keeping that `atlantis.yaml` from becoming a nightmare, making sure your custom scripts don't explode, debugging weird workflow issues, and trying to keep some sort of order across a bunch of repos or a giant monorepo... it's a job. A real job. This isn't a small feat, believe me.
If you're spending more time fighting with Atlantis configuration than actually shipping infrastructure, or if you're looking for something with more batteries-included for governance, security, and just general grown-up operational stuff, maybe it's time to look around. Tools like Scalr, they're built to tackle these kinds of scaling headaches. They offer things like environment management that isn't just more YAML, OPA for policy that isn't just more shell scripts, and RBAC that's, well, actual RBAC. Might save you from writing that script that generates your `atlantis.yaml`.
End of the day, it's about what fits your team, how much you want to build versus buy, and how much YAML you can stomach. Choose wisely. My coffee's cold.