How to Structure Terraform
Learn the proven directory layout, module strategy & state management to keep Terraform code scalable, reusable, and team-friendly in any cloud.
Terraform has become the lingua franca for Infrastructure as Code (IaC). Yet, as many who frequent forums like Reddit can attest, initial enthusiasm often gives way to "Terraform sprawl." What starts simple can quickly become a complex web of configurations, especially as you scale and manage multiple environments. For production workloads, a haphazard approach isn't just inefficient—it's risky. This post offers a matter-of-fact guide to structuring your Terraform for production, ensuring stability, security, and maintainability.
The Usual Suspects: Common Terraform Growing Pains
If you're wrestling with Terraform organization, you're not alone. Common pain points include:
- Directory & Repository Chaos: The "monorepo vs. polyrepo" debate rages on, and deciding how to group configurations (by environment? by component?) is a frequent stumbling block.
- Module Mysteries: Designing truly reusable modules—not too thin, not too complex—and managing their versions and outputs often causes confusion.
- Multi-Environment Muddle: Choosing between Terraform workspaces and directory-based separation, managing .tfvars files effectively, and ensuring state isolation are persistent challenges.
- Scaling Nightmares: Simple setups crumble under pressure, leading to slow plan/apply times, increased blast radius for errors, and general management overhead.
- Code Duplication & Drift: The copy-paste anti-pattern leads to inconsistencies and a codebase that no longer reflects reality.
Strategies for Production-Ready Terraform
Moving to a production-grade IaC setup requires deliberate choices. Here’s how to approach it:
1. Strategic Code Structuring
Clarity starts with how you lay out your code.
- Repository Strategy (Monorepo vs. Polyrepo):
  - Monorepo: Can simplify internal dependencies and consistency. Requires robust CI/CD tooling for selective builds.
  - Polyrepo: Offers clear ownership and faster individual builds but makes cross-repo dependency management harder.
  - The Verdict: The "best" choice is contextual. However, tools that offer centralized management can ease the pain regardless of the underlying repo structure.
- Directory Layout: For production, directory-based separation (e.g., environments/production/networking/) is generally preferred over Terraform workspaces due to better isolation and flexibility in backend/provider configurations.
├── environments
│   ├── development
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   └── dev.tfvars
│   ├── staging
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   └── staging.tfvars
│   └── production
│       ├── backend.tf
│       ├── main.tf
│       └── prod.tfvars
└── modules
    ├── vpc
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    └── ec2_instance
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
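To make the layout concrete, here is a minimal sketch of what environments/production/main.tf might contain under this tree. It assumes the vpc module exposes the project_name and cidr_block inputs shown later in this post; the values themselves are hypothetical.

```hcl
# environments/production/main.tf (illustrative sketch, not a complete root module)

# The relative path climbs from environments/production/ up to the repository
# root and then into the shared modules/ directory.
module "vpc" {
  source = "../../modules/vpc"

  project_name = "example-prod" # hypothetical value
  cidr_block   = "10.20.0.0/16" # hypothetical value
}
```

Because each environment directory is its own root module, running terraform init and terraform plan inside environments/production touches only that environment's backend and state.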
2. Mastering Reusable Modules
Modules are your best friends for DRY (Don't Repeat Yourself) infrastructure.
- Principles:
  - Focus: Modules should have a clear, single purpose.
  - Clear Interface: Well-defined input variables (variables.tf) and outputs (outputs.tf).
  - Documentation: A README.md explaining purpose, inputs, outputs, and usage.
  - Avoid Thin Wrappers: Modules should add value, not just wrap a single resource.
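A clear interface matters most when consumers pin the module to a known release. The sketch below shows one way to do that with a Git source and a tag. The repository URL, the tag, and the input value are all hypothetical, and it assumes a module like the VPC example that follows.

```hcl
module "vpc" {
  # Hypothetical repository URL. The "//modules/vpc" part selects a
  # subdirectory, and "?ref=v1.2.0" pins the module to a Git tag.
  source = "git::https://example.com/acme/terraform-modules.git//modules/vpc?ref=v1.2.0"

  project_name = "example" # hypothetical value
}
```

Pinning to a tag means a change to the module only reaches an environment when someone deliberately bumps the ref there.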
Example Module (modules/vpc/variables.tf):
variable "project_name" {
  description = "The name of the project."
  type        = string
}

variable "cidr_block" {
  description = "The CIDR block for the VPC."
  type        = string
  default     = "10.0.0.0/16"
}

variable "enable_dns_hostnames" {
  description = "Enable DNS hostnames in the VPC."
  type        = bool
  default     = true
}
And a snippet from modules/vpc/main.tf:
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = var.enable_dns_hostnames

  tags = {
    Name    = "${var.project_name}-vpc"
    Project = var.project_name
  }
}
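The module's interface is completed by its outputs. The original does not show modules/vpc/outputs.tf, so the following is only a sketch of what it might look like; the output names vpc_id and vpc_cidr_block are assumptions.

```hcl
# modules/vpc/outputs.tf (illustrative sketch; output names are assumptions)

output "vpc_id" {
  description = "ID of the VPC created by this module."
  value       = aws_vpc.main.id
}

output "vpc_cidr_block" {
  description = "CIDR block assigned to the VPC."
  value       = aws_vpc.main.cidr_block
}
```

A caller would then read these as module.vpc.vpc_id and module.vpc.vpc_cidr_block, without needing to know which resources the module creates internally.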
3. Robust Environment Configuration & Promotion
- Secure Secret Management: NEVER commit secrets to Git. Use dedicated secret managers (e.g., HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) and integrate them with your CI/CD pipeline or Terraform provider. Platforms often provide secure ways to inject these.
- .tfvars for Environment Specifics: Use files like prod.tfvars for non-sensitive, environment-specific values.
# environments/production/prod.tfvars
aws_region     = "us-east-1"
instance_count = 5
instance_type  = "m5.large"
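Those values need matching variable declarations in the environment's root module. The sketch below assumes a variables.tf file next to prod.tfvars. It also shows one way to keep a secret out of the .tfvars file: the db_password variable is hypothetical and would be supplied at run time.

```hcl
# environments/production/variables.tf (illustrative sketch)

variable "aws_region" {
  description = "AWS region for this environment."
  type        = string
}

variable "instance_count" {
  description = "Number of application instances."
  type        = number

  # Reject obviously wrong values at plan time.
  validation {
    condition     = var.instance_count >= 1
    error_message = "instance_count must be at least 1."
  }
}

variable "instance_type" {
  description = "EC2 instance type for application instances."
  type        = string
}

# Hypothetical secret: it has no default and is never set in a .tfvars file.
# Supply it at run time, e.g. through the TF_VAR_db_password environment
# variable populated by the CI/CD pipeline from a secret manager.
variable "db_password" {
  description = "Database password injected at run time."
  type        = string
  sensitive   = true
}
```

The file is then selected explicitly, for example with terraform plan -var-file=prod.tfvars. Marking a variable sensitive only redacts it from CLI output; the value is still written to state, which is one more reason to protect the state backend.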
4. Bulletproof State Management
State is critical. Protect it.
- State Splitting: Break down state by environment, region, and component to reduce blast radius and improve performance. For example, production/networking/terraform.tfstate and production/app-main/terraform.tfstate.
- Remote Backends: Always use remote backends (e.g., AWS S3, Azure Blob, Google Cloud Storage) with state locking (e.g., DynamoDB for S3) and versioning enabled.
# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket-prod"
    key            = "production/networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "my-tf-state-lock-prod"
    encrypt        = true
  }
}
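Once state is split by component, one component often needs values from another. A common way to bridge them is the terraform_remote_state data source. The sketch below assumes a hypothetical app-main component and assumes the networking root module exports an output named vpc_id; neither appears in the original.

```hcl
# Illustrative sketch for a hypothetical app-main component.

# Read-only view of the networking component's state, using the same
# bucket and key as the backend configuration above.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-tf-state-bucket-prod"
    key    = "production/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Only root-level outputs of the networking configuration are visible here.
resource "aws_security_group" "app" {
  name   = "app-main" # hypothetical name
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}
```

This keeps the two states independent: applying app-main never modifies networking resources, it only reads their published outputs.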
5. Automation and Governance: CI/CD and Policy as Code (PaC)
- CI/CD Pipelines: Automate terraform init, validate, plan, and apply. Key stages include:
  - Checkout
  - Initialize & Validate
  - Security Scan (e.g., Checkov, tfsec)
  - Policy as Code Check (e.g., OPA, Sentinel)
  - Plan (output for review)
  - Manual Approval (for production)
  - Apply
- Policy as Code (PaC): Use tools like Open Policy Agent (OPA) with Rego or HashiCorp Sentinel to enforce security, compliance, and cost policies automatically.
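Example OPA/Rego Policy (Ensure S3 buckets have encryption):
package terraform.analysis

import rego.v1

# Flags managed S3 buckets whose planned state has no inline encryption block.
# An unset block can appear as an empty list, which `not` treats as present, so count entries.
# Buckets encrypted via a separate aws_s3_bucket_server_side_encryption_configuration resource are not detected here.
deny contains msg if {
    some bucket in input.resource_changes
    bucket.type == "aws_s3_bucket"
    bucket.mode == "managed"
    sse := object.get(bucket.change.after, "server_side_encryption_configuration", [])
    count(sse) == 0
    msg := sprintf("S3 Bucket '%s' must have server-side encryption enabled.", [bucket.name])
}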
The Power of Specialized Tooling & Platforms
While tools like Terragrunt help keep configurations DRY and Atlantis automates PR workflows, managing the entire lifecycle—state, CI/CD, RBAC, PaC, module registry, cost estimation—can lead to building a "DIY mini-TACO" (Terraform Automation and Collaboration Software). This is where comprehensive platforms shine.
Platforms like Scalr are designed to address these challenges holistically. They provide:
- Hierarchical Configuration Management: Define settings (variables, provider configs, policies) at different scopes (e.g., global, environment, workspace) with inheritance, drastically reducing boilerplate and enforcing consistency.
- Integrated OPA Policies: Natively enforce policies across your organization.
- RBAC and Environment Management: Securely manage who can do what, where.
- Module Registry: Share and version your internal modules.
- Self-Hosted Agents & Cloud-Based Execution: Flexibility in how and where your Terraform runs.
By centralizing these aspects, such platforms reduce the operational burden and allow teams to focus on delivering value, not just wrestling with tooling.
Summary: Key Organizational Choices
Area | Common Pitfall | Production Best Practice | Scalr Approach Highlight |
---|---|---|---|
Code Structure | Monolithic, hard-to-navigate code | Directory-based separation (env/component), Reusable Modules | Hierarchical configuration simplifies management across diverse structures. |
State Management | Local state, no locking, large state files | Remote backend, locking, versioning, State Splitting (env, region, component) | Securely managed state, facilitates best practices. |
Environment Config | Inconsistent variables, secrets in Git | Per-environment .tfvars for non-sensitive values, dedicated secret managers for secrets | Environment-scoped variables and secure secret integration. |
Modules | Duplication, "thin wrappers," poor versioning | Focused, well-documented, versioned modules | Integrated Module Registry for private modules. |
Automation/Governance | Manual applies, no policy checks | CI/CD pipelines, GitOps, Policy as Code (OPA/Sentinel), Manual Approvals for Prod | Built-in OPA support, customizable workflows, RBAC for secure environment progression. |
Tooling | "DIY mini-TACO" complexity, tool sprawl | Terragrunt for DRY, Atlantis for PRs, or a comprehensive TACO | Provides an integrated platform reducing the need to stitch together multiple tools. |
Conclusion
Organizing Terraform for production is non-trivial but essential. It demands a shift from ad-hoc scripting to disciplined software engineering practices. While individual tools can address specific pain points, a comprehensive platform approach, like that offered by Scalr, can significantly streamline operations, enhance security, and ensure governance across your entire IaC landscape. This allows you to scale effectively while keeping your sanity and your production environments stable.