Top 5 Most Common Terragrunt Issues
Discover the 5 Terragrunt pitfalls teams face in 2025—why they occur and fast fixes for state, dependencies, modules & more.
The Performance Wall
Recent Terragrunt versions have introduced significant performance regressions. Version 0.50.15 specifically shows a 15x slowdown in dependency calculations. Here's what this looks like in practice:
# Pre-0.50.15 execution
$ time terragrunt run-all plan
real 0m32.451s
# Post-0.50.15 execution (same infrastructure)
$ time terragrunt run-all plan
real 8m14.892s
The root cause? O(n²) complexity in locals evaluation. Each module re-evaluates all of its parent locals, and does so again for every dependency it resolves, so the total work grows quadratically with the number of modules:
# This innocent-looking configuration
include "root" {
  path = find_in_parent_folders()
}
locals {
  # Gets evaluated once per module, per dependency
  common_vars = read_terragrunt_config(find_in_parent_folders("common.hcl"))
  region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  env_vars    = read_terragrunt_config(find_in_parent_folders("env.hcl"))
}
Memory usage balloons beyond control. A 50-module deployment that previously required 2GB now demands 16GB+ of RAM. The provider cache compounds this - without proper configuration, Terragrunt downloads the 500MB AWS provider separately for each module.
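One way to avoid the repeated provider downloads is Terraform's own plugin cache, which Terragrunt can switch on for every module through an environment variable. The following is a minimal sketch for a root terragrunt.hcl rather than a tuned setup. The block name `provider_cache` and the cache path are assumptions chosen for illustration, and the directory has to exist beforehand because Terraform will not create it.

```hcl
# Root terragrunt.hcl (sketch). Block name and cache path are illustrative.
terraform {
  extra_arguments "provider_cache" {
    # Providers are fetched during init. Terragrunt also applies init
    # extra_arguments to the init it runs automatically before other commands.
    commands = ["init"]

    env_vars = {
      # Must point at an existing directory; Terraform does not create it.
      TF_PLUGIN_CACHE_DIR = "${get_env("HOME", "/tmp")}/.terraform.d/plugin-cache"
    }
  }
}
```

Terraform documents this cache as not guaranteed to be safe when several `terraform init` processes run at once, so with `run-all` it may be worth priming the cache with a single init first or lowering parallelism.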
Configuration Complexity and Path Resolution
Path resolution remains Terragrunt's most confusing aspect. Here's a typical failure scenario:
# parent terragrunt.hcl
locals {
  # Evaluated in the context of whichever child includes this file
  config_path = "${get_terragrunt_dir()}/config.yaml"
}
# child terragrunt.hcl
include "root" {
  path   = find_in_parent_folders()
  expose = true
}
locals {
  # This breaks with "no such file or directory":
  # config_path points into the child's directory, not the parent's
  config = yamldecode(file(include.root.locals.config_path))
}
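The fix requires understanding Terragrunt's execution context. get_terragrunt_dir() returns the directory of the terragrunt.hcl being run (the child), even when it is called from an included parent file, while get_parent_terragrunt_dir() returns the directory of the parent file itself:
# Correct approach (parent terragrunt.hcl)
locals {
  config_path = "${get_parent_terragrunt_dir()}/config.yaml"
}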
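To make the difference concrete, here is a small sketch of what the two functions resolve to. The directory layout and the local names `child_dir` and `parent_dir` are hypothetical and exist only for illustration.

```hcl
# Hypothetical layout:
#   /repo/live/terragrunt.hcl            <- parent (root) config
#   /repo/live/config.yaml
#   /repo/live/prod/app/terragrunt.hcl   <- child, the directory you run from
#
# Parent terragrunt.hcl, evaluated while running in /repo/live/prod/app:
locals {
  # Resolves to /repo/live/prod/app (the child), where config.yaml does not exist.
  child_dir = get_terragrunt_dir()

  # Resolves to /repo/live (the parent), where config.yaml actually lives.
  parent_dir = get_parent_terragrunt_dir()

  config_path = "${local.parent_dir}/config.yaml"
}
```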
Mock outputs create another layer of confusion. Consider this dependency scenario:
dependency "vpc" {
  config_path = "../vpc"
  mock_outputs = {
    vpc_id = "vpc-fake123"
  }
  mock_outputs_merge_strategy_with_state = "shallow"
}
# During destroy, this still tries to use the real VPC ID
# Result: "InvalidVpcID.NotFound: The vpc ID 'vpc-real456' does not exist"
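One common way to limit this confusion is to state explicitly which commands may see mock values. The sketch below uses Terragrunt's `mock_outputs_allowed_terraform_commands` attribute; the choice of `validate` and `plan` is an assumption about a typical workflow. It does not repair stale state, but it keeps fake IDs out of apply and destroy.

```hcl
dependency "vpc" {
  config_path = "../vpc"

  mock_outputs = {
    vpc_id = "vpc-fake123"
  }

  # Mocks are only substituted for these commands. For apply and destroy,
  # Terragrunt errors out when real outputs are missing instead of using a fake ID.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
```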
CI/CD Integration Challenges
GitHub Actions integration fails with cryptic JSON parsing errors:
# This configuration fails
- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0
- run: terragrunt plan
# Error: invalid character 'c' looking for beginning of value
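The solution is to disable the wrapper script that setup-terraform installs by default. The wrapper intercepts Terraform's stdout and stderr to expose them as step outputs, and the extra text it emits corrupts the JSON that Terragrunt parses from Terraform: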
- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0
    terraform_wrapper: false # Critical for Terragrunt
Atlantis integration demands custom Docker images:
FROM runatlantis/atlantis:latest
RUN curl -L https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_linux_amd64 \
    -o /usr/local/bin/terragrunt && \
    chmod +x /usr/local/bin/terragrunt
ENV TERRAGRUNT_TFPATH=/usr/local/bin/terraform
Cloud Provider Authentication Issues
AWS cross-account access creates the most friction:
# This fails with "No valid credential sources found"
remote_state {
  backend = "s3"
  config = {
    bucket   = "terraform-state-account-b"
    key      = "${path_relative_to_include()}/terraform.tfstate"
    region   = "us-east-1"
    role_arn = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
  }
}
terraform {
  source = "../modules/vpc"
  extra_arguments "assume_role" {
    commands = get_terraform_commands_that_need_vars()
    env_vars = {
      # AWS_ROLE_ARN on its own does not assume a role; the AWS SDK only
      # reads it together with a web identity token file
      AWS_ROLE_ARN = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
    }
  }
}
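A pattern that tends to be easier to reason about keeps the two roles in separate places: the S3 backend assumes the state role through its own `role_arn`, and the AWS provider assumes the resource role through a generated `assume_role` block. The sketch below assumes the caller's base credentials are allowed to assume both roles; the `generate` block label and the `provider.tf` file name are illustrative choices.

```hcl
# Sketch: the state role stays in remote_state; the resource role moves
# into a generated provider block instead of an environment variable.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
  }
}
EOF
}
```

This still needs working base credentials in the environment, so it narrows the failure down to one question: can the starting identity assume each role?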
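Azure backend configuration can't auto-create resources. Unlike the S3 and GCS backends, which Terragrunt can create on first use, the azurerm storage account and container have to exist before the first run: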
# Must manually create storage account and container first
remote_state {
  backend = "azurerm"
  config = {
    storage_account_name = "tfstate${get_env("TF_VAR_environment", "dev")}"
    container_name       = "tfstate"
    key                  = "${path_relative_to_include()}/terraform.tfstate"
    # This fails if resources don't exist
  }
}
State Management at Scale
State locking creates race conditions in parallel execution:
# 10% failure rate with default settings
$ terragrunt run-all apply --terragrunt-parallelism 10
Error: Error locking state: Error acquiring the state lock
ConditionalCheckFailedException: The conditional request failed
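The workaround reduces efficiency. Note that Terraform's -parallelism flag limits concurrent resource operations inside a single module run; how many modules run at once is controlled by Terragrunt's --terragrunt-parallelism flag:
# Forced sequential execution inside each module
terraform {
  extra_arguments "serial_locking" {
    commands  = ["apply", "destroy"]
    arguments = ["-parallelism=1"]
  }
}
# To run modules one at a time as well:
# terragrunt run-all apply --terragrunt-parallelism 1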
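A less drastic option is to let Terraform wait for a contended lock instead of failing immediately. The sketch below passes `-lock-timeout` to every command that takes a state lock, using Terragrunt's built-in `get_terraform_commands_that_need_locking()` helper. The block name `retry_lock` and the 20-minute value are example choices, not recommendations.

```hcl
terraform {
  extra_arguments "retry_lock" {
    # Built-in helper listing the Terraform commands that take a state lock.
    commands = get_terraform_commands_that_need_locking()

    # Wait up to 20 minutes for a held lock; the duration is an example value.
    arguments = ["-lock-timeout=20m"]
  }
}
```

This keeps module-level parallelism intact and only adds waiting time when two runs actually contend for the same lock.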
Summary and Alternatives
Here's what teams face with Terragrunt in 2025:
Issue Category | Impact | Workaround Complexity | Resolution Status |
---|---|---|---|
Performance (v0.50.15+) | 15x slower execution | Medium (downgrade version) | Open - No ETA |
Memory Usage | 16GB+ for 50 modules | High (provider caching) | Partially addressed |
Path Resolution | Daily confusion | High (deep knowledge required) | Documentation only |
CI/CD Integration | 10% failure rate | Medium (custom configs) | Community solutions |
AWS Authentication | 300% overhead | Low (use direct creds) | Won't fix |
State Locking | Race conditions | Medium (reduce parallelism) | Architectural limit |
Azure Support | Manual setup required | High (pre-provisioning) | Feature gap |
Dependency Cycles | Architectural redesign | Very High | By design |
Teams managing 50+ modules report spending 40% of their time debugging Terragrunt issues rather than managing infrastructure. The tool that promised to simplify Terraform has introduced its own complexity layer.
For organizations evaluating alternatives, native Terraform with proper module design handles many use cases without these issues. Platform solutions like Terraform Cloud, Env0, or Scalr provide the orchestration benefits Terragrunt offers - dependency management, environment promotion, policy enforcement - through managed services that avoid these operational headaches.
Scalr particularly excels at the enterprise features teams adopt Terragrunt for: hierarchical variable inheritance, environment management, and policy as code. Unlike Terragrunt's file-based approach that creates path resolution nightmares, Scalr's API-driven model eliminates configuration complexity while providing better performance at scale.
The choice comes down to this: debug Terragrunt's growing list of issues, or invest that time in actual infrastructure work. For teams hitting these walls, the answer becomes clear pretty quickly.