Top 5 Most Common Terragrunt Issues

Discover the 5 Terragrunt pitfalls teams face in 2025: why they occur, plus fast fixes for state, dependencies, modules, and more.

The Performance Wall

Recent Terragrunt versions have introduced significant performance regressions. Version 0.50.15 specifically shows a 15x slowdown in dependency calculations. Here's what this looks like in practice:

# Pre-0.50.15 execution
$ time terragrunt run-all plan
real    0m32.451s

# Post-0.50.15 execution (same infrastructure)
$ time terragrunt run-all plan
real    8m14.892s

The root cause? O(n²) complexity in locals evaluation. Each module re-evaluates all parent locals once per dependency, so the work grows quadratically with module count instead of linearly:

# This innocent-looking configuration
include "root" {
  path = find_in_parent_folders()
}

locals {
  # Gets evaluated once per module, per dependency
  common_vars = read_terragrunt_config(find_in_parent_folders("common.hcl"))
  region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  env_vars    = read_terragrunt_config(find_in_parent_folders("env.hcl"))
}

Memory usage balloons as well. A 50-module deployment that previously required 2GB now demands 16GB+ of RAM. Provider downloads compound this: without a shared provider cache configured, Terragrunt downloads the 500MB AWS provider separately for each module.
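
One way to avoid the repeated provider downloads is Terraform's own plugin cache, enabled through the TF_PLUGIN_CACHE_DIR environment variable. The sketch below sets it from the root terragrunt.hcl. The cache path is an assumption, and the directory must already exist because Terraform will not create it. Terraform's plugin cache is also not guaranteed to be safe under concurrent init, so with run-all it may need lower parallelism, or the provider cache built into newer Terragrunt releases.

```hcl
# Root terragrunt.hcl: point every module at one shared provider cache.
# Assumption: the directory below exists before the first run.
terraform {
  extra_arguments "plugin_cache" {
    commands = ["init", "validate", "plan", "apply", "destroy"]

    env_vars = {
      # Falls back to /tmp only if HOME is unset
      TF_PLUGIN_CACHE_DIR = "${get_env("HOME", "/tmp")}/.terraform.d/plugin-cache"
    }
  }
}
```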

Configuration Complexity and Path Resolution

Path resolution remains Terragrunt's most confusing aspect. Here's a typical failure scenario:

# parent terragrunt.hcl
locals {
  config_path = "${get_terragrunt_dir()}/config.yaml"
}

# child terragrunt.hcl
include "root" {
  path   = find_in_parent_folders()
  expose = true
}

locals {
  # This breaks with "no such file or directory"
  config = yamldecode(file(include.root.locals.config_path))
}

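The fix requires understanding Terragrunt's execution context. get_terragrunt_dir() resolves to the directory of the terragrunt.hcl being run, so inside an included parent file it returns the child's directory, where config.yaml does not exist. get_parent_terragrunt_dir() resolves to the directory of the included parent file instead:

# Correct approach (parent terragrunt.hcl)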
locals {
  config_path = "${get_parent_terragrunt_dir()}/config.yaml"
}

Mock outputs create another layer of confusion. Consider this dependency scenario:

dependency "vpc" {
  config_path = "../vpc"
  
  mock_outputs = {
    vpc_id = "vpc-fake123"
  }
  
  mock_outputs_merge_strategy_with_state = "shallow"
}

# During destroy, this still tries to use the real VPC ID
# Result: "InvalidVpcID.NotFound: The vpc ID 'vpc-real456' does not exist"
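
One common mitigation is to restrict mocks to commands that never touch real infrastructure. This is a minimal sketch, assuming the same ../vpc dependency. mock_outputs_allowed_terraform_commands limits when the mock values may be used, so apply and destroy stop with an error when real outputs are missing instead of proceeding with placeholder data. The inputs block is illustrative, and this does not by itself solve every destroy-ordering problem.

```hcl
dependency "vpc" {
  config_path = "../vpc"

  mock_outputs = {
    vpc_id = "vpc-fake123"
  }

  # Mocks are only returned for these commands; apply and destroy
  # must read real outputs from the vpc module's state.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
```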

CI/CD Integration Challenges

GitHub Actions integration fails with cryptic JSON parsing errors:

# This configuration fails
- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0

- run: terragrunt plan
# Error: invalid character 'c' looking for beginning of value

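The solution requires disabling the wrapper script that setup-terraform installs around the terraform binary. The wrapper writes its own text to stdout, which corrupts the JSON that Terragrunt tries to parse from Terraform's output: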

- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0
    terraform_wrapper: false  # Critical for Terragrunt

Atlantis integration demands custom Docker images:

FROM runatlantis/atlantis:latest
# Recent Atlantis images run as a non-root user; switch to root for the install
USER root
RUN curl -fsSL https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_linux_amd64 \
    -o /usr/local/bin/terragrunt && \
    chmod +x /usr/local/bin/terragrunt
USER atlantis
ENV TERRAGRUNT_TFPATH=/usr/local/bin/terraform

Cloud Provider Authentication Issues

AWS cross-account access creates the most friction:

# This fails with "No valid credential sources found"
remote_state {
  backend = "s3"
  config = {
    bucket         = "terraform-state-account-b"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    role_arn       = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
  }
}

terraform {
  source = "../modules/vpc"
  
  extra_arguments "assume_role" {
    commands = get_terraform_commands_that_need_vars()
    env_vars = {
      AWS_ROLE_ARN = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
    }
  }
}
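
The configuration above mixes two mechanisms. The S3 backend assumes its own role through role_arn, while AWS_ROLE_ARN on its own is generally only honored by the AWS SDK together with a web identity token file, so the provider can end up without usable credentials. One way to make the split explicit, sketched here, is to leave the backend role where it is and generate a provider block that assumes the resource role. The account placeholder, role name, and region come from the example above. The generated file name is an assumption, and the base credentials must still be allowed to assume both roles.

```hcl
# Root terragrunt.hcl: write a provider.tf into every module so the AWS
# provider explicitly assumes the resource role in account A.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
  }
}
EOF
}
```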

Unlike the S3 and GCS backends, the Azure backend gets no auto-created resources from Terragrunt:

# Must manually create storage account and container first
remote_state {
  backend = "azurerm"
  config = {
    storage_account_name = "tfstate${get_env("TF_VAR_environment", "dev")}"
    container_name       = "tfstate"
    key                  = "${path_relative_to_include()}/terraform.tfstate"
    # This fails if resources don't exist
  }
}
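
A common workaround is a small bootstrap Terraform configuration, applied once with local state, that creates the storage resources before Terragrunt runs. The sketch below assumes the azurerm 3.x provider. The resource group name and location are hypothetical, and the storage account name matches the "dev" default from the example above (storage account names must be globally unique, so a real name would likely differ).

```hcl
# bootstrap/main.tf: one-time setup for the Terragrunt state backend.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# Hypothetical resource group name and region
resource "azurerm_resource_group" "tfstate" {
  name     = "rg-tfstate"
  location = "eastus"
}

resource "azurerm_storage_account" "tfstate" {
  name                     = "tfstatedev"
  resource_group_name      = azurerm_resource_group.tfstate.name
  location                 = azurerm_resource_group.tfstate.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  storage_account_name  = azurerm_storage_account.tfstate.name
  container_access_type = "private"
}
```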

State Management at Scale

State locking creates race conditions in parallel execution:

# 10% failure rate with default settings
$ terragrunt run-all apply --terragrunt-parallelism 10

Error: Error locking state: Error acquiring the state lock
ConditionalCheckFailedException: The conditional request failed

The workaround reduces efficiency:

# Forced sequential execution (extra_arguments must sit inside a terraform block)
terraform {
  extra_arguments "serial_locking" {
    commands  = ["apply", "destroy"]
    arguments = ["-parallelism=1"]
  }
}
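
A less drastic option, assuming the failures come from short-lived lock contention, is to have Terraform wait for the lock instead of failing immediately. This sketch uses Terragrunt's get_terraform_commands_that_need_locking() helper together with Terraform's -lock-timeout flag. The 20-minute value is an arbitrary illustration.

```hcl
terraform {
  extra_arguments "retry_lock" {
    # Built-in helper listing the Terraform commands that acquire a state lock
    commands  = get_terraform_commands_that_need_locking()
    arguments = ["-lock-timeout=20m"]
  }
}
```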

Summary and Alternatives

Here's what teams face with Terragrunt in 2025:

| Issue Category          | Impact                 | Workaround Complexity          | Resolution Status    |
|-------------------------|------------------------|--------------------------------|----------------------|
| Performance (v0.50.15+) | 15x slower execution   | Medium (downgrade version)     | Open - No ETA        |
| Memory Usage            | 16GB+ for 50 modules   | High (provider caching)        | Partially addressed  |
| Path Resolution         | Daily confusion        | High (deep knowledge required) | Documentation only   |
| CI/CD Integration       | 10% failure rate       | Medium (custom configs)        | Community solutions  |
| AWS Authentication      | 300% overhead          | Low (use direct creds)         | Won't fix            |
| State Locking           | Race conditions        | Medium (reduce parallelism)    | Architectural limit  |
| Azure Support           | Manual setup required  | High (pre-provisioning)        | Feature gap          |
| Dependency Cycles       | Architectural redesign | Very High                      | By design            |

Teams managing 50+ modules report spending 40% of their time debugging Terragrunt issues rather than managing infrastructure. The tool that promised to simplify Terraform has introduced its own complexity layer.

For organizations evaluating alternatives, native Terraform with proper module design handles many use cases without these issues. Platform solutions like Terraform Cloud, Env0, or Scalr provide the orchestration benefits Terragrunt offers (dependency management, environment promotion, policy enforcement) through managed services that avoid these operational headaches.

Scalr particularly excels at the enterprise features teams adopt Terragrunt for: hierarchical variable inheritance, environment management, and policy as code. Unlike Terragrunt's file-based approach that creates path resolution nightmares, Scalr's API-driven model eliminates configuration complexity while providing better performance at scale.

The choice comes down to this: debug Terragrunt's growing list of issues, or invest that time in actual infrastructure work. For teams hitting these walls, the answer becomes clear pretty quickly.