Terraform Memory Errors Explained

Fix Terraform and OpenTofu Out of Memory errors (Exit Code 137, OOM). Learn why the AWS, Azure, and GCP providers consume so much RAM and how to reduce memory usage.

TL;DR: Terraform and OpenTofu OOM errors (Exit Code 137, runtime: out of memory) are almost always caused by three things: massive provider schemas (AWS alone uses 400-800MB+ RAM), too many provider aliases spawning separate processes, and bloated state files. Fix them by: upgrading to TF 1.6+/Tofu 1.7+ for schema caching, using AWS Provider v6.0's resource-level region attribute to eliminate aliases, reducing -parallelism, tuning Go's GOMEMLIMIT environment variable, and splitting monolithic state into smaller workspaces.

When scaling infrastructure, "Out of Memory" (OOM) errors are a common hurdle in DevOps pipelines. Whether you are using HashiCorp Terraform or OpenTofu, these errors can be cryptic and costly.

In this guide, we’ll dive into why these tools consume so much RAM, the specific impact of the main cloud providers, and how to fix these issues.

Why Terraform Memory Issues Happen

Both Terraform and OpenTofu are written in Go. Go itself is efficient, but both tools load large amounts of data into RAM to ensure infrastructure safety.

1. The Dependency Graph

Before a single resource is created, the engine builds a directed acyclic graph (DAG). For a configuration with 1,000 resources, this graph tracks every dependency, variable, and output, consuming hundreds of megabytes of RAM.
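
You can get a feel for how large this graph is with the built-in graph command. A minimal sketch, run from an initialized working directory (the SVG step assumes Graphviz is installed):

# Export the dependency graph in DOT format
terraform graph > graph.dot

# Optional: render it with Graphviz to see how large the graph really is
dot -Tsvg graph.dot > graph.svg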

2. Provider Schema Loading (The Silent Killer)

Provider schema loading is the leading cause of memory spikes. When you run plan, the CLI loads the entire schema for every provider used. Large providers (like AWS or Azure) have enormous schemas; if you use multiple versions or aliases, each one spawns a separate process with its own memory overhead.
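
To see how much schema data your configuration actually pulls in, you can dump the loaded schemas after init. A rough sketch; the byte count is only a proxy for in-memory size:

# List the providers (and versions) your configuration requires
terraform providers

# Dump every loaded provider schema as JSON and measure it
terraform providers schema -json | wc -c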

3. State File Bloat

The Terraform state file (.tfstate) is the source of truth. Large, monolithic state files must be fully parsed and held in memory during the "Refreshing State" phase.
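
A quick way to gauge state size before it becomes a problem, assuming your backend is already configured:

# Count the resources tracked in state (a rough proxy for refresh cost)
terraform state list | wc -l

# Measure the size of the state snapshot itself
terraform state pull | wc -c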

Cloud Provider Memory Footprint (2026 Update)

The size of a provider’s schema directly correlates to the number of services it supports. Here is the typical baseline memory consumption per provider instance in 2026:

Provider         Base RAM Usage    Key Memory Heavyweights
AWS (v6.x)       400MB - 800MB+    QuickSight, WAFv2, and SageMaker
AzureRM (v4.x)   500MB - 1GB+      Deeply nested objects in App Service and Kubernetes
Google (v6.x)    300MB - 600MB     Extensive IAM and GKE resource definitions

These figures apply per provider alias. If you have 10 AWS aliases for different regions, you could easily consume 8GB of RAM before the plan even begins. Note that consumption also varies between provider versions.

Identifying Out of Memory Errors

Memory errors rarely say "Out of Memory." Look for these specific codes in your CI/CD logs:

Killed (Exit Code 137): The most common error. It means the OS or the container orchestrator (such as Kubernetes) terminated the process for exceeding its memory limit.

runtime: out of memory: A Go-specific crash where the system refused a memory allocation request.

rpc error: code = Unavailable: This often happens when a provider process crashes due to OOM, leaving the main Terraform binary unable to communicate with it.
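
If you want to confirm that the kernel or Kubernetes actually killed the process, rather than Terraform crashing for another reason, these checks usually work; the pod name below is a placeholder:

# On a Linux host or VM runner: look for OOM kills in the kernel log
dmesg | grep -iE "killed process|out of memory"

# On Kubernetes: an OOM-killed container shows Reason: OOMKilled
kubectl describe pod <your-runner-pod> | grep -A 5 "Last State"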

How to Fix Terraform Memory Errors

1. Enable Provider Schema Caching

Terraform v1.6.0 and OpenTofu v1.7.0 introduced internal optimizations that handle provider schemas more efficiently. To fully benefit from them and avoid redundant copies of provider binaries, configure a global plugin cache.

With plugin_cache_dir set, Terraform/OpenTofu symlinks provider binaries into each working directory rather than copying them, which speeds up initialization and reduces I/O-related memory spikes.

To enable this, create or edit your CLI config file (~/.terraformrc or ~/.tofurc):

# ~/.terraformrc or ~/.tofurc
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

Alternatively, set it as an environment variable in your CI/CD pipeline:

export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
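
One caveat: neither Terraform nor OpenTofu creates the cache directory for you, so make sure it exists before running init:

mkdir -p "$HOME/.terraform.d/plugin-cache"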

2. Use AWS "Enhanced Region Support"

The AWS Provider v6.0 revolutionized multi-region management. You can now define the region at the resource level rather than creating a new provider alias for every region.

  • Before: 20 aliases = 20 processes (~8GB+ RAM).
  • After: 1 provider = 1 process (~800MB RAM).

Previously, every region required its own provider alias, each spawning a heavy process. With Enhanced Region Support, you can now use a single provider process and specify the region directly inside the resource block.

Here is an example of the "old way":

# Primary Provider
provider "aws" {
  region = "us-east-1"
}

# Secondary Provider (Creates a second OS process)
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_vpc" "east" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_vpc" "west" {
  provider   = aws.west  # Links to second process
  cidr_block = "10.1.0.0/16"
}

Memory Impact: ~1.2GB RAM (2 processes)

By using the region attribute at the resource level, Terraform Core sends the request to the same provider process, which dynamically switches context.

provider "aws" {
  region = "us-east-1" # Default region
}

# This resource uses the provider's default region
resource "aws_vpc" "east" {
  cidr_block = "10.0.0.0/16"
}

# This resource overrides the region at the resource level
# NO ALIAS OR EXTRA PROVIDER PROCESS REQUIRED
resource "aws_vpc" "west" {
  region     = "us-west-2" 
  cidr_block = "10.1.0.0/16"
}

Memory Impact: ~600MB RAM (1 process)

3. Reduce Parallelism

If your runner is struggling, trade speed for stability. By default, Terraform/OpenTofu runs 10 concurrent operations. Run with -parallelism=3 to reduce the number of concurrent operations and the amount of resource data held in memory at once.
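
In practice that looks like the sketch below. The GOMEMLIMIT value is illustrative; size it just below your runner's limit. It works because both tools are Go binaries, so the Go runtime treats it as a soft memory ceiling and garbage-collects more aggressively before the OOM killer steps in:

# Lower concurrency from the default of 10
terraform plan -parallelism=3
terraform apply -parallelism=3

# Optional: soft memory ceiling for the Go runtime, slightly below
# the container/runner limit (example value, tune to your runner)
export GOMEMLIMIT=2GiB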

4. Split Monolithic State Files

If your state file is over 50MB, it’s a sign to refactor. Split your infrastructure into logical "stacks" (e.g., network, data-layer, app-layer). Smaller graphs lead to faster runs and a significantly lower memory ceiling.
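
If you use a local state file, one way to carve resources out of the monolith is terraform state mv with an explicit output state. This is only a sketch: the resource addresses and paths are hypothetical, and remote backends require a state pull/push or migration workflow instead:

# Move a VPC from the monolithic stack into a dedicated "network" stack
cd monolith/
terraform state list | grep aws_vpc

terraform state mv \
  -state=terraform.tfstate \
  -state-out=../network/terraform.tfstate \
  aws_vpc.main aws_vpc.main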

Using a TACO Platform

The fixes above address the symptoms, but if you're running Terraform in a CI/CD pipeline like GitHub Actions or GitLab CI, there's a fundamental constraint: you don't control the runner's memory.

A platform like Scalr, a Terraform automation and collaboration platform (TACO), solves this by shifting Terraform execution off your CI/CD runners. Scalr's managed runners are designed to handle the majority of deployments out of the box. For configurations with extreme memory requirements, self-hosted agents let you deploy runners on your own VMs, Docker containers, or Kubernetes clusters and size them to match your workload. No more Exit Code 137 errors because a shared CI/CD runner ran out of memory.

Scalr also makes the "split your state" advice actionable. Each workspace manages its own isolated state file, so breaking a monolith into logical stacks translates directly into separate workspaces with smaller dependency graphs and lower memory ceilings. Run triggers handle the orchestration between them automatically, and unlimited concurrency on paid plans means those split workspaces run in parallel rather than queuing up.

Summary Checklist

Symptom                 Probable Cause            Quick Fix
Exit Code 137           Container memory limit    Increase runner RAM or lower -parallelism
Long init/plan times    Schema loading            Upgrade to TF 1.6+ / Tofu 1.7+
OOM on multi-region     Too many aliases          Use AWS Provider 6.0+ region attributes