Terraform State Lock Errors: Emergency Solutions & Prevention Guide

Fix Terraform state lock errors fast and prevent them: emergency unlock steps, root causes, and best practices to avoid future lockouts.

Use Ctrl+F to go straight to your issue.

1. Emergency Unlock Commands (Copy-Paste Ready)

Terraform Cloud Backend

Error Message:

Error: Error acquiring the state lock
Error message: resource temporarily unavailable
Lock Info:
  ID: 12345abc-6789-def0-1234-56789abcdef0
  Path: <workspace-name>
  Operation: OperationTypePlan
  Who: user@hostname
  Version: 1.5.0
  Created: 2024-11-23 15:30:45 +0000 UTC

Solution 1: CLI Force Unlock

# Navigate to your Terraform configuration directory
cd /path/to/terraform/config

# Force unlock using the Lock ID from error message
terraform force-unlock 12345abc-6789-def0-1234-56789abcdef0

# Skip confirmation prompt
terraform force-unlock -force 12345abc-6789-def0-1234-56789abcdef0
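
Copying the lock ID by hand is easy to get wrong during an incident. As a minimal sketch, the ID can be pulled out of captured error output before it is passed to terraform force-unlock. The function name and the regular expression are illustrative assumptions, not part of Terraform; the pattern also tolerates the "│" gutter that newer Terraform versions print in front of error lines.

```python
import re
from typing import Optional

# Matches the "ID:" line of the "Lock Info:" block, with or without a leading "│" gutter.
LOCK_ID_PATTERN = re.compile(r"^[\s│]*ID:\s*(\S+)", re.MULTILINE)


def extract_lock_id(error_output: str) -> Optional[str]:
    """Return the lock ID found in Terraform's lock error output, or None if absent."""
    match = LOCK_ID_PATTERN.search(error_output)
    return match.group(1) if match else None


# Sample taken from the error message shown above.
SAMPLE_ERROR = """\
Error: Error acquiring the state lock
Error message: resource temporarily unavailable
Lock Info:
  ID: 12345abc-6789-def0-1234-56789abcdef0
  Path: <workspace-name>
  Operation: OperationTypePlan
"""

if __name__ == "__main__":
    print(extract_lock_id(SAMPLE_ERROR))
```

Returning None rather than raising lets a wrapper script fall back to asking the operator for the ID.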

Solution 2: Terraform Cloud API

# Set environment variables
export TOKEN="your-terraform-cloud-token"
export WORKSPACE_ID="ws-ABC123DEF456"

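# Force unlock via API (requires admin access to the workspace;
# the plain actions/unlock endpoint is rejected when another user holds the lock)
curl \
  --header "Authorization: Bearer $TOKEN" \
  --header "Content-Type: application/vnd.api+json" \
  --request POST \
  "https://app.terraform.io/api/v2/workspaces/$WORKSPACE_ID/actions/force-unlock"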
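
For teams that script this call instead of using curl, here is a rough standard-library sketch. It only builds the request against the workspace force-unlock action (actions/force-unlock) and does not send it. The helper name is hypothetical, and the token and workspace ID are the same placeholders used above.

```python
import urllib.request

TFC_API = "https://app.terraform.io/api/v2"


def build_force_unlock_request(workspace_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the workspace force-unlock action."""
    url = f"{TFC_API}/workspaces/{workspace_id}/actions/force-unlock"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


if __name__ == "__main__":
    request = build_force_unlock_request("ws-ABC123DEF456", "your-terraform-cloud-token")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it possible to log or review the exact call before running it against a production workspace.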

S3 Backend with DynamoDB

Error Message:

Error: Error locking state: Error acquiring the state lock: 
ConditionalCheckFailedException: The conditional request failed
Lock Info:
  ID: 12345abc-6789-def0-1234-56789abcdef0
  Path: terraform-s3-bucket/path/to/terraform.tfstate
  Operation: OperationTypePlan
  Who: root@runner-123-concurrent-0

Solution 1: Terraform CLI Force Unlock

# Must be in terraform directory
cd /path/to/your/terraform/project

# Try CLI unlock first
terraform force-unlock 12345abc-6789-def0-1234-56789abcdef0

Solution 2: Manual DynamoDB Lock Removal

# Find your DynamoDB table name
grep -A 10 'backend "s3"' *.tf | grep dynamodb_table

# List current locks
aws dynamodb scan \
  --table-name terraform-state-lock \
  --region us-east-1

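# Delete specific lock (the LockID is "<bucket>/<key>";
# the separate "<bucket>/<key>-md5" item is a state checksum, not a lock)
aws dynamodb delete-item \
  --table-name terraform-state-lock \
  --key '{"LockID": {"S": "your-bucket/path/terraform.tfstate"}}' \
  --region us-east-1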
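
The lock table holds two kinds of items per state file: the lock itself, keyed by "<bucket>/<key>", and a checksum item whose LockID ends in "-md5". The sketch below builds the --key argument for the lock item and shows how to tell the two apart. The helper names are illustrative assumptions.

```python
import json


def lock_item_key(bucket: str, state_key: str) -> dict:
    """Return the DynamoDB key of the lock item for a given state file."""
    return {"LockID": {"S": f"{bucket}/{state_key}"}}


def is_digest_item(item: dict) -> bool:
    """True for the '-md5' checksum item, which is not a lock."""
    return item["LockID"]["S"].endswith("-md5")


def key_argument(bucket: str, state_key: str) -> str:
    """Render the key as the JSON string expected by `aws dynamodb delete-item --key`."""
    return json.dumps(lock_item_key(bucket, state_key))


if __name__ == "__main__":
    print(key_argument("your-bucket", "path/terraform.tfstate"))
```

Generating the JSON programmatically avoids the quoting mistakes that are common when the key is typed inline in a shell.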

Emergency Break-Glass Procedure:

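# Kill all terraform processes
# (-f matches the full command line, so this also hits editors or shells
#  that merely have "terraform" in their arguments; review with pgrep -fl first)
pkill -f terraform

# Remove local lock files (local backend only)
rm -f .terraform/.terraform.tfstate.lock.info
rm -f .terraform.tfstate.lock.info

# Verify no active operations
ps aux | grep terraform | grep -v grep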

2. Finding Hung Processes

Linux/Mac Commands

Find Terraform Processes:

# List all Terraform processes
ps aux | grep terraform | grep -v grep

# Find processes with PIDs
pgrep -fl terraform

# Check for open state files (-F treats the pattern as a literal string, so "." is not a wildcard)
lsof | grep -F terraform.tfstate
lsof | grep -F .terraform

# Find zombie processes
ps aux | awk '$8 ~ /^[Zz]/ { print $2 " " $11 }'

Kill Hung Processes:

# Safe termination (try first)
kill <PID>
pkill terraform

# Force kill (if normal kill fails)
kill -9 <PID>
pkill -9 terraform

# Kill by command line: target a specific operation first,
# and fall back to every process matching "terraform" only if needed
pkill -f "terraform apply"
pkill -f "terraform plan"
pkill -f terraform

Windows Commands

Find Terraform Processes:

# Command Prompt
tasklist | findstr terraform
tasklist /FI "IMAGENAME eq terraform.exe"

# PowerShell
Get-Process | Where-Object {$_.ProcessName -like "*terraform*"}
Get-Process terraform*

Kill Hung Processes:

# Command Prompt
taskkill /PID <PID> /F
taskkill /IM terraform.exe /F

# PowerShell
Stop-Process -Name "terraform" -Force
Stop-Process -Id <PID> -Force
Get-Process terraform* | Stop-Process -Force

PowerShell Script for Long-Running Processes:

# Find and kill terraform processes running > 30 minutes
# (uses elapsed time since StartTime; TotalProcessorTime would measure CPU time instead)
Get-Process terraform* -ErrorAction SilentlyContinue | 
    Where-Object { ((Get-Date) - $_.StartTime).TotalMinutes -gt 30 } | 
    Stop-Process -Force

3. DynamoDB Lock Investigation (S3 Backend)

View Lock Details

Check Lock Table Structure:

# Describe table
aws dynamodb describe-table --table-name terraform-state-lock

# Scan all locks
aws dynamodb scan --table-name terraform-state-lock --region us-east-1

# Pretty print lock information
aws dynamodb scan --table-name terraform-state-lock \
    --query 'Items[*].{LockID:LockID.S,Info:Info.S}' \
    --output table

Find Specific Locks:

# Search for locks by state path
# (placeholder names in --expression-attribute-values must start with ":")
aws dynamodb scan --table-name terraform-state-lock \
    --filter-expression "contains(LockID, :state_path)" \
    --expression-attribute-values '{":state_path": {"S": "your-project/terraform.tfstate"}}' \
    --region us-east-1

Batch Lock Cleanup Script

#!/bin/bash
# cleanup_stale_locks.sh

TABLE_NAME="terraform-state-lock"
REGION="us-east-1"
CUTOFF_DATE="2024-01-01T00:00:00Z"

# Get all locks older than cutoff date.
# Items without an Info attribute (the "-md5" checksum entries) are skipped,
# otherwise fromjson fails on them. Timestamps are compared as strings, which
# works because both sides are UTC ISO 8601 values.
aws dynamodb scan \
    --table-name "$TABLE_NAME" \
    --region "$REGION" \
    --output json | \
    jq -r --arg cutoff "$CUTOFF_DATE" \
    '.Items[] | select(.Info.S != null) | select((.Info.S | fromjson).Created < $cutoff) | .LockID.S' | \
while read -r lock_id; do
    echo "Deleting stale lock: $lock_id"
    aws dynamodb delete-item \
        --table-name "$TABLE_NAME" \
        --key "{\"LockID\": {\"S\": \"$lock_id\"}}" \
        --region "$REGION"
done
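
The selection step of the script can also be sketched as a pure Python function, which makes it easy to test against sample scan output before deleting anything. This is an illustration under stated assumptions: the function names are hypothetical, items are in the shape returned by aws dynamodb scan, and the Created field is an RFC 3339 timestamp that may carry up to nanosecond precision.

```python
import json
import re
from datetime import datetime, timezone
from typing import List

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_created(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, normalizing the fraction to microseconds."""
    match = _TIMESTAMP.match(value)
    if not match:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    base, fraction, zone = match.groups()
    micros = (fraction or "0")[:6].ljust(6, "0")
    zone = "+00:00" if zone == "Z" else zone
    return datetime.fromisoformat(f"{base}.{micros}{zone}")


def stale_lock_ids(items: List[dict], cutoff: datetime) -> List[str]:
    """Return LockIDs of lock items created before the cutoff."""
    stale = []
    for item in items:
        if "Info" not in item:  # skip "-md5" checksum items
            continue
        info = json.loads(item["Info"]["S"])
        if parse_created(info["Created"]) < cutoff:
            stale.append(item["LockID"]["S"])
    return stale


if __name__ == "__main__":
    sample = [
        {"LockID": {"S": "bucket/a.tfstate"},
         "Info": {"S": json.dumps({"Created": "2023-12-31T23:59:59.123456789Z"})}},
        {"LockID": {"S": "bucket/a.tfstate-md5"}, "Digest": {"S": "abc"}},
    ]
    print(stale_lock_ids(sample, datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

Comparing parsed datetimes instead of raw strings keeps the check correct even if a timestamp carries a non-UTC offset.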

4. Preventing Concurrent Executions

Pre-Check Script

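#!/bin/bash
# pre-check-lock.sh - Detect existing locks before running

MAX_WAIT_TIME=1800  # 30 minutes
CHECK_INTERVAL=30   # 30 seconds
WAITED_TIME=0

echo "🔍 Checking for existing Terraform state locks..."

while [ "$WAITED_TIME" -lt "$MAX_WAIT_TIME" ]; do
    # No outer timeout: killing terraform mid-run can itself leave a stale lock.
    # No -detailed-exitcode: it exits 2 when changes exist, which is not a failure.
    if output=$(terraform plan -input=false -lock-timeout=5s 2>&1); then
        echo "✅ No lock detected, proceeding"
        exit 0
    elif echo "$output" | grep -q "Error acquiring the state lock"; then
        echo "⏳ State locked, waiting ${CHECK_INTERVAL}s... (${WAITED_TIME}/${MAX_WAIT_TIME}s)"
        sleep "$CHECK_INTERVAL"
        WAITED_TIME=$((WAITED_TIME + CHECK_INTERVAL))
    else
        # Any other failure (syntax error, credentials) will not fix itself by waiting
        echo "❌ terraform plan failed for a reason other than a state lock:"
        echo "$output"
        exit 1
    fi
done

echo "❌ Timeout waiting for state lock"
exit 1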
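
The waiting logic in the script is a bounded polling loop. The sketch below isolates that loop so the timing behavior can be tested without running Terraform. The names are illustrative assumptions: is_locked stands in for whatever probe you use, and the marker string is the one that appears in the lock error messages shown earlier.

```python
import time
from typing import Callable

LOCK_ERROR_MARKER = "Error acquiring the state lock"


def is_lock_error(terraform_output: str) -> bool:
    """True if captured Terraform output reports a state lock conflict."""
    return LOCK_ERROR_MARKER in terraform_output


def wait_for_unlock(
    is_locked: Callable[[], bool],
    max_wait: int = 1800,
    interval: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the state is unlocked; return False if max_wait seconds elapse first."""
    waited = 0
    while waited < max_wait:
        if not is_locked():
            return True
        sleep(interval)
        waited += interval
    return False


if __name__ == "__main__":
    # Simulated probe: locked twice, then free. Sleep is stubbed out for the demo.
    states = iter([True, True, False])
    print(wait_for_unlock(lambda: next(states), sleep=lambda _: None))
```

Passing sleep as a parameter is a design choice for testability; production use would keep the default time.sleep.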

Lock Timeout Best Practices

# Always use lock timeouts
terraform plan -lock-timeout=10m
terraform apply -lock-timeout=15m
terraform destroy -lock-timeout=20m

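# Optional: enable DynamoDB TTL on the lock table.
# Terraform does not write an ExpirationTime attribute to its lock items, so TTL
# only expires locks if your own tooling adds that attribute (epoch seconds).
aws dynamodb update-time-to-live \
  --table-name terraform-state-lock \
  --time-to-live-specification Enabled=true,AttributeName=ExpirationTime \
  --region us-east-1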

5. GitLab/GitHub Actions Mutex Examples

GitLab CI with Resource Groups

# .gitlab-ci.yml - Prevents concurrent Terraform runs
stages:
  - validate
  - plan
  - apply

variables:
  TF_IN_AUTOMATION: "true"

# Plan with environment-specific locking.
# ENVIRONMENT is a user-defined variable set by each concrete job below;
# CI_ENVIRONMENT_NAME is itself derived from environment:name, so it cannot define it.
.terraform_plan: &terraform_plan
  stage: plan
  resource_group: ${ENVIRONMENT}_terraform
  environment:
    name: ${ENVIRONMENT}
  script:
    - terraform init -input=false
    - terraform plan -input=false -out=plan.tfplan -lock-timeout=10m
  artifacts:
    paths:
      - plan.tfplan
    expire_in: 1 hour

# Apply with strict locking
.terraform_apply: &terraform_apply
  stage: apply
  resource_group: ${ENVIRONMENT}_terraform
  environment:
    name: ${ENVIRONMENT}
  script:
    - terraform init -input=false  # .terraform/ is not carried over from the plan job
    - terraform apply -input=false -lock-timeout=15m plan.tfplan
  when: manual
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

# Multiple environments with isolated locks
plan_dev:
  <<: *terraform_plan
  variables:
    ENVIRONMENT: "dev"

apply_dev:
  <<: *terraform_apply
  variables:
    ENVIRONMENT: "dev"
  dependencies:
    - plan_dev

plan_prod:
  <<: *terraform_plan
  variables:
    ENVIRONMENT: "prod"

apply_prod:
  <<: *terraform_apply
  variables:
    ENVIRONMENT: "prod"
  dependencies:
    - plan_prod

GitHub Actions with Concurrency Groups

name: "Terraform Infrastructure"

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# Global concurrency control
concurrency:
  group: terraform-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: false  # Cancelling a run mid-operation can leave a stale state lock

jobs:
  plan:
    name: "Terraform Plan"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
      max-parallel: 1  # Sequential execution
    concurrency:
      group: terraform-${{ matrix.environment }}
      cancel-in-progress: false  # Never cancel terraform operations
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0"
      
      - name: Terraform Init
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=${{ matrix.environment }}/terraform.tfstate"
      
      - name: Terraform Plan
        run: terraform plan -lock-timeout=10m -var-file="envs/${{ matrix.environment }}.tfvars"

  apply:
    name: "Terraform Apply"
    needs: plan
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    strategy:
      matrix:
        environment: [dev, staging, prod]
      max-parallel: 1
    concurrency:
      group: terraform-${{ matrix.environment }}  # Same group as plan, so plan and apply never overlap per environment
      cancel-in-progress: false
    environment:
      name: ${{ matrix.environment }}
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0"  # Match the version used by the plan job
      
      - name: Terraform Apply
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=${{ matrix.environment }}/terraform.tfstate"
          terraform apply -auto-approve -lock-timeout=15m \
            -var-file="envs/${{ matrix.environment }}.tfvars"

6. State Lock Monitoring Setup

CloudWatch Monitoring for DynamoDB

Lambda Function for Lock Age Monitoring:

import boto3
import json
import os
import re
from datetime import datetime, timezone


def parse_created(value):
    """Parse the RFC 3339 'Created' timestamp from a lock's Info payload.

    The fraction can have up to nine digits, which datetime.fromisoformat
    rejects on older Python runtimes, so it is normalized to six digits.
    """
    value = value.replace('Z', '+00:00')
    value = re.sub(r'\.(\d+)', lambda m: '.' + m.group(1)[:6].ljust(6, '0'), value)
    return datetime.fromisoformat(value)


def lambda_handler(event, context):
    dynamodb = boto3.client('dynamodb')
    cloudwatch = boto3.client('cloudwatch')

    table_name = os.environ['LOCK_TABLE_NAME']
    stale_threshold = int(os.environ.get('STALE_THRESHOLD_MINUTES', 30))

    current_time = datetime.now(timezone.utc)
    stale_locks = []

    # Scan for active locks; paginate so tables larger than one page are fully covered
    paginator = dynamodb.get_paginator('scan')
    for page in paginator.paginate(TableName=table_name):
        for item in page['Items']:
            # Items without Info are "-md5" checksum entries, not locks
            if 'Info' not in item:
                continue
            lock_info = json.loads(item['Info']['S'])
            created_time = parse_created(lock_info['Created'])
            lock_age_minutes = (current_time - created_time).total_seconds() / 60

            # Send metric (value is in minutes; CloudWatch has no minutes unit)
            cloudwatch.put_metric_data(
                Namespace='Terraform/StateLocks',
                MetricData=[{
                    'MetricName': 'LockAge',
                    'Value': lock_age_minutes,
                    'Unit': 'None'
                }]
            )

            if lock_age_minutes > stale_threshold:
                stale_locks.append({
                    'lock_id': item['LockID']['S'],
                    'age_minutes': lock_age_minutes,
                    'who': lock_info.get('Who', 'Unknown')
                })

    # Send stale lock count
    cloudwatch.put_metric_data(
        Namespace='Terraform/StateLocks',
        MetricData=[{
            'MetricName': 'StaleLockCount',
            'Value': len(stale_locks),
            'Unit': 'Count'
        }]
    )

    return {'statusCode': 200, 'stale_locks': stale_locks}

CloudWatch Alarms

# Terraform configuration for alarms
# (aws_sns_topic.terraform_alerts is assumed to be defined elsewhere in your configuration)
resource "aws_cloudwatch_metric_alarm" "stale_locks" {
  alarm_name          = "terraform-stale-locks"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "StaleLockCount"
  namespace           = "Terraform/StateLocks"
  period              = 300
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "Terraform state locks are stale"
  alarm_actions       = [aws_sns_topic.terraform_alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "long_running_locks" {
  alarm_name          = "terraform-long-running-locks"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "LockAge"
  namespace           = "Terraform/StateLocks"
  period              = 300
  statistic           = "Maximum"
  threshold           = 60 # 60 minutes
  alarm_description   = "Terraform locks running longer than expected"
  alarm_actions       = [aws_sns_topic.terraform_alerts.arn]
}

Automated Lock Cleanup Script

#!/bin/bash
# Safe automated cleanup with validation

LOCK_TABLE="${LOCK_TABLE:-terraform-state-lock}"
STALE_THRESHOLD_MINUTES="${STALE_THRESHOLD_MINUTES:-60}"
DRY_RUN="${DRY_RUN:-true}"

get_stale_locks() {
    # -c prints one JSON object per line so the read loop below sees whole objects
    aws dynamodb scan \
        --table-name "$LOCK_TABLE" \
        --output json | \
    jq -c --arg threshold "$STALE_THRESHOLD_MINUTES" '
        .Items[] |
        select(.Info.S != null) |
        .LockID.S as $lock_id |
        (.Info.S | fromjson) as $lock_data |
        # jq cannot parse fractional seconds, so strip them before converting
        ((now - ($lock_data.Created | sub("\\.[0-9]+"; "") | fromdateiso8601)) / 60 | floor) as $age |
        select($age > ($threshold | tonumber)) |
        {lock_id: $lock_id, age_minutes: $age}
    '
}

cleanup_stale_lock() {
    local lock_id="$1"
    
    if [[ "$DRY_RUN" == "true" ]]; then
        echo "DRY RUN: Would delete lock $lock_id"
        return 0
    fi
    
    aws dynamodb delete-item \
        --table-name "$LOCK_TABLE" \
        --key "{\"LockID\":{\"S\":\"$lock_id\"}}"
}

# Main execution
echo "Starting lock cleanup (threshold: $STALE_THRESHOLD_MINUTES min)"
get_stale_locks | while read -r lock_info; do
    lock_id=$(echo "$lock_info" | jq -r '.lock_id')
    cleanup_stale_lock "$lock_id"
done

Key Takeaways

For Immediate Issues:

  1. Confirm no Terraform process is still running, and stop any hung ones, before unlocking
  2. Try terraform force-unlock with the lock ID first
  3. Use DynamoDB manual deletion only as a last resort

For Prevention:

  1. Implement CI/CD mutex patterns (resource_group in GitLab, concurrency in GitHub)
  2. Always use -lock-timeout flags (10-15 minutes recommended)
  3. Set up automated monitoring and cleanup for stale locks
  4. Treat DynamoDB TTL as optional: it only expires locks if your tooling writes the TTL attribute, since Terraform does not set one

Critical Safety Rules:

  • Never force unlock if another process is actively running
  • Always backup state before manual interventions
  • Coordinate with team when performing emergency unlocks
  • Monitor lock age to catch issues early
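
The backup rule is easy to skip under pressure, so it helps to have it scripted. Here is a minimal sketch that copies a state file to a timestamped backup before any manual intervention. The function name and file naming scheme are assumptions; for remote backends, first write a local copy with terraform state pull and pass that file in.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Union


def backup_state(state_path: Union[str, Path], backup_dir: Union[str, Path]) -> Path:
    """Copy a state file into backup_dir with a UTC timestamp in its name."""
    source = Path(state_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = target_dir / f"{source.name}.{stamp}.backup"
    shutil.copy2(source, target)  # copy2 also preserves file metadata
    return target


if __name__ == "__main__":
    # Assumes terraform.tfstate exists in the current directory.
    print(backup_state("terraform.tfstate", "state-backups"))
```

Timestamped names keep every backup, so repeated interventions during one incident never overwrite an earlier copy taken in a different second.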