Terraform State Lock Errors: Emergency Solutions & Prevention Guide
Fix Terraform state lock errors fast and prevent them: emergency unlock steps, root causes, and best practices to avoid future lockouts.
Use Ctrl+F to go straight to your issue.
1. Emergency Unlock Commands (Copy-Paste Ready)
Terraform Cloud Backend
Error Message:
Error: Error acquiring the state lock
Error message: resource temporarily unavailable
Lock Info:
ID: 12345abc-6789-def0-1234-56789abcdef0
Path: <workspace-name>
Operation: OperationTypePlan
Who: <user>@<hostname>
Version: 1.5.0
Created: 2024-11-23 15:30:45 +0000 UTC
Solution 1: CLI Force Unlock
# Navigate to your Terraform configuration directory
cd /path/to/terraform/config
# Force unlock using the Lock ID from error message
terraform force-unlock 12345abc-6789-def0-1234-56789abcdef0
# Skip confirmation prompt
terraform force-unlock -force 12345abc-6789-def0-1234-56789abcdef0
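When the error comes from a CI log rather than an interactive terminal, the Lock ID has to be pulled out of the captured output first. A minimal Python sketch, assuming the ID is a UUID on an "ID:" line as in the error message above; the function name `extract_lock_id` is hypothetical:

```python
import re

# Matches the "ID:" line of the "Lock Info" block (UUID-style IDs only).
LOCK_ID_RE = re.compile(
    r"^\s*ID:\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s*$",
    re.MULTILINE,
)


def extract_lock_id(error_output):
    """Return the lock ID found in Terraform error output, or None if absent."""
    match = LOCK_ID_RE.search(error_output)
    return match.group(1) if match else None


sample = """Error: Error acquiring the state lock
Lock Info:
  ID:        12345abc-6789-def0-1234-56789abcdef0
  Path:      <workspace-name>
"""
print(extract_lock_id(sample))
```

The returned value can then be passed to terraform force-unlock as shown above.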
Solution 2: Terraform Cloud API
# Set environment variables
export TOKEN="your-terraform-cloud-token"
export WORKSPACE_ID="ws-ABC123DEF456"
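# Unlock via API (this endpoint releases a lock held by your own user;
# to break a lock held by someone else, POST to .../actions/force-unlock instead,
# which requires admin access to the workspace)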
curl \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/vnd.api+json" \
--request POST \
https://app.terraform.io/api/v2/workspaces/$WORKSPACE_ID/actions/unlock
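The same request can be built from a script. A minimal Python sketch using only the standard library, assuming the endpoint and headers shown in the curl command; the helper name `build_unlock_request` and its `force` flag (which switches to the `actions/force-unlock` endpoint) are illustrative, not part of any SDK:

```python
import urllib.request

API_BASE = "https://app.terraform.io/api/v2"


def build_unlock_request(workspace_id, token, force=False):
    """Build (but do not send) the POST request that unlocks a workspace."""
    action = "force-unlock" if force else "unlock"
    return urllib.request.Request(
        url=f"{API_BASE}/workspaces/{workspace_id}/actions/{action}",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


# Usage (sends the request; needs a real token and workspace ID):
#   import os
#   req = build_unlock_request(os.environ["WORKSPACE_ID"], os.environ["TOKEN"])
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(resp.status)
```

Keeping request construction separate from sending makes it easy to log or review the exact URL before anything is unlocked.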
S3 Backend with DynamoDB
Error Message:
Error: Error locking state: Error acquiring the state lock:
ConditionalCheckFailedException: The conditional request failed
Lock Info:
ID: 12345abc-6789-def0-1234-56789abcdef0
Path: terraform-s3-bucket/path/to/terraform.tfstate
Operation: OperationTypePlan
Who: root@runner-123-concurrent-0
Solution 1: Terraform CLI Force Unlock
# Must be in terraform directory
cd /path/to/your/terraform/project
# Try CLI unlock first
terraform force-unlock 12345abc-6789-def0-1234-56789abcdef0
Solution 2: Manual DynamoDB Lock Removal
# Find your DynamoDB table name
grep -A 10 'backend "s3"' *.tf | grep dynamodb_table
# List current locks
aws dynamodb scan \
--table-name terraform-state-lock \
--region us-east-1
# Delete specific lock
aws dynamodb delete-item \
--table-name terraform-state-lock \
--key '{"LockID": {"S": "your-bucket/path/terraform.tfstate"}}' \
--region us-east-1
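A plain delete-item removes whatever lock sits at that key, even one that a new run acquired a moment ago. A minimal Python sketch of a more cautious variant, assuming a boto3 DynamoDB client is passed in and that the lock item's Info attribute contains the lock UUID from the error message; the function name `delete_lock_if_matches` is hypothetical:

```python
def delete_lock_if_matches(client, table_name, lock_path, lock_uuid):
    """Delete the lock item only if its Info attribute still mentions lock_uuid.

    Returns True if the item was deleted, False if the condition failed
    (for example because a different run now holds the lock).
    """
    try:
        client.delete_item(
            TableName=table_name,
            Key={"LockID": {"S": lock_path}},
            # Conditional delete: DynamoDB rejects the call if Info has changed
            ConditionExpression="contains(#info, :id)",
            ExpressionAttributeNames={"#info": "Info"},
            ExpressionAttributeValues={":id": {"S": lock_uuid}},
        )
        return True
    except Exception as exc:
        # boto3 reports a failed condition as a ClientError with this error code
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("dynamodb", region_name="us-east-1")
#   delete_lock_if_matches(client, "terraform-state-lock",
#                          "your-bucket/path/terraform.tfstate",
#                          "12345abc-6789-def0-1234-56789abcdef0")
```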
Emergency Break-Glass Procedure:
# Kill all terraform processes
pkill -f terraform
# Remove local lock files
rm -f .terraform/.terraform.tfstate.lock.info
rm -f .terraform.tfstate.lock.info
# Verify no active operations (no output means nothing is running)
ps aux | grep terraform | grep -v grep
2. Finding Hung Processes Guide
Linux/Mac Commands
Find Terraform Processes:
# List all Terraform processes
ps aux | grep terraform | grep -v grep
# Find processes with PIDs
pgrep -fl terraform
# Check for specific state locks
lsof | grep -F terraform.tfstate
lsof | grep -F .terraform
# Find zombie processes
ps aux | awk '$8 ~ /^[Zz]/ { print $2 " " $11 }'
Kill Hung Processes:
# Safe termination (try first)
kill <PID>
pkill terraform
# Force kill (if normal kill fails)
kill -9 <PID>
pkill -9 terraform
# Kill all terraform processes
pkill -f terraform
pkill -f "terraform apply"
pkill -f "terraform plan"
Windows Commands
Find Terraform Processes:
# Command Prompt
tasklist | findstr terraform
tasklist /FI "IMAGENAME eq terraform.exe"
# PowerShell
Get-Process | Where-Object {$_.ProcessName -like "*terraform*"}
Get-Process terraform*
Kill Hung Processes:
# Command Prompt
taskkill /PID <PID> /F
taskkill /IM terraform.exe /F
# PowerShell
Stop-Process -Name "terraform" -Force
Stop-Process -Id <PID> -Force
Get-Process terraform* | Stop-Process -Force
PowerShell Script for Long-Running Processes:
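# Find and kill terraform processes running > 30 minutes
# (compares wall-clock time since StartTime; TotalProcessorTime would measure CPU time only)
Get-Process terraform* -ErrorAction SilentlyContinue |
Where-Object { ((Get-Date) - $_.StartTime).TotalMinutes -gt 30 } |
Stop-Process -Force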
3. DynamoDB Lock Investigation (S3 Backend)
View Lock Details
Check Lock Table Structure:
# Describe table
aws dynamodb describe-table --table-name terraform-state-lock
# Scan all locks
aws dynamodb scan --table-name terraform-state-lock --region us-east-1
# Pretty print lock information
aws dynamodb scan --table-name terraform-state-lock \
--query 'Items[*].{LockID:LockID.S,Info:Info.S}' \
--output table
Find Specific Locks:
# Search for locks by state path
aws dynamodb scan --table-name terraform-state-lock \
--filter-expression "contains(LockID, :state_path)" \
--expression-attribute-values '{":state_path": {"S": "your-project/terraform.tfstate"}}' \
--region us-east-1
Batch Lock Cleanup Script
#!/bin/bash
# cleanup_stale_locks.sh
TABLE_NAME="terraform-state-lock"
REGION="us-east-1"
CUTOFF_DATE="2024-01-01T00:00:00Z"
# Get all locks older than cutoff date
# (items without an Info attribute, such as the "-md5" digest entries, are skipped)
aws dynamodb scan \
--table-name "$TABLE_NAME" \
--region "$REGION" \
--output json | \
jq -r --arg cutoff "$CUTOFF_DATE" \
'.Items[] | select(.Info.S != null) | select((.Info.S | fromjson).Created < $cutoff) | .LockID.S' | \
while read -r lock_id; do
echo "Deleting stale lock: $lock_id"
aws dynamodb delete-item \
--table-name "$TABLE_NAME" \
--key "{\"LockID\": {\"S\": \"$lock_id\"}}" \
--region "$REGION"
done
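The selection step of the script above can also be written as a pure function, which makes it easy to check before anything is deleted. A minimal Python sketch, assuming scan items in DynamoDB JSON form and a Created timestamp in UTC with an optional fractional part (for example 2024-11-23T15:30:45.123456789Z); the function names are hypothetical:

```python
import json
import re
from datetime import datetime, timezone


def parse_created(value):
    """Parse a UTC timestamp ending in Z, trimming fractions to microseconds."""
    match = re.fullmatch(
        r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", value
    )
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base, fraction = match.groups()
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(
        microsecond=micros, tzinfo=timezone.utc
    )


def stale_lock_ids(items, cutoff):
    """Return the LockID of every item whose Info.Created is older than cutoff."""
    stale = []
    for item in items:
        info = item.get("Info", {}).get("S")
        if not info:
            continue  # items without Info (e.g. digest entries) are not locks
        if parse_created(json.loads(info)["Created"]) < cutoff:
            stale.append(item["LockID"]["S"])
    return stale
```

Parsing the timestamp instead of comparing strings keeps the comparison correct even if the cutoff and the stored value use different fractional precision.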
4. Preventing Concurrent Executions
Pre-Check Script
#!/bin/bash
# pre-check-lock.sh - Detect existing locks before running
MAX_WAIT_TIME=1800 # 30 minutes
CHECK_INTERVAL=30 # 30 seconds
WAITED_TIME=0
echo "🔍 Checking for existing Terraform state locks..."
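while [ "$WAITED_TIME" -lt "$MAX_WAIT_TIME" ]; do
  # Capture the output and look for a lock error explicitly. Without
  # -detailed-exitcode a successful plan exits 0 whether or not changes exist,
  # and without an outer `timeout` the plan is never killed while holding the lock.
  if output=$(terraform plan -input=false -lock-timeout=5s 2>&1); then
    echo "✅ No lock detected, proceeding"
    exit 0
  elif echo "$output" | grep -q "Error acquiring the state lock"; then
    echo "⏳ State locked, waiting ${CHECK_INTERVAL}s... (${WAITED_TIME}/${MAX_WAIT_TIME}s)"
    sleep "$CHECK_INTERVAL"
    WAITED_TIME=$((WAITED_TIME + CHECK_INTERVAL))
  else
    echo "❌ terraform plan failed for a reason other than a state lock:"
    echo "$output"
    exit 1
  fi
done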
echo "❌ Timeout waiting for state lock"
exit 1
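The wait loop is easier to test when the Terraform call is injectable. A minimal Python sketch of the same idea, assuming a lock failure can be recognized by the text "Error acquiring the state lock" in the command output; `run_plan` and `wait_for_unlock` are hypothetical names:

```python
import subprocess
import time

LOCK_ERROR = "Error acquiring the state lock"


def run_plan():
    """Run `terraform plan` and return (exit_code, combined_output)."""
    proc = subprocess.run(
        ["terraform", "plan", "-input=false", "-lock-timeout=5s"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def wait_for_unlock(run=run_plan, max_wait=1800, interval=30, sleep=time.sleep):
    """Return True once a plan succeeds and False on timeout.

    Raises RuntimeError if the plan fails for a reason other than a lock,
    so configuration errors are not mistaken for a held lock.
    """
    waited = 0
    while waited < max_wait:
        code, output = run()
        if code == 0:
            return True
        if LOCK_ERROR not in output:
            raise RuntimeError(f"terraform plan failed:\n{output}")
        sleep(interval)
        waited += interval
    return False
```

Passing `run` and `sleep` as parameters lets the loop be exercised with fake outcomes, without Terraform installed and without real delays.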
Lock Timeout Best Practices
# Always use lock timeouts
terraform plan -lock-timeout=10m
terraform apply -lock-timeout=15m
terraform destroy -lock-timeout=20m
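# Optional: enable DynamoDB TTL on the lock table
# Note: Terraform does not write an ExpirationTime attribute to its lock items, so this
# only expires locks if another tool sets that attribute to an epoch-seconds timestamp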
aws dynamodb update-time-to-live \
--table-name terraform-state-lock \
--time-to-live-specification Enabled=true,AttributeName=ExpirationTime \
--region us-east-1
5. GitLab/GitHub Actions Mutex Examples
GitLab CI with Resource Groups
# .gitlab-ci.yml - Prevents concurrent Terraform runs
stages:
  - validate
  - plan
  - apply

variables:
  TF_IN_AUTOMATION: "true"

# Plan with environment-specific locking
terraform_plan:
  stage: plan
  resource_group: ${CI_ENVIRONMENT_NAME}_terraform
  environment:
    name: ${CI_ENVIRONMENT_NAME}
  script:
    - terraform init
    - terraform plan -out=plan.tfplan -lock-timeout=10m
  artifacts:
    paths:
      - plan.tfplan
    expire_in: 1 hour

# Apply with strict locking
terraform_apply:
  stage: apply
  resource_group: ${CI_ENVIRONMENT_NAME}_terraform
  environment:
    name: ${CI_ENVIRONMENT_NAME}
  script:
    # Each job starts from a clean working directory, so initialize again before applying
    - terraform init
    - terraform apply -input=false -lock-timeout=15m plan.tfplan
  dependencies:
    - terraform_plan
  when: manual
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

# Multiple environments with isolated locks
.env_template: &env_template
  resource_group: ${ENVIRONMENT}_terraform_deployment
  variables:
    CI_ENVIRONMENT_NAME: ${ENVIRONMENT}

terraform_dev:
  <<: *env_template
  variables:
    ENVIRONMENT: "dev"

terraform_prod:
  <<: *env_template
  variables:
    ENVIRONMENT: "prod"
  when: manual
GitHub Actions with Concurrency Groups
name: "Terraform Infrastructure"
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# Global concurrency control
concurrency:
  group: terraform-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  plan:
    name: "Terraform Plan"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
      max-parallel: 1 # Sequential execution
    concurrency:
      group: terraform-${{ matrix.environment }}
      cancel-in-progress: false # Never cancel terraform operations
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0"
      - name: Terraform Init
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=${{ matrix.environment }}/terraform.tfstate"
      - name: Terraform Plan
        run: terraform plan -lock-timeout=10m -var-file="envs/${{ matrix.environment }}.tfvars"

  apply:
    name: "Terraform Apply"
    needs: plan
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    strategy:
      matrix:
        environment: [dev, staging, prod]
      max-parallel: 1
    concurrency:
      group: terraform-apply-${{ matrix.environment }}
      cancel-in-progress: false
    environment:
      name: ${{ matrix.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0" # Same version as the plan job
      - name: Terraform Apply
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=${{ matrix.environment }}/terraform.tfstate"
          terraform apply -auto-approve -lock-timeout=15m \
            -var-file="envs/${{ matrix.environment }}.tfvars"
6. State Lock Monitoring Setup
CloudWatch Monitoring for DynamoDB
Lambda Function for Lock Age Monitoring:
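import json
import os
import re
from datetime import datetime, timezone

import boto3


def parse_created(value):
    # Created may carry anywhere from zero to nine fractional digits;
    # normalize to exactly six so datetime.fromisoformat accepts it
    match = re.fullmatch(r'(.*?)(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})', value)
    if match is None:
        raise ValueError(f'unexpected timestamp format: {value!r}')
    base, fraction, tz = match.groups()
    fraction = (fraction or '0')[:6].ljust(6, '0')
    tz = '+00:00' if tz == 'Z' else tz
    return datetime.fromisoformat(f'{base}.{fraction}{tz}')


def lambda_handler(event, context):
    dynamodb = boto3.client('dynamodb')
    cloudwatch = boto3.client('cloudwatch')
    table_name = os.environ['LOCK_TABLE_NAME']
    stale_threshold = int(os.environ.get('STALE_THRESHOLD_MINUTES', 30))

    current_time = datetime.now(timezone.utc)
    stale_locks = []

    # Scan for active locks, following pagination so large tables are read fully
    for page in dynamodb.get_paginator('scan').paginate(TableName=table_name):
        for item in page['Items']:
            if 'Info' not in item:
                continue  # e.g. "-md5" digest entries, which are not locks
            lock_info = json.loads(item['Info']['S'])
            created_time = parse_created(lock_info['Created'])
            lock_age_minutes = (current_time - created_time).total_seconds() / 60

            # Send metric (value is the lock age in minutes)
            cloudwatch.put_metric_data(
                Namespace='Terraform/StateLocks',
                MetricData=[{
                    'MetricName': 'LockAge',
                    'Value': lock_age_minutes,
                    'Unit': 'Count'
                }]
            )

            if lock_age_minutes > stale_threshold:
                stale_locks.append({
                    'lock_id': item['LockID']['S'],
                    'age_minutes': lock_age_minutes,
                    'who': lock_info.get('Who', 'Unknown')
                })

    # Send stale lock count
    cloudwatch.put_metric_data(
        Namespace='Terraform/StateLocks',
        MetricData=[{
            'MetricName': 'StaleLockCount',
            'Value': len(stale_locks),
            'Unit': 'Count'
        }]
    )
    return {'statusCode': 200, 'stale_locks': stale_locks}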
CloudWatch Alarms
# Terraform configuration for alarms
resource "aws_cloudwatch_metric_alarm" "stale_locks" {
alarm_name = "terraform-stale-locks"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "StaleLockCount"
namespace = "Terraform/StateLocks"
period = "300"
statistic = "Maximum"
threshold = "0"
alarm_description = "Terraform state locks are stale"
alarm_actions = [aws_sns_topic.terraform_alerts.arn]
}
resource "aws_cloudwatch_metric_alarm" "long_running_locks" {
alarm_name = "terraform-long-running-locks"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "1"
metric_name = "LockAge"
namespace = "Terraform/StateLocks"
period = "300"
statistic = "Maximum"
threshold = "60" # 60 minutes
alarm_description = "Terraform locks running longer than expected"
alarm_actions = [aws_sns_topic.terraform_alerts.arn]
}
Automated Lock Cleanup Script
#!/bin/bash
# Safe automated cleanup with validation
LOCK_TABLE="${LOCK_TABLE:-terraform-state-lock}"
STALE_THRESHOLD_MINUTES="${STALE_THRESHOLD_MINUTES:-60}"
DRY_RUN="${DRY_RUN:-true}"
get_stale_locks() {
  # -c emits one compact JSON object per line, so the `while read` loop below
  # receives a whole object at a time
  aws dynamodb scan \
    --table-name "$LOCK_TABLE" \
    --output json | \
  jq -c --arg threshold "$STALE_THRESHOLD_MINUTES" '
    .Items[] |
    select(.Info.S != null) |
    (.Info.S | fromjson) as $lock_data |
    # strptime has no portable fractional-seconds directive, so drop the fraction first
    ($lock_data.Created | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601) as $created |
    select((now - $created) > ($threshold | tonumber * 60)) |
    {
      lock_id: .LockID.S,
      age_minutes: ((now - $created) / 60 | floor)
    }
  '
}
cleanup_stale_lock() {
local lock_id="$1"
if [[ "$DRY_RUN" == "true" ]]; then
echo "DRY RUN: Would delete lock $lock_id"
return 0
fi
aws dynamodb delete-item \
--table-name "$LOCK_TABLE" \
--key "{\"LockID\":{\"S\":\"$lock_id\"}}"
}
# Main execution
echo "Starting lock cleanup (threshold: $STALE_THRESHOLD_MINUTES min)"
get_stale_locks | while read -r lock_info; do
lock_id=$(echo "$lock_info" | jq -r '.lock_id')
cleanup_stale_lock "$lock_id"
done
Key Takeaways
For Immediate Issues:
- Always try terraform force-unlock with the lock ID first
- Use DynamoDB manual deletion as a last resort
- Kill hung processes before attempting unlock
For Prevention:
- Implement CI/CD mutex patterns (resource_group in GitLab, concurrency in GitHub)
- Always use -lock-timeout flags (10-15 minutes recommended)
- Set up automated monitoring and cleanup for stale locks
- Rely on DynamoDB TTL for lock expiration only if your tooling writes the TTL attribute; Terraform does not set one itself
Critical Safety Rules:
- Never force unlock if another process is actively running
- Always backup state before manual interventions
- Coordinate with team when performing emergency unlocks
- Monitor lock age to catch issues early
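The "backup state" rule above can be scripted so it is not skipped under pressure. A minimal Python sketch, assuming the terraform CLI is on PATH and the working directory is already initialized; it uses `terraform state pull` to fetch the current state, and the function names and backup file naming are hypothetical:

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_path(directory, now=None):
    """Return a timestamped backup file path inside directory."""
    now = now or datetime.now(timezone.utc)
    return Path(directory) / f"terraform.tfstate.backup-{now:%Y%m%dT%H%M%SZ}"


def backup_state(directory="."):
    """Write the output of `terraform state pull` to a timestamped file."""
    target = backup_path(directory)
    result = subprocess.run(
        ["terraform", "state", "pull"],
        cwd=directory,
        capture_output=True,
        text=True,
        check=True,  # raise if terraform exits non-zero
    )
    target.write_text(result.stdout)
    return target


# Usage, before any force-unlock or manual lock deletion:
#   print(backup_state("/path/to/terraform/config"))
```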