How to Use the Terraform Datadog Provider

This guide provides an overview of how to define, version, and manage Datadog resources such as monitors, dashboards, and SLOs with Terraform, covering provider setup, practical examples, and best practices.

Why Use Terraform for Datadog?

The value of managing Datadog as code goes beyond simple automation. It allows for:

  • Version Control & Auditability: By storing configurations in Git, you get a complete, auditable history of every change. This is invaluable for compliance, security reviews, and incident post-mortems.
  • Consistency & Reliability: Automation reduces the risk of human error that comes with manual configuration. You can ensure that monitoring is applied consistently across all your environments, from development to production.
  • Lifecycle Cohesion: You can bundle an application's observability components with its infrastructure. When you deploy a new service, the same terraform apply can create the required servers, databases, and the corresponding Datadog monitors and dashboards.
  • Drift Detection: terraform plan acts as a safety net. It detects manual changes made in the Datadog UI to resources that Terraform manages, allowing you to either revert them or bring them into code, preventing configuration drift.

Getting Started: Provider Setup

Before you can manage any resources, you need to configure the Datadog provider.

First, declare the provider in your Terraform configuration. Pinning the version is a recommended best practice.

# main.tf
terraform {
  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = "~> 3.66"
    }
  }
}

Next, you need to authenticate. The provider requires an API key and an Application key from your Datadog account. To keep these credentials out of your configuration files and version control, set them as environment variables, which the provider reads automatically:

  • DD_API_KEY
  • DD_APP_KEY

Finally, configure the provider block. If you are not using the default Datadog US1 site, you must specify your site's api_url.

# main.tf
provider "datadog" {
  # The provider will automatically use the environment variables
  # for api_key and app_key.

  # Example for a user on the EU site
  api_url = "https://api.datadoghq.eu/"
}
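
Where environment variables are not practical (for example, when a CI system injects credentials as Terraform variables), the keys can also be passed through input variables marked as sensitive. The following is a minimal sketch that would replace the provider block above, not a recommendation over the environment-variable approach. The variable names datadog_api_key and datadog_app_key are illustrative; api_key and app_key are the provider's own arguments.

```hcl
# variables.tf
# Hypothetical variable names. Supply the values through
# TF_VAR_datadog_api_key / TF_VAR_datadog_app_key or a secrets manager,
# never through a committed .tfvars file.
variable "datadog_api_key" {
  type      = string
  sensitive = true # redacts the value in plan and apply output
}

variable "datadog_app_key" {
  type      = string
  sensitive = true
}

# main.tf
provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key

  # Example for a user on the EU site; omit on the default US1 site.
  api_url = "https://api.datadoghq.eu/"
}
```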

Practical Examples

With the setup complete, you can start defining Datadog resources.

Creating a Monitor

Monitors are the foundation of your alerting strategy. Here is how you can define a standard metric alert for high CPU usage, complete with warning and critical thresholds, a contextual message, and tags for routing.

resource "datadog_monitor" "high_cpu_usage" {
  name    = "High CPU Usage on {{host.name}}"
  type    = "metric alert"
  query   = "avg(last_5m):avg:system.cpu.user{*} by {host} > 80"
  message = "CPU usage is critically high on host {{host.name}}. Please investigate. @pagerduty-platform-team"

  monitor_thresholds {
    warning  = 70
    critical = 80
  }

  # Also notify if the metric stops reporting; no_data_timeframe is in minutes.
  notify_no_data    = true
  no_data_timeframe = 20

  tags = ["service:core-api", "env:prod", "managed_by:terraform"]
}
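
To apply the same monitor consistently across environments, one option is to stamp it out with for_each. This is a sketch under assumptions: the environment names and the env tag used to scope the query are illustrative and should match your own tagging scheme.

```hcl
# Environment names are assumptions; adjust them to your own setup.
locals {
  environments = toset(["dev", "staging", "prod"])
}

resource "datadog_monitor" "high_cpu_usage_per_env" {
  # One monitor is created per environment name.
  for_each = local.environments

  name = "[${each.key}] High CPU Usage on {{host.name}}"
  type = "metric alert"

  # Scope the query to a single environment through its env tag.
  query   = "avg(last_5m):avg:system.cpu.user{env:${each.key}} by {host} > 80"
  message = "CPU usage is high on host {{host.name}} in ${each.key}. Please investigate."

  monitor_thresholds {
    warning  = 70
    critical = 80
  }

  tags = ["env:${each.key}", "managed_by:terraform"]
}
```

Terraform only interpolates the ${...} sequences; the double-brace {{host.name}} template variable is passed to Datadog unchanged.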

Defining a Dashboard

You can define entire dashboards in code. A common challenge, however, is generating widgets dynamically. The standard datadog_dashboard resource models every widget as a nested block, so generating widgets from data quickly becomes verbose.

The solution is to use the datadog_dashboard_json resource, which accepts a JSON string. This allows you to use Terraform's for expressions to programmatically generate your widget definitions.

# variables.tf
variable "widget_configurations" {
  type = map(object({
    title = string
    query = string
  }))
  default = {
    "cpu_usage" = {
      title = "CPU Usage"
      query = "avg:system.cpu.user{*}"
    }
    "memory_usage" = {
      title = "Memory Usage"
      query = "avg:system.mem.used{*}"
    }
  }
}

# main.tf
resource "datadog_dashboard_json" "programmatic_dashboard" {
  dashboard = jsonencode({
    title       = "Programmatically Generated Dashboard"
    layout_type = "ordered"
    widgets = [
      for key, config in var.widget_configurations : {
        definition = {
          type  = "timeseries"
          title = config.title
          requests = [
            {
              q = config.query
            }
          ]
        }
      }
    ]
  })
}
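
Because datadog_dashboard_json only needs a JSON string, the definition does not have to be built with jsonencode. As a sketch, a dashboard exported from the Datadog UI could be committed as a file and loaded with Terraform's file function. The resource name and file path below are hypothetical.

```hcl
resource "datadog_dashboard_json" "exported_dashboard" {
  # Hypothetical path: a dashboard definition exported from the Datadog UI
  # and committed alongside this configuration.
  dashboard = file("${path.module}/dashboards/service_overview.json")
}
```

This trades programmatic generation for a definition that can be copied directly between the UI and the repository.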

Defining a Service Level Objective (SLO)

Codifying SLOs is a cornerstone of modern SRE. This example defines a 99.9% availability SLO based on the ratio of successful API requests to total requests.

resource "datadog_service_level_objective" "api_availability" {
  name        = "API Request Availability"
  type        = "metric"
  description = "99.9% of all API requests should be successful (non-5xx)."
  
  query {
    numerator   = "sum:trace.http.request.hits{env:prod,service:core-api,!status_code:5xx}.as_count()"
    denominator = "sum:trace.http.request.hits{env:prod,service:core-api}.as_count()"
  }

  thresholds {
    target    = 99.9
    timeframe = "30d"
    warning   = 99.95
  }
  
  tags = ["service:core-api", "env:prod", "slo:availability"]
}
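
An SLO defined this way can also be referenced from other resources. The sketch below pairs it with a monitor that alerts when most of the error budget has been consumed. The query follows the SLO alert format described in the Datadog monitor API documentation; the 75% threshold and the monitor name are assumptions, and the syntax should be checked against the current provider documentation before use.

```hcl
resource "datadog_monitor" "api_availability_budget" {
  name = "API Availability: error budget mostly consumed"
  type = "slo alert"

  # Reference the SLO by ID so the monitor follows the SLO resource above.
  # 75 means "75% of the 30-day error budget has been used" (assumed threshold).
  query = "error_budget(\"${datadog_service_level_objective.api_availability.id}\").over(\"30d\") > 75"

  message = "More than 75% of the 30-day error budget for API availability is used. @pagerduty-platform-team"

  monitor_thresholds {
    critical = 75
  }

  tags = ["service:core-api", "env:prod", "managed_by:terraform"]
}
```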

Best Practices

As your usage grows, managing a single, monolithic Terraform configuration becomes a bottleneck. To manage Datadog at scale, consider these best practices:

  1. Use Modules: Encapsulate common patterns into reusable modules. For example, create a "standard service" module that bundles monitors for latency, error rate, and saturation. This promotes consistency and reduces code duplication.
  2. Separate Environments: Use a directory-based structure to maintain separate state files for each environment (e.g., dev, staging, prod). This provides strong isolation and prevents changes in one environment from impacting another.
  3. Split Your State: The biggest challenge at scale is a slow, monolithic Terraform state file. Split your state into smaller, more manageable units. Common strategies include splitting by team, by service, or by Datadog resource type. Each plan and apply then refreshes fewer resources, which shortens run times and reduces the blast radius of any single change.
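
The first two practices can be sketched together as follows. The module path, variable names, and directory layout are hypothetical, and only a single CPU monitor is shown for brevity; a real "standard service" module would also bundle the latency and error-rate monitors.

```hcl
# modules/standard-service/variables.tf (hypothetical module)
variable "service" {
  type = string
}

variable "env" {
  type = string
}

variable "cpu_critical" {
  type    = number
  default = 80
}

# modules/standard-service/main.tf
# The module declares its own provider requirement so that Terraform
# resolves the DataDog/datadog source inside the module as well.
terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
    }
  }
}

resource "datadog_monitor" "cpu" {
  name    = "[${var.env}] High CPU on ${var.service}"
  type    = "metric alert"
  query   = "avg(last_5m):avg:system.cpu.user{service:${var.service},env:${var.env}} by {host} > ${var.cpu_critical}"
  message = "CPU usage is high for ${var.service} in ${var.env}."

  monitor_thresholds {
    critical = var.cpu_critical
  }

  tags = ["service:${var.service}", "env:${var.env}", "managed_by:terraform"]
}

# environments/prod/main.tf
# One root directory, and therefore one state file, per environment.
module "core_api" {
  source  = "../../modules/standard-service"
  service = "core-api"
  env     = "prod"
}
```

A matching environments/dev/main.tf would call the same module with env = "dev", so each environment gets identical monitors while keeping its own state.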

Datadog Product to Terraform Resource Mapping

The following table serves as a quick reference for mapping common Datadog products to their corresponding Terraform resource types.

Datadog Product                    Primary Terraform Resource
---------------------------------  ------------------------------------------------
Monitors                           datadog_monitor
Dashboards                         datadog_dashboard, datadog_dashboard_json
Synthetic Tests                    datadog_synthetics_test
Service Level Objectives (SLOs)    datadog_service_level_objective
User Management                    datadog_user
Role-Based Access Control          datadog_role
Log Indexes & Pipelines            datadog_logs_index, datadog_logs_custom_pipeline
