Claude Code + OpenTelemetry + Grafana

This post shows how to monitor Claude Code activity — sessions, tokens, cost, tool use, and API events — using a fully local observability stack built on OpenTelemetry, Prometheus, Loki, and Grafana. No external accounts, no SaaS — everything runs on your machine inside a dev container.

Every time the claude CLI runs inside the dev container it automatically exports OpenTelemetry data. Metrics flow into Prometheus, events and logs flow into Loki, and a pre-built Grafana dashboard surfaces all of it — giving you a live, local view of what Claude Code is doing and what it costs.

This is a small learning project I put together while building a larger dashboard application, as a hands-on way to understand how OpenTelemetry pipelines, the OTel Collector, and Grafana provisioning fit together.

👉 My Claude stack on GitHub

[Screenshot: Claude Code Grafana dashboard]

🧠 What It Does

This dev container gives you a one-command development environment that bundles a full observability pipeline alongside your app. Its job is to make Claude Code telemetry visible without any manual instrumentation.

  • 📡 Automatic Telemetry: Any claude command running in the container exports OTLP data — no per-command flags.
  • 💰 Cost & Token Visibility: Track total cost, token usage by type (input / output / cacheRead / cacheCreation), and API spend in real time.
  • 🔧 Tool & API Insight: See which tools Claude Code uses most and watch API request latency (p50 / p95).
  • 📋 Live Event Log: Stream Claude Code events from Loki straight into a Grafana panel.

Everything is reproducible: all container images are pinned to explicit tags, and Grafana is provisioned at startup with data sources and a ready-made dashboard.

🏗️ Architecture

The stack connects the app to Grafana through open-source observability tools. The OTel Collector is the hub — it receives OTLP data and fans it out by signal type.

        ┌─────────────────────────────┐
        │           app               │
        │  Next.js + claude CLI       │
        └──────────────┬──────────────┘
                       │ OTLP / gRPC :4317
                       ▼
        ┌─────────────────────────────┐
        │  otel-collector             │
        │  receives OTLP, batches,    │
        │  fans out by signal type    │
        └───────┬─────────────┬───────┘
        metrics │             │ logs / events
   (Prometheus  │             │ (OTLP HTTP)
    exporter    ▼             ▼
    :8889) ┌──────────┐  ┌──────────┐
           │Prometheus│  │   Loki   │
           │  :9090   │  │  :3100   │
           └────┬─────┘  └────┬─────┘
                │             │
                └──────┬──────┘
                       ▼
              ┌─────────────────┐
              │     Grafana     │
              │      :3001      │
              │  Claude Code    │
              │   dashboard     │
              └─────────────────┘

🧩 Services

Service         Image                                    Port         Purpose
app             devcontainers/javascript-node:22         3000         Dev server + the claude CLI
otel-collector  opentelemetry-collector-contrib:0.107.0  4317 / 4318  OTLP ingest; routes metrics & logs
prometheus      prom/prometheus:v2.54.1                  9090         Metrics storage; scrapes the collector
loki            grafana/loki:3.1.2                       3100         Log / event storage
grafana         grafana/grafana:11.2.2                   3001         Dashboards over Prometheus + Loki

🔄 How It Works

  1. Telemetry is enabled by environment. The app service sets CLAUDE_CODE_ENABLE_TELEMETRY=1 plus the OTEL_* exporter variables, so every claude run picks them up automatically.
  2. Claude Code exports OTLP over gRPC to otel-collector:4317. Metrics use cumulative temporality (Prometheus requires cumulative counters) and flush every 10 seconds.
  3. The collector fans out by signal type — metrics go to the Prometheus exporter on :8889; logs and events go to Loki's OTLP endpoint. A batch processor sits in front of both pipelines.
  4. Prometheus scrapes the collector's :8889 endpoint every 5 seconds, under the otel-collector scrape job.
  5. Grafana is provisioned at startup with Prometheus + Loki data sources and the pre-built Claude Code Telemetry dashboard.

claude → OTel Collector → Prometheus / Loki → Grafana
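The temporality setting in step 2 is easier to see with numbers. The sketch below is illustrative Python and not part of the repo. It shows how delta data points (the Claude Code default) relate to the cumulative running total that a Prometheus counter expects. The function names and sample values are made up for the example.

```python
from itertools import accumulate


def delta_to_cumulative(deltas):
    """Turn per-interval increments into a running total that never decreases."""
    return list(accumulate(deltas))


def increase(cumulative):
    """Growth of a cumulative counter across a window: last sample minus first.

    Simplified on purpose; PromQL's increase() also extrapolates to the
    window boundaries.
    """
    return cumulative[-1] - cumulative[0]


# Hypothetical token counts reported in four consecutive 10 s export intervals.
deltas = [120, 0, 300, 80]
cumulative = delta_to_cumulative(deltas)

print(cumulative)            # [120, 120, 420, 500]
print(increase(cumulative))  # 380
```

With cumulative temporality Claude Code sends the second shape directly, so the collector's Prometheus exporter can expose the values as ordinary counters.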

📦 docker-compose.yml

The compose file defines the whole stack with pinned image tags. Each service runs in its own container with a healthcheck. The four observability services publish their ports explicitly, while the app service has no ports entry in this file:

services:
  app:
    # Node.js 22 LTS
    image: mcr.microsoft.com/devcontainers/javascript-node:22
    volumes:
      - ..:/workspace:cached
    command: sleep infinity
    environment:
      OTEL_SERVICE_NAME: bedrock-cost-dashboard

      # Claude Code telemetry — exports metrics + events when `claude` runs
      CLAUDE_CODE_ENABLE_TELEMETRY: "1"
      OTEL_METRICS_EXPORTER: otlp
      OTEL_LOGS_EXPORTER: otlp
      OTEL_EXPORTER_OTLP_PROTOCOL: grpc
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
      # Prometheus needs cumulative counters; Claude Code defaults to delta.
      OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE: cumulative
      OTEL_METRIC_EXPORT_INTERVAL: "10000"

      # DynamoDB Local — placeholder values only.
      DYNAMODB_ENDPOINT: http://dynamodb-local:8000
      AWS_ACCESS_KEY_ID: local
      AWS_SECRET_ACCESS_KEY: local
      AWS_REGION: us-east-1
    depends_on:
      - otel-collector
    healthcheck:
      test: ["CMD", "node", "-v"]
      interval: 30s
      timeout: 10s
      retries: 5

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.107.0
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel/otel-collector-config.yaml:/etc/otelcol/config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"
    depends_on:
      - prometheus
      - loki
    healthcheck:
      # The contrib image ships its binary at /otelcol-contrib, which is not on PATH.
      test: ["CMD", "/otelcol-contrib", "--version"]
      interval: 30s
      timeout: 10s
      retries: 5

  prometheus:
    image: prom/prometheus:v2.54.1
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 5

  loki:
    image: grafana/loki:3.1.2
    ports:
      - "3100:3100"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3100/ready"]
      interval: 30s
      timeout: 10s
      retries: 5

  grafana:
    image: grafana/grafana:11.2.2
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_USER: admin
      # Grafana's Docker image reads file-based secrets via the double-underscore __FILE suffix.
      GF_SECURITY_ADMIN_PASSWORD__FILE: /run/secrets/grafana_admin_password
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - prometheus
      - loki
    secrets:
      - grafana_admin_password
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 5

secrets:
  grafana_admin_password:
    file: ./grafana_admin_password.txt
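The secrets block expects a grafana_admin_password.txt file next to the compose file, and the post does not show how it is created. One possible way, as a hedged sketch: the helper name and the password length are illustrative choices, not taken from the repo.

```python
import secrets
from pathlib import Path


def write_admin_password(path, nbytes=24):
    """Write a random URL-safe password to `path` and return it."""
    password = secrets.token_urlsafe(nbytes)
    # No trailing newline, so the file holds exactly the password.
    Path(path).write_text(password, encoding="utf-8")
    return password
```

Calling write_admin_password("grafana_admin_password.txt") from the directory that holds docker-compose.yml creates the file the secrets block points at. It is worth keeping that file out of version control.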

📝 otel-collector-config.yaml

The collector config receives OTLP and routes each signal type to the right backend:

  • receivers: accepts OTLP over gRPC (4317) and HTTP (4318).
  • exporters: a prometheus exporter on :8889 for metrics, and otlphttp/loki for logs.
  • service/pipelines: connects receivers to exporters, with a batch processor in between.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
    tls:
      insecure: true

processors:
  batch:

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]

Tip: the Loki exporter type must be otlphttp (no underscore) — otlp_http is invalid and the collector will fail to start.
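Collector component IDs have the form type or type/name, so otlphttp/loki means "an otlphttp exporter named loki". Every ID a pipeline lists must also be defined in the matching top-level section. The sketch below models both rules in plain Python over a hand-copied summary of the config above. It is an illustration of the rules, not the collector's own validation code, and the function names are hypothetical.

```python
def component_type(component_id):
    """Return the type part of a collector component ID ('type' or 'type/name')."""
    return component_id.split("/", 1)[0]


def undefined_references(config):
    """List pipeline entries that point at components the config never defines."""
    missing = []
    for pipeline, spec in config["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component_id in spec.get(kind, []):
                if component_id not in config[kind]:
                    missing.append((pipeline, kind, component_id))
    return missing


# The collector config above, reduced to its component IDs.
config = {
    "receivers": {"otlp"},
    "processors": {"batch"},
    "exporters": {"prometheus", "otlphttp/loki"},
    "pipelines": {
        "metrics": {"receivers": ["otlp"], "processors": ["batch"],
                    "exporters": ["prometheus"]},
        "logs": {"receivers": ["otlp"], "processors": ["batch"],
                 "exporters": ["otlphttp/loki"]},
    },
}

print(component_type("otlphttp/loki"))  # otlphttp
print(undefined_references(config))     # []
```

Renaming the exporter in only one place, for example in the logs pipeline but not in the exporters section, makes undefined_references report that entry.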

📡 prometheus.yml

Prometheus has a single job: scrape the metrics the OTel Collector exposes on port 8889. A tight 5-second interval keeps the dashboard responsive.

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ["otel-collector:8889"]
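The intervals in this stack add up. As a rough worked example, the helper below sums the three waiting periods a new value can hit before it shows on a panel: the 10 s export interval, the 5 s scrape interval, and the dashboard's 30 s refresh. It ignores the batch processor and network time, and the function name is hypothetical.

```python
def worst_case_delay_s(export_interval_ms, scrape_interval_s, refresh_interval_s):
    """Upper bound on the wait, in seconds, before a new value reaches a panel."""
    return export_interval_ms / 1000 + scrape_interval_s + refresh_interval_s


# Values from this post: OTEL_METRIC_EXPORT_INTERVAL, scrape_interval, dashboard refresh.
print(worst_case_delay_s(10_000, 5, 30))  # 45.0
```

Under these assumptions the scrape interval is the smallest term, so lowering the dashboard refresh or the export interval would shorten the delay more than scraping faster.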

📊 Grafana Provisioning

Grafana is configured entirely from files mounted at /etc/grafana/provisioning — no manual clicking. Two pieces wire it up: data sources and a dashboard provider.

Data sources — datasources.yaml

Registers Prometheus (default) and Loki, each reachable by its container name on the compose network:

apiVersion: 1

datasources:
  - name: Prometheus
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false

Dashboard provider — dashboards.yaml

Tells Grafana to load every dashboard JSON file under /etc/grafana/provisioning/dashboards into a folder named Claude Code, rescanning for changes every 30 seconds:

apiVersion: 1

providers:
  - name: default
    folder: Claude Code
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards

The dashboard — claude-code.json

The pre-built Claude Code Telemetry dashboard ships as JSON alongside the provider. It auto-refreshes every 30 seconds, defaults to a 1-hour window, and includes:

  • Stat panels — total cost, tokens, API requests, and tool uses.
  • Cost over time and a token breakdown by type (input / output / cacheRead / cacheCreation).
  • Top tools used — a horizontal bar chart.
  • API request latency — p50 and p95.
  • Live event log — Claude Code events streamed from Loki.
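The real claude-code.json is not reproduced in this post. To give a feel for what a provisioned dashboard file contains, here is a minimal sketch that builds a two-panel dashboard as a Python dict and prints it as JSON. The uid, title, panel layout, and helper names are assumptions for illustration. The data source uid, the 30-second refresh, the 1-hour window, the metric names, and the exported_job filter come from this post.

```python
import json


def stat_panel(panel_id, title, expr, x=0, y=0):
    """Build one stat panel that queries the provisioned Prometheus data source."""
    return {
        "id": panel_id,
        "type": "stat",
        "title": title,
        "gridPos": {"h": 4, "w": 6, "x": x, "y": y},
        "datasource": {"type": "prometheus", "uid": "prometheus"},
        "targets": [{"refId": "A", "expr": expr}],
    }


def minimal_dashboard():
    """Return a small dashboard dict; not the author's actual dashboard."""
    selector = '{exported_job="claude-code"}'
    return {
        "uid": "claude-code-sketch",  # hypothetical uid
        "title": "Claude Code Telemetry (sketch)",
        "refresh": "30s",
        "time": {"from": "now-1h", "to": "now"},
        "panels": [
            stat_panel(1, "Total cost (USD)",
                       f"sum(claude_code_cost_usage_USD_total{selector})"),
            stat_panel(2, "Total tokens",
                       f"sum(claude_code_token_usage_tokens_total{selector})", x=6),
        ],
    }


print(json.dumps(minimal_dashboard(), indent=2))
```

Saving the printed JSON into the provisioned dashboards directory should make it appear alongside the pre-built dashboard on the next rescan.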

Panels filter metrics by exported_job="claude-code". The Prometheus exporter derives a job label from the service name Claude Code reports, and because Prometheus attaches its own job label (otel-collector) at scrape time, the scraped one is renamed to exported_job.
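The renaming to exported_job follows Prometheus's rule for label clashes when honor_labels is left at its default of false: the server-side label wins and the scraped one gets an exported_ prefix. A small Python model of that rule is sketched below. The function is hypothetical and only mimics the behaviour. It is not Prometheus code.

```python
def merge_scraped_labels(scraped, target, honor_labels=False):
    """Model how Prometheus resolves clashes between scraped and target labels."""
    merged = dict(scraped)
    for name, value in target.items():
        if name not in scraped:
            merged[name] = value
        elif not honor_labels:
            # Keep the server-side value and rename the scraped one.
            merged["exported_" + name] = scraped[name]
            merged[name] = value
        # With honor_labels=True the scraped value is kept unchanged.
    return merged


series = merge_scraped_labels(
    scraped={"job": "claude-code"},
    target={"job": "otel-collector", "instance": "otel-collector:8889"},
)
print(series["job"])           # otel-collector
print(series["exported_job"])  # claude-code
```

This is why the dashboard filters on exported_job rather than job: a job="claude-code" selector would match nothing in this setup.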

⚙️ Setup

  1. Open in the dev container. Open the repo in VS Code and Reopen in Container. This builds the app service and forwards all service ports.
  2. Start the stack:
    docker compose up -d
  3. Confirm Prometheus is scraping the collector — open http://localhost:9090/targets; the otel-collector job should show UP.
  4. Generate telemetry by running any claude command inside the container.
  5. Confirm metrics arrived — in the Prometheus query UI run {__name__=~"claude_code.*"} and look for metrics such as claude_code_cost_usage_USD_total and claude_code_token_usage_tokens_total.
  6. Open Grafana at http://localhost:3001 (user admin; the password is the one supplied through the grafana_admin_password secret). The Claude Code Telemetry dashboard is pre-loaded under the Claude Code folder.

The dashboard auto-refreshes every 30 seconds and defaults to a 1-hour window.
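Step 5 can also be done from a script through Prometheus's HTTP API. The sketch below uses only the Python standard library. The /api/v1/query endpoint and its response shape are Prometheus's documented instant-query API, while the function names are hypothetical and the base URL assumes the port mapping from docker-compose.yml.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def metric_names(payload):
    """Collect the distinct metric names from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return sorted({
        series["metric"]["__name__"]
        for series in payload["data"]["result"]
        if "__name__" in series["metric"]
    })


def fetch_claude_metric_names(base_url="http://localhost:9090"):
    """Ask a running Prometheus which claude_code metrics it currently holds."""
    url = instant_query_url(base_url, '{__name__=~"claude_code.*"}')
    with urlopen(url, timeout=5) as response:
        return metric_names(json.load(response))
```

With the stack running, fetch_claude_metric_names() should return names such as claude_code_cost_usage_USD_total once at least one claude command has exported data. An empty list points back at the troubleshooting notes.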

🩺 Troubleshooting

  • otel-collector fails to start: the Loki exporter type must be otlphttp — not otlp_http.
  • Prometheus target is DOWN: the collector exposes metrics on port 8889; confirm the prometheus exporter is in the metrics pipeline.
  • No claude_code.* metrics: ensure CLAUDE_CODE_ENABLE_TELEMETRY=1 is set and claude ran after the collector started. Metrics appear only after the first 10 s export interval.
  • Dashboard shows "No data": the panels filter by exported_job="claude-code". Confirm claude_code.* metrics exist first, then widen the time window.
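When no metrics show up, a quick first check is whether the shell running claude actually sees the telemetry variables. The sketch below compares the current environment with the values set in docker-compose.yml. The expected values are copied from that file, and the function name is hypothetical.

```python
import os

# Values the app service sets in docker-compose.yml.
EXPECTED = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4317",
    "OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE": "cumulative",
}


def telemetry_env_problems(env):
    """Return one message per variable that is missing or has a different value."""
    problems = []
    for name, want in EXPECTED.items():
        got = env.get(name)
        if got is None:
            problems.append(f"{name} is not set")
        elif got != want:
            problems.append(f"{name}={got!r}, expected {want!r}")
    return problems


if __name__ == "__main__":
    for line in telemetry_env_problems(os.environ) or ["telemetry environment matches the compose file"]:
        print(line)
```

Run it inside the app container, in the same shell where claude is started. A different value is not necessarily wrong, but it is the first place to look.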

🌟 Why Observability?

  • Debug agent workflows and tool failures.
  • Track latency, errors, and usage over time.
  • Monitor token consumption and API spend — and catch cost surprises early.
  • Keep everything local: no SaaS, no external accounts, fully reproducible.

🤝 Want to Try It?

If this sounds useful, feel free to:

  • Clone the repo and open it in a dev container.
  • Adapt the collector config and Grafana dashboard to your own stack.
  • Share how you'd extend it; I'm always curious to see new setups!

Check it on GitHub

That's it — a small, fully local observability stack for Claude Code. If you try it, tell me what you'd improve or what panel you'd add next!