This post shows how to monitor Claude Code activity — sessions, tokens, cost, tool use, and API events — using a fully local observability stack built on OpenTelemetry, Prometheus, Loki, and Grafana. No external accounts, no SaaS — everything runs on your machine inside a dev container.
Every time the claude CLI runs inside the dev container it automatically exports OpenTelemetry data. Metrics flow into Prometheus, events and logs flow into Loki, and a pre-built Grafana dashboard surfaces all of it — giving you a live, local view of what Claude Code is doing and what it costs.
This is a small learning project I put together while building a larger dashboard application, as a hands-on way to understand how OpenTelemetry pipelines, the OTel Collector, and Grafana provisioning fit together.
🧠 What It Does
This dev container gives you a one-command development environment that bundles a full observability pipeline alongside your app. Its job is to make Claude Code telemetry visible without any manual instrumentation.
- 📡 Automatic Telemetry: Any claude command running in the container exports OTLP data, with no per-command flags.
- 💰 Cost & Token Visibility: Track total cost, token usage by type (input / output / cacheRead / cacheCreation), and API spend in real time.
- 🔧 Tool & API Insight: See which tools Claude Code uses most and watch API request latency (p50 / p95).
- 📋 Live Event Log: Stream Claude Code events from Loki straight into a Grafana panel.
Everything is reproducible: all container images are pinned to explicit tags, and Grafana is provisioned at startup with data sources and a ready-made dashboard.
🏗️ Architecture
The stack connects the app to Grafana through open-source observability tools. The OTel Collector is the hub — it receives OTLP data and fans it out by signal type.
    ┌─────────────────────────────┐
    │             app             │
    │    Next.js + claude CLI     │
    └──────────────┬──────────────┘
                   │ OTLP / gRPC :4317
                   ▼
    ┌─────────────────────────────┐
    │       otel-collector        │
    │   receives OTLP, batches,   │
    │   fans out by signal type   │
    └───────┬─────────────┬───────┘
    metrics │             │ logs / events
(Prometheus │             │ (OTLP HTTP)
   exporter ▼             ▼
:8889) ┌──────────┐  ┌──────────┐
       │Prometheus│  │   Loki   │
       │  :9090   │  │  :3100   │
       └────┬─────┘  └────┬─────┘
            │             │
            └──────┬──────┘
                   ▼
          ┌─────────────────┐
          │     Grafana     │
          │      :3001      │
          │   Claude Code   │
          │    dashboard    │
          └─────────────────┘
🧩 Services
| Service | Image | Port | Purpose |
|---|---|---|---|
| app | mcr.microsoft.com/devcontainers/javascript-node:22 | 3000 | Dev server + the claude CLI |
| otel-collector | otel/opentelemetry-collector-contrib:0.107.0 | 4317 / 4318 | OTLP ingest; routes metrics & logs |
| prometheus | prom/prometheus:v2.54.1 | 9090 | Metrics storage; scrapes the collector |
| loki | grafana/loki:3.1.2 | 3100 | Log / event storage |
| grafana | grafana/grafana:11.2.2 | 3001 | Dashboards over Prometheus + Loki |
🔄 How It Works
- Telemetry is enabled by environment. The app service sets CLAUDE_CODE_ENABLE_TELEMETRY=1 plus the OTEL_* exporter variables, so every claude run picks them up automatically.
- Claude Code exports OTLP over gRPC to otel-collector:4317. Metrics use cumulative temporality (Prometheus requires cumulative counters) and flush every 10 seconds.
- The collector fans out by signal type: metrics go to the Prometheus exporter on :8889; logs and events go to Loki's OTLP endpoint. A batch processor sits in front of both pipelines.
- Prometheus scrapes the collector every 5 seconds on the otel-collector job.
- Grafana is provisioned at startup with Prometheus + Loki data sources and the pre-built Claude Code Telemetry dashboard.
claude → OTel Collector → Prometheus / Loki → Grafana
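To make the temporality point concrete, here is a small Python sketch of how per-interval delta values relate to the cumulative running total that Prometheus counters expect. The token counts are made-up illustrative numbers, not real Claude Code output, and the helper names are hypothetical.

```python
from itertools import accumulate


def to_cumulative(deltas):
    """Turn per-interval increments into a running total."""
    return list(accumulate(deltas))


def to_deltas(cumulative):
    """Recover per-interval increments from a running total."""
    previous = 0
    deltas = []
    for value in cumulative:
        deltas.append(value - previous)
        previous = value
    return deltas


# Hypothetical token counts reported at three 10-second export intervals.
token_deltas = [120, 300, 80]
print(to_cumulative(token_deltas))  # [120, 420, 500]
```

With delta temporality each export carries only the increment; with cumulative temporality each export carries the running total, which is the shape a Prometheus counter needs for functions like rate() to behave sensibly.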
📦 docker-compose.yml
The compose file defines the whole stack with pinned image tags. Each service runs in its own container with a healthcheck, and the collector, Prometheus, Loki, and Grafana publish explicit host ports:
services:
  app:
    # Node.js 22 LTS
    image: mcr.microsoft.com/devcontainers/javascript-node:22
    volumes:
      - ..:/workspace:cached
    command: sleep infinity
    environment:
      OTEL_SERVICE_NAME: bedrock-cost-dashboard
      # Claude Code telemetry: exports metrics + events when `claude` runs
      CLAUDE_CODE_ENABLE_TELEMETRY: "1"
      OTEL_METRICS_EXPORTER: otlp
      OTEL_LOGS_EXPORTER: otlp
      OTEL_EXPORTER_OTLP_PROTOCOL: grpc
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
      # Prometheus needs cumulative counters; Claude Code defaults to delta.
      OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE: cumulative
      OTEL_METRIC_EXPORT_INTERVAL: "10000"
      # DynamoDB Local: placeholder values only.
      DYNAMODB_ENDPOINT: http://dynamodb-local:8000
      AWS_ACCESS_KEY_ID: local
      AWS_SECRET_ACCESS_KEY: local
      AWS_REGION: us-east-1
    depends_on:
      - otel-collector
    healthcheck:
      test: ["CMD", "node", "-v"]
      interval: 30s
      timeout: 10s
      retries: 5
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.107.0
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel/otel-collector-config.yaml:/etc/otelcol/config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"
    depends_on:
      - prometheus
      - loki
    healthcheck:
      test: ["CMD", "otelcol-contrib", "--version"]
      interval: 30s
      timeout: 10s
      retries: 5
  prometheus:
    image: prom/prometheus:v2.54.1
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 5
  loki:
    image: grafana/loki:3.1.2
    ports:
      - "3100:3100"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3100/ready"]
      interval: 30s
      timeout: 10s
      retries: 5
  grafana:
    image: grafana/grafana:11.2.2
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_USER: admin
      # Grafana reads file-based settings from variables ending in __FILE (double underscore).
      GF_SECURITY_ADMIN_PASSWORD__FILE: /run/secrets/grafana_admin_password
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - prometheus
      - loki
    secrets:
      - grafana_admin_password
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 5
secrets:
  grafana_admin_password:
    file: ./grafana_admin_password.txt
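If telemetry does not show up, one quick sanity check is to compare the environment inside the app container against the values in the compose file. The following is a minimal Python sketch of that idea; the helper name telemetry_env_problems and the EXPECTED_ENV table are hypothetical, and the expected values are simply copied from the compose file above rather than taken from any Claude Code API.

```python
import os

# Expected values, copied from the `app` service environment in docker-compose.yml.
EXPECTED_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4317",
    "OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE": "cumulative",
}


def telemetry_env_problems(env):
    """Return a human-readable message for each missing or mismatched variable."""
    problems = []
    for name, expected in EXPECTED_ENV.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set")
        elif actual != expected:
            problems.append(f"{name}={actual!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    # Run inside the app container; prints nothing when everything matches.
    for problem in telemetry_env_problems(dict(os.environ)):
        print(problem)
```

An empty result only means the variables match this post's setup; a different but still valid configuration (for example another endpoint) would be reported as a mismatch.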
📝 otel-collector-config.yaml
The collector config receives OTLP and routes each signal type to the right backend:
- receivers: accepts OTLP over gRPC (4317) and HTTP (4318).
- exporters: a prometheus exporter on :8889 for metrics, and otlphttp/loki for logs.
- service/pipelines: connects receivers to exporters, with a batch processor in between.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
    tls:
      insecure: true
processors:
  batch:
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
Tip: the Loki exporter type must be otlphttp (no underscore) — otlp_http is invalid and the collector will fail to start.
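A related class of mistake is a pipeline that references a component that was never defined. As a rough illustration, the Python sketch below mirrors the collector config as a plain dictionary and lists any such dangling references. The function undefined_components is a hypothetical helper, not part of the collector, and it only checks names against definitions; it would not catch an invalid component type such as otlp_http.

```python
def undefined_components(config):
    """Return (pipeline, kind, name) for each pipeline reference with no definition."""
    missing = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind) or {}
            for name in spec.get(kind, []):
                if name not in defined:
                    missing.append((pipeline, kind, name))
    return missing


# The collector config from this post, written as a Python dict.
collector_config = {
    "receivers": {
        "otlp": {
            "protocols": {
                "grpc": {"endpoint": "0.0.0.0:4317"},
                "http": {"endpoint": "0.0.0.0:4318"},
            }
        }
    },
    "exporters": {
        "prometheus": {"endpoint": "0.0.0.0:8889"},
        "otlphttp/loki": {
            "endpoint": "http://loki:3100/otlp",
            "tls": {"insecure": True},
        },
    },
    "processors": {"batch": None},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["prometheus"],
            },
            "logs": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlphttp/loki"],
            },
        }
    },
}

print(undefined_components(collector_config))  # []
```

Renaming the exporter definition without updating the logs pipeline would make the function return a single entry for the logs pipeline, which is the kind of mismatch that otherwise only surfaces when the collector refuses to start.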
📡 prometheus.yml
Prometheus has a single job: scrape the metrics the OTel Collector exposes on port 8889. A tight 5-second interval keeps the dashboard responsive.
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ["otel-collector:8889"]
📊 Grafana Provisioning
Grafana is configured entirely from files mounted at /etc/grafana/provisioning — no manual clicking. Two pieces wire it up: data sources and a dashboard provider.
Data sources — datasources.yaml
Registers Prometheus (default) and Loki, each reachable by its container name on the compose network:
apiVersion: 1
datasources:
  - name: Prometheus
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
Dashboard provider — dashboards.yaml
Tells Grafana to load any dashboard JSON found in the provisioning folder into a Claude Code folder, re-checking every 30 seconds:
apiVersion: 1
providers:
  - name: default
    folder: Claude Code
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards
The dashboard — claude-code.json
The pre-built Claude Code Telemetry dashboard ships as JSON alongside the provider. It auto-refreshes every 30 seconds, defaults to a 1-hour window, and includes:
- Stat panels — total cost, tokens, API requests, and tool uses.
- Cost over time and a token breakdown by type (input / output / cacheRead / cacheCreation).
- Top tools used — a horizontal bar chart.
- API request latency — p50 and p95.
- Live event log — Claude Code events streamed from Loki.
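Panels filter metrics by exported_job="claude-code". The value claude-code comes from the job identity Claude Code sets internally on its OTLP data, which the collector's Prometheus exporter emits as a job label. Prometheus attaches its own job label (otel-collector) to everything it scrapes from that target, so the incoming label is renamed to exported_job.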
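To show what such a filter looks like in PromQL, here is a short Python sketch that builds label selectors and two example queries. These are not the dashboard's actual panel queries, which live in claude-code.json; the selector helper is hypothetical, and the type label used for the token breakdown is an assumption based on the token types listed above.

```python
def selector(metric, **labels):
    """Build a PromQL series selector such as metric{key="value"}."""
    parts = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{parts}}}" if parts else metric


# Every query is scoped to the label Prometheus attaches on scrape.
cost = selector("claude_code_cost_usage_USD_total", exported_job="claude-code")
tokens = selector("claude_code_token_usage_tokens_total", exported_job="claude-code")

# Assumed query shapes: a grand total, and a per-type breakdown.
total_cost_query = f"sum({cost})"
tokens_by_type_query = f"sum by (type) ({tokens})"

print(total_cost_query)
print(tokens_by_type_query)
```

Pasting the printed queries into the Prometheus query UI is a quick way to confirm the label filter matches before debugging the dashboard itself.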
⚙️ Setup
- Open in the dev container. Open the repo in VS Code and Reopen in Container. This builds the app service and forwards all service ports.
- Start the stack: docker compose up -d
- Confirm Prometheus is scraping the collector: open http://localhost:9090/targets; the otel-collector job should show UP.
- Generate telemetry by running any claude command inside the container.
- Confirm metrics arrived: in the Prometheus query UI run {__name__=~"claude_code.*"} and look for metrics such as claude_code_cost_usage_USD_total and claude_code_token_usage_tokens_total.
- Open Grafana at http://localhost:3001 (user admin). The Claude Code Telemetry dashboard is pre-loaded under the Claude Code folder.
The dashboard auto-refreshes every 30 seconds and defaults to a 1-hour window.
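The scrape check in the steps above can also be scripted. The sketch below reads the Prometheus HTTP API endpoint /api/v1/targets and reports the health of the otel-collector job. The function names are hypothetical, and the base URL assumes the port mapping from the compose file in this post.

```python
import json
from urllib.request import urlopen


def job_health(targets_payload, job):
    """Return the health string of every active target belonging to `job`."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [
        target.get("health")
        for target in active
        if target.get("labels", {}).get("job") == job
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets document from a running Prometheus (needs the stack up)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Trimmed, hand-written payload in the shape the endpoint returns.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "otel-collector"}, "health": "up"},
        ]
    },
}
print(job_health(sample, "otel-collector"))  # ['up']
```

Against the live stack, job_health(fetch_targets(), "otel-collector") should return a list containing "up" once the first scrape has succeeded; an empty list suggests the job name in prometheus.yml does not match.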
🩺 Troubleshooting
- otel-collector fails to start: the Loki exporter type must be otlphttp, not otlp_http.
- Prometheus target is DOWN: the collector exposes metrics on port 8889; confirm the prometheus exporter is in the metrics pipeline.
- No claude_code.* metrics: ensure CLAUDE_CODE_ENABLE_TELEMETRY=1 is set and claude ran after the collector started. Metrics appear only after the first 10 s export interval.
- Dashboard shows "No data": the panels filter by exported_job="claude-code". Confirm claude_code.* metrics exist first, then widen the time window.
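For the "No data" case, it can help to see which exported_job values the stored series actually carry. The sketch below summarises the JSON returned by the Prometheus instant-query endpoint (/api/v1/query) for a query such as {__name__=~"claude_code.*"}. The helper name is hypothetical, and the sample payload is hand-written in the documented response shape rather than captured from a real run.

```python
from collections import defaultdict


def exported_jobs_by_metric(query_payload):
    """Map each metric name in an instant-query result to its exported_job values."""
    jobs = defaultdict(set)
    for series in query_payload.get("data", {}).get("result", []):
        labels = series.get("metric", {})
        name = labels.get("__name__")
        if name:
            # Series without the label are recorded under an empty string.
            jobs[name].add(labels.get("exported_job", ""))
    return {name: sorted(values) for name, values in jobs.items()}


# Hand-written example payload; the values are placeholders.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "claude_code_cost_usage_USD_total",
                    "exported_job": "claude-code",
                },
                "value": [0, "0"],
            },
        ],
    },
}
print(exported_jobs_by_metric(sample))
```

If the metrics exist but the label value differs from claude-code, or is missing, the dashboard filter is the likely cause rather than the pipeline.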
🌟 Why Observability?
- Debug agent workflows and tool failures.
- Track latency, errors, and usage over time.
- Monitor token consumption and API spend — and catch cost surprises early.
- Keep everything local: no SaaS, no external accounts, fully reproducible.
🤝 Want to Try It?
If this sounds useful, feel free to:
- Clone the repo and open it in a dev container.
- Adapt the collector config and Grafana dashboard to your own stack.
- Share how you'd extend it — I'm always curious to see new setups!
