Observability

Alquimia produces three observability signals out of the box: metrics (from alquimia-core), traces (from alquimia-runtime), and logs (from alquimia-runtime). All three are correlated by a shared set of dimensions called CommonAttributes.

Observability is split across the two layers of the platform:

┌─────────────────────────────────────────────────────────────┐
│  alquimia-runtime (FastAPI)                                 │
│                                                             │
│  TRACES: one span per HTTP request + CloudEvent lifecycle   │
│  LOGS: every loguru record bridged to OTEL logs             │
│  X-Request-ID: per-HTTP-request correlation ID              │
└──────────────────────┬──────────────────────────────────────┘
                       │ CloudEvents
┌─────────────────────────────────────────────────────────────┐
│  alquimia-core (SDK)                                        │
│                                                             │
│  METRICS: 18 instruments across tokens, latency, tools,     │
│           shields, responses, empathy, agent lifecycle      │
└─────────────────────────────────────────────────────────────┘

Why the split? The runtime owns the HTTP boundary and the CloudEvent lifecycle — it is the right place for request-scoped traces and structured logs. The core SDK owns the agent execution loop — it is the right place for business-level metrics (token consumption, tool invocations, agent completions).

CommonAttributes — the correlation backbone

Every observability signal carries the same six dimensions (channel_id is optional). The runtime's telemetry_metadata_middleware extracts them from HTTP request headers and propagates them through the entire execution stack.

| Field         | HTTP header   | Description                                     |
|---------------|---------------|-------------------------------------------------|
| assistant_id  | assistant-id  | The agent being invoked                         |
| agentspace_id | agentspace-id | The agentspace containing the agent             |
| user_id       | user-id       | The end user                                    |
| session_id    | session-id    | The conversation session                        |
| channel_id    | channel-id    | The channel (WhatsApp, Slack, etc.); optional   |
| task_id       | task-id       | The inference task; set per request             |

These dimensions appear on:

  • Every OTEL metric emitted by AlquimiaObserver
  • Every OTEL log record emitted by the loguru→OTEL bridge
  • Every OTEL span, as attributes set by telemetry_metadata_middleware

This means you can filter any signal — metrics, traces, or logs — by assistant_id or session_id in your observability backend without any additional instrumentation.
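The extraction step can be sketched in plain Python. This is an illustration only, not the runtime's actual middleware: the CommonAttributes dataclass, HEADER_BY_FIELD, and the helper functions are hypothetical names, and only the field and header names come from the table above.

```python
from contextvars import ContextVar, Token
from dataclasses import asdict, dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class CommonAttributes:
    # Hypothetical container; field names follow the table above.
    assistant_id: Optional[str] = None
    agentspace_id: Optional[str] = None
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    channel_id: Optional[str] = None  # documented as optional
    task_id: Optional[str] = None


# Field name -> HTTP header name, as listed in the table above.
HEADER_BY_FIELD = {
    "assistant_id": "assistant-id",
    "agentspace_id": "agentspace-id",
    "user_id": "user-id",
    "session_id": "session-id",
    "channel_id": "channel-id",
    "task_id": "task-id",
}

# Request-scoped storage; each request context sees its own value.
current_ctx: ContextVar[Optional[CommonAttributes]] = ContextVar(
    "common_attributes", default=None
)


def extract_common_attributes(headers: Mapping[str, str]) -> CommonAttributes:
    """Build CommonAttributes from headers (header names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return CommonAttributes(
        **{field: lowered.get(header) for field, header in HEADER_BY_FIELD.items()}
    )


def bind(attrs: CommonAttributes) -> Token:
    """Store the attributes for the current context; keep the token to undo."""
    return current_ctx.set(attrs)


def unbind(token: Token) -> None:
    """Restore whatever was stored before the matching bind() call."""
    current_ctx.reset(token)


def current_attributes() -> Dict[str, str]:
    """Return the dimensions that are set, ready to attach to a signal."""
    attrs = current_ctx.get()
    if attrs is None:
        return {}
    return {key: value for key, value in asdict(attrs).items() if value is not None}
```

Storing the value in a ContextVar means code deep in the call stack can read the dimensions without them being passed as arguments, which is what lets metrics, logs, and spans all pick up the same values.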

The runtime tags every HTTP request with an X-Request-ID. If the client does not send the header, the runtime generates a UUID v4. Either way, the ID is echoed back in the X-Request-ID response header.

Request: X-Request-ID: my-correlation-id-123 (or omit — runtime generates one)
Response: X-Request-ID: my-correlation-id-123

X-Request-ID is per-HTTP-request — it changes on every call. task_id is per-inference-run — it stays the same across the SSE stream and all CloudEvent hops for a single agent execution.

Use X-Request-ID to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run.
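The accept-or-generate rule for X-Request-ID can be sketched as follows. The function name resolve_request_id is hypothetical, and treating a blank header value as missing is an assumption rather than documented runtime behaviour.

```python
import uuid
from typing import Mapping

REQUEST_ID_HEADER = "x-request-id"  # compared case-insensitively


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Return the client's X-Request-ID, or a fresh UUID v4 if none was sent."""
    for name, value in headers.items():
        # Assumption: a blank value is treated the same as a missing header.
        if name.lower() == REQUEST_ID_HEADER and value.strip():
            return value.strip()
    return str(uuid.uuid4())
```

The returned value would then be logged with the request and copied into the response header, so client and server logs can be matched on it.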

Metrics are emitted by alquimia-core via OpenTelemetry. They are disabled by default — call setup_observability() once at application startup to enable them.

from alquimia.core.observability import setup_observability
# Call once at startup — idempotent, safe to call multiple times
setup_observability()

The metrics endpoint is configured via OTEL_COLLECTOR_ENDPOINT. If this variable is not set, setup_observability() is a no-op and no metrics are exported.
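The two documented properties (no-op without the variable, safe to call repeatedly) can be illustrated with a small stand-in. This is not the source of setup_observability(); setup_metrics_sketch and exporter_setup_count are hypothetical names, the exporter wiring is stubbed out with a counter, and treating an empty value as unset is an assumption.

```python
import os
import threading
from typing import Mapping

_lock = threading.Lock()
_configured = False
_exporter_setups = 0  # how many times the (stubbed) exporter wiring ran


def setup_metrics_sketch(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when metrics export is enabled, False when it is a no-op."""
    global _configured, _exporter_setups
    endpoint = environ.get("OTEL_COLLECTOR_ENDPOINT", "").strip()
    if not endpoint:
        # Variable missing (or empty, by assumption): do nothing.
        return False
    with _lock:
        if not _configured:
            # A real implementation would create the OTLP exporter here.
            _exporter_setups += 1
            _configured = True
    return True


def exporter_setup_count() -> int:
    """Expose the counter so callers can verify idempotence."""
    return _exporter_setups
```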

| Category             | What it tracks                                                            |
|----------------------|---------------------------------------------------------------------------|
| LLM tokens           | Completion, prompt, and total token counts per model call                 |
| LLM latency          | Completion time, prompt processing time, queue wait time, end-to-end time |
| Tool invocations     | Count of tool calls by tool name and type                                 |
| Tool errors          | Count of failed tool calls by tool name                                   |
| Shield invocations   | Count of guard/classifier model calls                                     |
| Shield errors        | Count of failed shield calls                                              |
| Response invocations | Count of ResponseInference calls                                          |
| Response errors      | Count of failed response calls                                            |
| Empathy rule matches | Count of empathy rule triggers by rule ID                                 |
| Agent lifecycle      | Agent execution starts and completions                                    |

See Core SDK Observability for the full metric names, types, and dimension reference.

Traces are emitted by alquimia-runtime via OpenTelemetry. They are disabled by default — set OTEL_COLLECTOR_ENDPOINT_TRACES to enable them.

Two types of spans are created automatically:

HTTP spans — created by FastAPIInstrumentor for every incoming HTTP request. Covers the full request/response lifecycle including middleware.

CloudEvent spans — created by cloudevent_interceptor for internal CloudEvent requests (the /controller/process, /models/response, /tools/process endpoints). The finalize_span middleware ensures these spans end after the full handler completes, not just after the HTTP response is sent.

Trace context is propagated via the W3C traceparent header. This means a single inference run — which involves Kafka event hops between the master and workers — produces a connected trace tree in your backend.
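For readers unfamiliar with the header, here is a minimal sketch of how a version-00 traceparent value is parsed and how a child hop keeps the same trace ID. In practice the OpenTelemetry propagators do this for you; parse_traceparent and child_traceparent are illustrative helpers, not part of Alquimia.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    if parsed.version != "00":
        return None  # this sketch only handles version 00
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None  # all-zero IDs are invalid per the specification
    return parsed


def child_traceparent(parent: TraceParent) -> str:
    """Header for the next hop: same trace_id and flags, new random span ID."""
    return f"00-{parent.trace_id}-{secrets.token_hex(8)}-{parent.flags}"
```

Because every hop reuses the trace_id and only replaces the parent ID, the backend can stitch spans from different services into one tree.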

All loguru log records are bridged to OTEL logs when OTEL_COLLECTOR_ENDPOINT_LOGS is set. Each log record carries:

  • Span context: trace_id and span_id from the active OTEL span, enabling log-to-trace correlation
  • CommonAttributes: all six dimensions (assistant_id, session_id, etc.) from the current request context
  • Source location: code.filepath, code.lineno, code.function
  • Exception info: exception.type and exception.message when the log record includes an exception

This means every log line emitted during an inference run is queryable by assistant_id, session_id, or task_id in your log backend — without any manual log enrichment.
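As a rough picture of what one enriched record contains, the sketch below assembles the listed fields into a single flat dictionary. It is a simplification under stated assumptions: build_log_fields is a hypothetical helper, and in the OTEL data model trace_id and span_id are top-level record fields rather than attributes.

```python
from typing import Any, Dict, Mapping, Optional


def build_log_fields(
    *,
    common: Mapping[str, str],
    filepath: str,
    lineno: int,
    function: str,
    trace_id: Optional[str] = None,
    span_id: Optional[str] = None,
    exc: Optional[BaseException] = None,
) -> Dict[str, Any]:
    """Combine the per-record fields described above into one flat mapping."""
    fields: Dict[str, Any] = dict(common)  # CommonAttributes dimensions
    fields["code.filepath"] = filepath
    fields["code.lineno"] = lineno
    fields["code.function"] = function
    if trace_id and span_id:
        # Only present when the record was emitted inside an active span.
        fields["trace_id"] = trace_id
        fields["span_id"] = span_id
    if exc is not None:
        fields["exception.type"] = type(exc).__name__
        fields["exception.message"] = str(exc)
    return fields
```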

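| Loguru level | OTEL severity number | OTEL severity text |
|--------------|----------------------|--------------------|
| TRACE        | 1                    | TRACE              |
| DEBUG        | 5                    | DEBUG              |
| INFO         | 9                    | INFO               |
| SUCCESS      | 9                    | INFO               |
| WARNING      | 13                   | WARN               |
| ERROR        | 17                   | ERROR              |
| CRITICAL     | 21                   | FATAL              |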
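The level mapping in the table above can be written as a lookup. The dictionary values are taken directly from the table; the to_otel_severity helper and its fallback to INFO for unknown custom levels are assumptions made for this sketch.

```python
from typing import Dict, Tuple

# Loguru level name -> (OTEL severity number, OTEL severity text)
LOGURU_TO_OTEL: Dict[str, Tuple[int, str]] = {
    "TRACE": (1, "TRACE"),
    "DEBUG": (5, "DEBUG"),
    "INFO": (9, "INFO"),
    "SUCCESS": (9, "INFO"),  # OTEL has no SUCCESS level, so it maps to INFO
    "WARNING": (13, "WARN"),
    "ERROR": (17, "ERROR"),
    "CRITICAL": (21, "FATAL"),
}


def to_otel_severity(level_name: str) -> Tuple[int, str]:
    """Look up the OTEL severity for a loguru level name."""
    # Assumption: custom levels not in the table fall back to INFO.
    return LOGURU_TO_OTEL.get(level_name.upper(), (9, "INFO"))
```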

For a single inference run, here is how the signals connect:

HTTP request arrives
  ├── X-Request-ID generated/propagated
  ├── CommonAttributes extracted from headers → stored in ContextVar
  └── OTEL HTTP span started (FastAPIInstrumentor)
CloudEvent published to Kafka
  └── CloudEvent span started (cloudevent_interceptor)
alquimia-core evaluate() loop
  ├── METRICS: assistant_executions_started_total +1
  ├── METRICS: tool_invocations_total +1 (per tool call)
  └── METRICS: llm.completion_tokens +N (per LLM call)
AssistantInferenceResponse produced
  ├── METRICS: assistant_executions_ended_total +1
  ├── CloudEvent span ended (finalize_span middleware)
  ├── LOGS: all loguru records flushed to OTEL with trace_id + CommonAttributes
  └── HTTP span ended

All three signals share the same assistant_id, session_id, and task_id. In a backend like Grafana, you can:

  1. Find a slow inference in the metrics dashboard (high alquimia.llm.total_time)
  2. Jump to the trace for that task_id to see which CloudEvent step was slow
  3. Jump to the logs for that trace_id to see the exact error or decision point

.env
# Core SDK metrics
OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4318/v1/metrics

app startup
from alquimia.core.observability import setup_observability
setup_observability()

.env
# Runtime: traces and logs
OTEL_COLLECTOR_ENDPOINT_TRACES=http://otel-collector:4318/v1/traces
OTEL_COLLECTOR_ENDPOINT_LOGS=http://otel-collector:4318/v1/logs
OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime
# Core SDK: metrics
OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4318/v1/metrics

app startup (when using alquimia-core directly)
from alquimia.core.observability import setup_observability
setup_observability()

When running via alquimia-runtime, setup_telemetry() is called automatically in the FastAPI lifespan. You only need to call setup_observability() when using alquimia-core as a standalone SDK.

If certain dimensions (e.g., user_id, session_id) must not appear in your metrics backend for privacy or compliance reasons, use OTEL_EXCLUDED_ATTRIBUTES:

# Strip user_id and session_id from all metric dimensions
OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id

This applies only to metrics. Traces and logs are not filtered by this variable — apply OTEL collector-level processors for those.
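The effect of OTEL_EXCLUDED_ATTRIBUTES on a metric's dimensions can be sketched as a simple filter. The helper names are hypothetical, and tolerating whitespace and empty entries in the comma-separated value is an assumption about parsing, not documented behaviour.

```python
from typing import Dict, FrozenSet, Mapping


def parse_excluded(raw: str) -> FrozenSet[str]:
    """Split a comma-separated list such as 'user_id,session_id'."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def filter_metric_attributes(
    attributes: Mapping[str, str], excluded: FrozenSet[str]
) -> Dict[str, str]:
    """Drop excluded dimensions before they are attached to a metric."""
    return {key: value for key, value in attributes.items() if key not in excluded}
```

With OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id, a metric that would have carried assistant_id, user_id, and session_id ends up with assistant_id only.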