Observability
Alquimia produces three observability signals out of the box: metrics (from alquimia-core), traces (from alquimia-runtime), and logs (from alquimia-runtime). All three are correlated by a shared set of dimensions called CommonAttributes.
The two-layer model
Observability is split across the two layers of the platform:

    ┌─────────────────────────────────────────────────────────────┐
    │  alquimia-runtime (FastAPI)                                 │
    │                                                             │
    │  TRACES: one span per HTTP request + CloudEvent lifecycle   │
    │  LOGS: every loguru record bridged to OTEL logs             │
    │  X-Request-ID: per-HTTP-request correlation ID              │
    └──────────────────────┬──────────────────────────────────────┘
                           │ CloudEvents
                           ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  alquimia-core (SDK)                                        │
    │                                                             │
    │  METRICS: 18 instruments across tokens, latency, tools,     │
    │           shields, responses, empathy, agent lifecycle      │
    └─────────────────────────────────────────────────────────────┘

Why the split? The runtime owns the HTTP boundary and the CloudEvent lifecycle, so it is the right place for request-scoped traces and structured logs. The core SDK owns the agent execution loop, so it is the right place for business-level metrics (token consumption, tool invocations, agent completions).
CommonAttributes — the correlation backbone
Every observability signal carries the same six dimensions, listed in the table below. These are extracted from HTTP request headers by the runtime's telemetry_metadata_middleware and propagated through the entire execution stack.
| Field | HTTP header | Description |
|---|---|---|
| assistant_id | assistant-id | The agent being invoked |
| agentspace_id | agentspace-id | The agentspace containing the agent |
| user_id | user-id | The end user |
| session_id | session-id | The conversation session |
| channel_id | channel-id | The channel (WhatsApp, Slack, etc.); optional |
| task_id | task-id | The inference task; set per request |
These dimensions appear on:
- Every OTEL metric emitted by AlquimiaObserver
- Every OTEL log record emitted by the loguru→OTEL bridge
- Every OTEL span attribute set by telemetry_metadata_middleware
This means you can filter any signal — metrics, traces, or logs — by assistant_id or session_id in your observability backend without any additional instrumentation.
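The header-to-dimension extraction described above can be sketched roughly as follows. This is an illustration only, not the runtime's actual middleware: the header names come from the table, but `CommonAttributesSketch`, `bind_common_attributes`, and `current_signal_attributes` are hypothetical names, and the real telemetry_metadata_middleware may be structured differently.

```python
from contextvars import ContextVar
from dataclasses import asdict, dataclass
from typing import Mapping, Optional

# Header names are taken from the table above; everything else is assumed.
HEADER_TO_FIELD = {
    "assistant-id": "assistant_id",
    "agentspace-id": "agentspace_id",
    "user-id": "user_id",
    "session-id": "session_id",
    "channel-id": "channel_id",
    "task-id": "task_id",
}


@dataclass(frozen=True)
class CommonAttributesSketch:
    """Hypothetical stand-in for the platform's CommonAttributes."""
    assistant_id: Optional[str] = None
    agentspace_id: Optional[str] = None
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    channel_id: Optional[str] = None
    task_id: Optional[str] = None


# Request-scoped storage, so code deeper in the stack can read the dimensions.
_current: ContextVar[Optional[CommonAttributesSketch]] = ContextVar(
    "common_attributes", default=None
)


def bind_common_attributes(headers: Mapping[str, str]) -> CommonAttributesSketch:
    """Read the dimension headers (case-insensitively) and store them."""
    lowered = {name.lower(): value for name, value in headers.items()}
    attrs = CommonAttributesSketch(
        **{field: lowered.get(header) for header, field in HEADER_TO_FIELD.items()}
    )
    _current.set(attrs)
    return attrs


def current_signal_attributes() -> dict:
    """Return only the dimensions that are set, ready to attach to a signal."""
    attrs = _current.get()
    if attrs is None:
        return {}
    return {key: value for key, value in asdict(attrs).items() if value is not None}
```

A metric, span, or log emitter would then call `current_signal_attributes()` at emit time, which is one way the same dimensions could end up on every signal without being passed explicitly.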
X-Request-ID
Every HTTP request to the runtime carries an X-Request-ID header. If the client does not provide one, the runtime generates a UUID v4. The ID is echoed back in the response header.

    Request:  X-Request-ID: my-correlation-id-123   (or omit; the runtime generates one)
    Response: X-Request-ID: my-correlation-id-123

X-Request-ID is per HTTP request: it changes on every call. task_id is per inference run: it stays the same across the SSE stream and all CloudEvent hops for a single agent execution.
Use X-Request-ID to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run.
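The "use the client's ID or generate one, then echo it back" rule can be sketched as below. The function names are hypothetical and the runtime's own middleware is not shown in this page, so treat this as a minimal illustration of the behaviour rather than its implementation.

```python
import uuid
from typing import Mapping

REQUEST_ID_HEADER = "X-Request-ID"


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Return the client-supplied request ID, or a fresh UUID v4 if absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == REQUEST_ID_HEADER.lower() and value:
            return value
    return str(uuid.uuid4())


def response_headers_for(request_headers: Mapping[str, str]) -> dict:
    """Build the response header that echoes the resolved ID back."""
    return {REQUEST_ID_HEADER: resolve_request_id(request_headers)}
```

Because the ID is resolved once per HTTP call, two calls that belong to the same inference run will carry different X-Request-ID values but the same task_id.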
Metrics (alquimia-core)
Metrics are emitted by alquimia-core via OpenTelemetry. They are disabled by default; call setup_observability() once at application startup to enable them.

    from alquimia.core.observability import setup_observability

    # Call once at startup; idempotent, safe to call multiple times
    setup_observability()

The metrics endpoint is configured via OTEL_COLLECTOR_ENDPOINT. If this variable is not set, setup_observability() is a no-op and no metrics are exported.
What is measured
| Category | What it tracks |
|---|---|
| LLM tokens | Completion, prompt, and total token counts per model call |
| LLM latency | Completion time, prompt processing time, queue wait time, end-to-end time |
| Tool invocations | Count of tool calls by tool name and type |
| Tool errors | Count of failed tool calls by tool name |
| Shield invocations | Count of guard/classifier model calls |
| Shield errors | Count of failed shield calls |
| Response invocations | Count of ResponseInference calls |
| Response errors | Count of failed response calls |
| Empathy rule matches | Count of empathy rule triggers by rule ID |
| Agent lifecycle | Agent execution starts and completions |
See Core SDK Observability for the full metric names, types, and dimension reference.
Traces (alquimia-runtime)
Traces are emitted by alquimia-runtime via OpenTelemetry. They are disabled by default; set OTEL_COLLECTOR_ENDPOINT_TRACES to enable them.
Two types of spans are created automatically:
HTTP spans — created by FastAPIInstrumentor for every incoming HTTP request. Covers the full request/response lifecycle including middleware.
CloudEvent spans — created by cloudevent_interceptor for internal CloudEvent requests (the /controller/process, /models/response, /tools/process endpoints). The finalize_span middleware ensures these spans end after the full handler completes, not just after the HTTP response is sent.
Trace context is propagated via the W3C traceparent header. This means a single inference run — which involves Kafka event hops between the master and workers — produces a connected trace tree in your backend.
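To make the propagation concrete, here is a small sketch of the W3C `traceparent` format (`version-traceid-parentid-flags`) and of how a hop keeps the trace ID while minting a new span ID. In practice OpenTelemetry's propagators handle this; the helper names below are hypothetical, and the parser only accepts the strict four-field form.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a traceparent header, returning None if it is malformed."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if (
        parsed.version == "ff"
        or set(parsed.trace_id) == {"0"}
        or set(parsed.parent_id) == {"0"}
    ):
        return None
    return parsed


def child_traceparent(parent: TraceParent) -> str:
    """Header for the next hop: same trace ID, new 8-byte span ID."""
    new_span_id = secrets.token_hex(8)
    return f"{parent.version}-{parent.trace_id}-{new_span_id}-{parent.flags}"
```

Every hop that forwards a header built this way shares one `trace_id`, which is what lets a backend assemble the hops of a single inference run into one trace tree.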
Logs (alquimia-runtime)
All loguru log records are bridged to OTEL logs when OTEL_COLLECTOR_ENDPOINT_LOGS is set. Each log record carries:
- Span context: trace_id and span_id from the active OTEL span, enabling log-to-trace correlation
- CommonAttributes: all dimensions (assistant_id, session_id, etc.) from the current request context
- Source location: code.filepath, code.lineno, code.function
- Exception info: exception.type and exception.message when the log record includes an exception
This means every log line emitted during an inference run is queryable by assistant_id, session_id, or task_id in your log backend — without any manual log enrichment.
Log levels
| Loguru level | OTEL severity number | OTEL severity text |
|---|---|---|
| TRACE | 1 | TRACE |
| DEBUG | 5 | DEBUG |
| INFO | 9 | INFO |
| SUCCESS | 9 | INFO |
| WARNING | 13 | WARN |
| ERROR | 17 | ERROR |
| CRITICAL | 21 | FATAL |
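The table translates directly into a lookup. The sketch below encodes exactly those rows; the function name and the INFO fallback for unknown or custom level names are assumptions, since the page does not say how the bridge treats levels outside the table.

```python
from typing import Tuple

# (severity number, severity text) per loguru level name, from the table above.
LOGURU_TO_OTEL = {
    "TRACE": (1, "TRACE"),
    "DEBUG": (5, "DEBUG"),
    "INFO": (9, "INFO"),
    "SUCCESS": (9, "INFO"),  # loguru-specific level, folded into INFO
    "WARNING": (13, "WARN"),
    "ERROR": (17, "ERROR"),
    "CRITICAL": (21, "FATAL"),
}


def to_otel_severity(level_name: str) -> Tuple[int, str]:
    """Map a loguru level name to an OTEL (severity number, severity text)."""
    # Assumption: levels not in the table fall back to INFO.
    return LOGURU_TO_OTEL.get(level_name.upper(), (9, "INFO"))
```

Note that SUCCESS and INFO become indistinguishable after the mapping, so a backend query on severity alone cannot separate them.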
How the three signals relate
For a single inference run, here is how the signals connect:

    HTTP request arrives
      │
      ├── X-Request-ID generated/propagated
      ├── CommonAttributes extracted from headers → stored in ContextVar
      ├── OTEL HTTP span started (FastAPIInstrumentor)
      │
      ▼
    CloudEvent published to Kafka
      │
      ├── CloudEvent span started (cloudevent_interceptor)
      │
      ▼
    alquimia-core evaluate() loop
      │
      ├── METRICS: assistant_executions_started_total +1
      ├── METRICS: tool_invocations_total +1 (per tool call)
      ├── METRICS: llm.completion_tokens +N (per LLM call)
      │
      ▼
    AssistantInferenceResponse produced
      │
      ├── METRICS: assistant_executions_ended_total +1
      ├── CloudEvent span ended (finalize_span middleware)
      ├── LOGS: all loguru records flushed to OTEL with trace_id + CommonAttributes
      └── HTTP span ended

All three signals share the same assistant_id, session_id, and task_id. In a backend like Grafana, you can:
- Find a slow inference in the metrics dashboard (high alquimia.llm.total_time)
- Jump to the trace for that task_id to see which CloudEvent step was slow
- Jump to the logs for that trace_id to see the exact error or decision point
Enabling observability
Minimal setup (metrics only)

    # Core SDK metrics
    OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4318/v1/metrics

    from alquimia.core.observability import setup_observability
    setup_observability()

Full setup (metrics + traces + logs)

    # Runtime: traces and logs
    OTEL_COLLECTOR_ENDPOINT_TRACES=http://otel-collector:4318/v1/traces
    OTEL_COLLECTOR_ENDPOINT_LOGS=http://otel-collector:4318/v1/logs
    OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime

    # Core SDK: metrics
    OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4318/v1/metrics
    OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime

    from alquimia.core.observability import setup_observability
    setup_observability()

When running via alquimia-runtime, setup_telemetry() is called automatically in the FastAPI lifespan. You only need to call setup_observability() when using alquimia-core as a standalone SDK.
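Since each signal is switched on by its own environment variable, a quick startup check can report which signals a given environment would enable. The variable names below are the ones documented on this page; `enabled_signals` is a hypothetical helper and not part of either package.

```python
import os
from typing import Mapping, Optional

# One endpoint variable per signal, as documented on this page.
SIGNAL_ENV_VARS = {
    "metrics": "OTEL_COLLECTOR_ENDPOINT",
    "traces": "OTEL_COLLECTOR_ENDPOINT_TRACES",
    "logs": "OTEL_COLLECTOR_ENDPOINT_LOGS",
}


def enabled_signals(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Report, per signal, whether its endpoint variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return {
        signal: bool(env.get(variable, "").strip())
        for signal, variable in SIGNAL_ENV_VARS.items()
    }
```

Logging the result of `enabled_signals()` at startup is a cheap way to catch a missing variable before wondering why a dashboard is empty.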
Privacy: stripping dimensions
If certain dimensions (e.g., user_id, session_id) must not appear in your metrics backend for privacy or compliance reasons, use OTEL_EXCLUDED_ATTRIBUTES:

    # Strip user_id and session_id from all metric dimensions
    OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id

This applies only to metrics. Traces and logs are not filtered by this variable; apply OTEL collector-level processors for those.
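Conceptually, the variable is a comma-separated deny-list applied to each metric's attribute set before export. The sketch below shows that idea; the helper names are hypothetical, and whitespace trimming around names is an assumption about how the value is parsed.

```python
from typing import FrozenSet, Mapping


def parse_excluded_attributes(raw: str) -> FrozenSet[str]:
    """Split a comma-separated list into a set of attribute names."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return frozenset(name.strip() for name in raw.split(",") if name.strip())


def strip_excluded(attributes: Mapping[str, str], excluded: FrozenSet[str]) -> dict:
    """Return a copy of the metric attributes without the excluded names."""
    return {key: value for key, value in attributes.items() if key not in excluded}
```

For example, with `OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id`, a metric recorded with `assistant_id`, `user_id`, and `session_id` would be exported with `assistant_id` only.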
Related pages
- Core SDK Observability: setup_observability(), all 18 metrics, dimension reference
- Runtime Observability: setup_telemetry(), traces, loguru bridge, middleware
- Configuration Reference: all OTEL environment variables
- Inference Endpoints: X-Request-ID and CommonAttributes headers
Source
- Alquimia-ai/alquimia-core: src/alquimia/core/observability.py, src/alquimia/core/base.py
- Alquimia-ai/alquimia-runtime: runtime/src/telemetry.py, runtime/src/main.py