Observability

Alquimia produces three observability signals out of the box: metrics (from alquimia-core), traces (from alquimia-runtime), and logs (from alquimia-runtime). All three are correlated by a shared set of dimensions called CommonAttributes.

The two-layer model

Observability is split across the two layers of the platform:

┌─────────────────────────────────────────────────────────────┐
│                 alquimia-runtime (FastAPI)                  │
│                                                             │
│  TRACES — one span per HTTP request + CloudEvent lifecycle  │
│  LOGS   — every loguru record bridged to OTEL logs          │
│  X-Request-ID — per-HTTP-request correlation ID             │
└──────────────────────┬──────────────────────────────────────┘
                       │ CloudEvents
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                  alquimia-core (SDK)                        │
│                                                             │
│  METRICS — 18 instruments across tokens, latency, tools,    │
│            shields, responses, empathy, agent lifecycle     │
└─────────────────────────────────────────────────────────────┘

Why the split? The runtime owns the HTTP boundary and the CloudEvent lifecycle — it is the right place for request-scoped traces and structured logs. The core SDK owns the agent execution loop — it is the right place for business-level metrics (token consumption, tool invocations, agent completions).

CommonAttributes — the correlation backbone

Every observability signal carries the same set of dimensions, listed in the table below. The runtime’s telemetry_metadata_middleware extracts them from HTTP request headers and propagates them through the entire execution stack.

Field	HTTP header	Description
`assistant_id`	`assistant-id`	The agent being invoked
`agentspace_id`	`agentspace-id`	The agentspace containing the agent
`user_id`	`user-id`	The end user
`session_id`	`session-id`	The conversation session
`channel_id`	`channel-id`	The channel (WhatsApp, Slack, etc.) — optional
`task_id`	`task-id`	The inference task — set per-request

These dimensions appear on:

Every OTEL metric emitted by AlquimiaObserver
Every OTEL log record emitted by the loguru→OTEL bridge
Every OTEL span attribute set by telemetry_metadata_middleware

This means you can filter any signal — metrics, traces, or logs — by assistant_id or session_id in your observability backend without any additional instrumentation.
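The header-to-field mapping in the table can be sketched in a few lines of Python. This is an illustration only, not the code of telemetry_metadata_middleware: the CommonAttributes dataclass, extract_common_attributes, as_signal_attributes, and the current_attributes ContextVar are hypothetical names, and only the header and field names are taken from the table above.

```python
from contextvars import ContextVar
from dataclasses import asdict, dataclass
from typing import Dict, Mapping, Optional

# Header name -> field name, copied from the table above.
HEADER_TO_FIELD = {
    "assistant-id": "assistant_id",
    "agentspace-id": "agentspace_id",
    "user-id": "user_id",
    "session-id": "session_id",
    "channel-id": "channel_id",
    "task-id": "task_id",
}


@dataclass(frozen=True)
class CommonAttributes:
    """Hypothetical container for the shared dimensions."""
    assistant_id: Optional[str] = None
    agentspace_id: Optional[str] = None
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    channel_id: Optional[str] = None
    task_id: Optional[str] = None


# Request-scoped storage so later code can read the dimensions (assumed design).
current_attributes = ContextVar("current_attributes", default=None)


def extract_common_attributes(headers: Mapping[str, str]) -> CommonAttributes:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    values = {field: lowered.get(header) for header, field in HEADER_TO_FIELD.items()}
    attrs = CommonAttributes(**values)
    current_attributes.set(attrs)
    return attrs


def as_signal_attributes(attrs: CommonAttributes) -> Dict[str, str]:
    # Drop unset dimensions so they do not appear as empty attributes.
    return {key: value for key, value in asdict(attrs).items() if value is not None}
```

Calling extract_common_attributes once per request and reading current_attributes elsewhere is one way to make the same dimensions available to metrics, spans, and log records without passing them explicitly.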

X-Request-ID

Every HTTP request to the runtime carries an X-Request-ID header. If the client does not provide one, the runtime generates a UUID v4. The ID is echoed back in the response header.

Request:   X-Request-ID: my-correlation-id-123   (or omit — runtime generates one)
Response:  X-Request-ID: my-correlation-id-123

X-Request-ID is per-HTTP-request — it changes on every call. task_id is per-inference-run — it stays the same across the SSE stream and all CloudEvent hops for a single agent execution.

Use X-Request-ID to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run.
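The echo-or-generate behaviour described above could look roughly like the following. The function names are hypothetical and this is not the runtime's middleware; treating an empty or whitespace-only header as missing is an assumption of this sketch.

```python
import uuid
from typing import Dict, Mapping

REQUEST_ID_HEADER = "X-Request-ID"


def resolve_request_id(request_headers: Mapping[str, str]) -> str:
    """Return the client's X-Request-ID, or a fresh UUID v4 if none was sent."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    # Assumption: a blank header value is treated the same as a missing one.
    provided = lowered.get(REQUEST_ID_HEADER.lower(), "").strip()
    return provided or str(uuid.uuid4())


def response_headers_for(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Build the response header that echoes the resolved ID back to the client."""
    return {REQUEST_ID_HEADER: resolve_request_id(request_headers)}
```

A client that wants to tie its own logs to the runtime's logs can send its own ID; otherwise it can read the generated one from the response.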

Metrics (alquimia-core)

Metrics are emitted by alquimia-core via OpenTelemetry. They are disabled by default — call setup_observability() once at application startup to enable them.

from alquimia.core.observability import setup_observability

# Call once at startup — idempotent, safe to call multiple times
setup_observability()

The metrics endpoint is configured via OTEL_COLLECTOR_ENDPOINT. If this variable is not set, setup_observability() is a no-op and no metrics are exported.

What is measured

Category	What it tracks
LLM tokens	Completion, prompt, and total token counts per model call
LLM latency	Completion time, prompt processing time, queue wait time, end-to-end time
Tool invocations	Count of tool calls by tool name and type
Tool errors	Count of failed tool calls by tool name
Shield invocations	Count of guard/classifier model calls
Shield errors	Count of failed shield calls
Response invocations	Count of `ResponseInference` calls
Response errors	Count of failed response calls
Empathy rule matches	Count of empathy rule triggers by rule ID
Agent lifecycle	Agent execution starts and completions

See Core SDK Observability for the full metric names, types, and dimension reference.

Traces (alquimia-runtime)

Traces are emitted by alquimia-runtime via OpenTelemetry. They are disabled by default — set OTEL_COLLECTOR_ENDPOINT_TRACES to enable them.

Two types of spans are created automatically:

HTTP spans — created by FastAPIInstrumentor for every incoming HTTP request. Covers the full request/response lifecycle including middleware.

CloudEvent spans — created by cloudevent_interceptor for internal CloudEvent requests (the /controller/process, /models/response, /tools/process endpoints). The finalize_span middleware ensures these spans end after the full handler completes, not just after the HTTP response is sent.

Trace context is propagated via the W3C traceparent header. This means a single inference run — which involves Kafka event hops between the master and workers — produces a connected trace tree in your backend.
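To make the propagation concrete, here is a small sketch of the W3C traceparent format (version 00: version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags). In practice the OpenTelemetry propagators handle this for you; parse_traceparent and child_traceparent are hypothetical helpers written for illustration and are not part of alquimia-runtime.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout only: 00-<trace-id>-<parent-id>-<flags>, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # All-zero trace or span IDs are invalid per the specification.
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None
    return parsed


def child_traceparent(parent: TraceParent) -> str:
    """Header for the next hop: same trace ID and flags, new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{parent.version}-{parent.trace_id}-{new_span_id}-{parent.flags}"
```

Because every hop keeps the trace ID and only replaces the span ID, the backend can stitch the spans from each hop into one tree.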

Logs (alquimia-runtime)

All loguru log records are bridged to OTEL logs when OTEL_COLLECTOR_ENDPOINT_LOGS is set. Each log record carries:

Span context: trace_id and span_id from the active OTEL span, enabling log-to-trace correlation
CommonAttributes: all dimensions (assistant_id, session_id, etc.) from the current request context
Source location: code.filepath, code.lineno, code.function
Exception info: exception.type and exception.message when the log record includes an exception

This means every log line emitted during an inference run is queryable by assistant_id, session_id, or task_id in your log backend — without any manual log enrichment.
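As a rough picture of the attributes attached to one record, the sketch below assembles the attribute keys listed above into a dictionary. build_log_attributes is a hypothetical helper, not the bridge's actual code; trace_id and span_id are left out because they travel as span context rather than as attributes.

```python
from typing import Any, Dict, Mapping, Optional


def build_log_attributes(
    common: Mapping[str, Optional[str]],
    filepath: str,
    lineno: int,
    function: str,
    exc: Optional[BaseException] = None,
) -> Dict[str, Any]:
    """Combine CommonAttributes, source location, and exception info."""
    # Skip dimensions that were not provided on the request (e.g. channel_id).
    attributes: Dict[str, Any] = {k: v for k, v in common.items() if v is not None}
    attributes.update(
        {
            "code.filepath": filepath,
            "code.lineno": lineno,
            "code.function": function,
        }
    )
    if exc is not None:
        attributes["exception.type"] = type(exc).__name__
        attributes["exception.message"] = str(exc)
    return attributes
```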

Log levels

Loguru level	OTEL severity number	OTEL severity text
`TRACE`	1	TRACE
`DEBUG`	5	DEBUG
`INFO`	9	INFO
`SUCCESS`	9	INFO
`WARNING`	13	WARN
`ERROR`	17	ERROR
`CRITICAL`	21	FATAL
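The table translates directly into a lookup. The sketch below mirrors it; the dictionary and function names are hypothetical, and falling back to INFO for custom loguru levels is an assumption, not documented behaviour.

```python
from typing import Dict, Tuple

# Loguru level name -> (OTEL severity number, OTEL severity text), per the table above.
LOGURU_TO_OTEL: Dict[str, Tuple[int, str]] = {
    "TRACE": (1, "TRACE"),
    "DEBUG": (5, "DEBUG"),
    "INFO": (9, "INFO"),
    "SUCCESS": (9, "INFO"),
    "WARNING": (13, "WARN"),
    "ERROR": (17, "ERROR"),
    "CRITICAL": (21, "FATAL"),
}


def to_otel_severity(level_name: str) -> Tuple[int, str]:
    """Map a loguru level name to its OTEL severity number and text."""
    # Assumption: unknown (custom) levels are reported as INFO.
    return LOGURU_TO_OTEL.get(level_name.upper(), (9, "INFO"))
```

Note that SUCCESS has no OTEL equivalent, so it shares INFO's severity; filtering for SUCCESS records in the backend needs a different field than severity.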

How the three signals relate

For a single inference run, here is how the signals connect:

HTTP request arrives
  │
  ├── X-Request-ID generated/propagated
  ├── CommonAttributes extracted from headers → stored in ContextVar
  ├── OTEL HTTP span started (FastAPIInstrumentor)
  │
  ▼
CloudEvent published to Kafka
  │
  ├── CloudEvent span started (cloudevent_interceptor)
  │
  ▼
alquimia-core evaluate() loop
  │
  ├── METRICS: assistant_executions_started_total +1
  ├── METRICS: tool_invocations_total +1 (per tool call)
  ├── METRICS: llm.completion_tokens +N (per LLM call)
  │
  ▼
AssistantInferenceResponse produced
  │
  ├── METRICS: assistant_executions_ended_total +1
  ├── CloudEvent span ended (finalize_span middleware)
  ├── LOGS: all loguru records flushed to OTEL with trace_id + CommonAttributes
  └── HTTP span ended

All three signals share the same assistant_id, session_id, and task_id. In a backend like Grafana, you can:

Find a slow inference in the metrics dashboard (high alquimia.llm.total_time)
Jump to the trace for that task_id to see which CloudEvent step was slow
Jump to the logs for that trace_id to see the exact error or decision point

Enabling observability

Minimal setup (metrics only)

# Core SDK metrics
OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4318/v1/metrics

# Python: call once at application startup
from alquimia.core.observability import setup_observability
setup_observability()

Full setup (metrics + traces + logs)

# Runtime: traces and logs
OTEL_COLLECTOR_ENDPOINT_TRACES=http://otel-collector:4318/v1/traces
OTEL_COLLECTOR_ENDPOINT_LOGS=http://otel-collector:4318/v1/logs
OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime

# Core SDK: metrics
OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4318/v1/metrics
OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime

# Python: call once at application startup
from alquimia.core.observability import setup_observability
setup_observability()

When running via alquimia-runtime, setup_telemetry() is called automatically in the FastAPI lifespan. You only need to call setup_observability() when using alquimia-core as a standalone SDK.

Privacy: stripping dimensions

If certain dimensions (e.g., user_id, session_id) must not appear in your metrics backend for privacy or compliance reasons, use OTEL_EXCLUDED_ATTRIBUTES:

# Strip user_id and session_id from all metric dimensions
OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id

This applies only to metrics. Traces and logs are not filtered by this variable — apply OTEL collector-level processors for those.
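The effect of the variable on metric dimensions can be pictured with the sketch below. parse_excluded and strip_excluded are hypothetical helpers and do not reproduce alquimia-core's implementation; tolerating whitespace around the commas is an assumption.

```python
import os
from typing import Dict, Mapping, Optional, Set


def parse_excluded(raw: Optional[str]) -> Set[str]:
    """Split a comma-separated list such as 'user_id,session_id' into a set."""
    if not raw:
        return set()
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {name.strip() for name in raw.split(",") if name.strip()}


def strip_excluded(
    attributes: Mapping[str, str], raw_excluded: Optional[str] = None
) -> Dict[str, str]:
    """Return a copy of the metric attributes without the excluded dimensions."""
    if raw_excluded is None:
        raw_excluded = os.environ.get("OTEL_EXCLUDED_ATTRIBUTES", "")
    excluded = parse_excluded(raw_excluded)
    return {key: value for key, value in attributes.items() if key not in excluded}
```

With OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id, a metric recorded with assistant_id, user_id, and session_id would be exported with assistant_id only.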

See also

Core SDK Observability — setup_observability(), all 18 metrics, dimension reference
Runtime Observability — setup_telemetry(), traces, loguru bridge, middleware
Configuration Reference — all OTEL environment variables
Inference Endpoints — X-Request-ID and CommonAttributes headers

Source

Alquimia-ai/alquimia-core — src/alquimia/core/observability.py, src/alquimia/core/base.py
Alquimia-ai/alquimia-runtime — runtime/src/telemetry.py, runtime/src/main.py