Runtime Observability

alquimia-runtime emits OpenTelemetry traces and logs. Export of both is disabled by default and is enabled by setting environment variables before startup.

For the metrics signal (emitted by alquimia-core), see Core SDK Observability. For the full picture of how all three signals relate, see Observability.


runtime/src/telemetry.py
def setup_telemetry() -> None

Initialises the OTEL TracerProvider and LoggerProvider. Called automatically in the FastAPI lifespan startup — you do not call this manually when running alquimia-runtime.

  • Idempotent — subsequent calls are no-ops if already initialised.
  • Deferred — called in the lifespan, not at import time. This prevents OTEL from making network connections during test collection (QUAL-004).
  • Conditional — if OTEL_COLLECTOR_ENDPOINT_TRACES is not set, the TracerProvider is still created (so spans are valid objects) but no BatchSpanProcessor is attached — spans are discarded. If OTEL_COLLECTOR_ENDPOINT_LOGS is also not set, no LoggerProvider is created and the loguru→OTEL bridge is not installed.
| Variable | Default | Description |
| --- | --- | --- |
| OTEL_COLLECTOR_ENDPOINT_TRACES | (none) | OTLP HTTP endpoint for traces. Traces are created but discarded if unset. |
| OTEL_COLLECTOR_ENDPOINT_LOGS | (same as traces) | OTLP HTTP endpoint for logs. Defaults to OTEL_COLLECTOR_ENDPOINT_TRACES. |
| OTEL_ALQUIMIA_SERVICE_NAME | alquimia | service.name resource attribute on all spans and log records |
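The defaulting rules in the table can be sketched as a small resolver. This is only an illustration of the documented behaviour, not the runtime's actual code: `OtelSettings` and `resolve_otel_settings` are hypothetical names, and treating an empty string as unset is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class OtelSettings(NamedTuple):
    """Hypothetical container for the three documented variables."""
    traces_endpoint: Optional[str]
    logs_endpoint: Optional[str]
    service_name: str


def resolve_otel_settings(env: Mapping[str, str] = os.environ) -> OtelSettings:
    # Assumption: an empty string counts as "not set".
    traces = env.get("OTEL_COLLECTOR_ENDPOINT_TRACES") or None
    # Logs fall back to the traces endpoint when no dedicated value is given.
    logs = env.get("OTEL_COLLECTOR_ENDPOINT_LOGS") or traces
    # service.name defaults to "alquimia".
    service = env.get("OTEL_ALQUIMIA_SERVICE_NAME") or "alquimia"
    return OtelSettings(traces, logs, service)
```

With nothing set, both endpoints resolve to `None`, which corresponds to the "spans are discarded, no LoggerProvider" case described above.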

Traces and logs are sent to separate OTEL endpoints. This is intentional — it allows you to route them to different collector pipelines:

OTEL_COLLECTOR_ENDPOINT_TRACES → Jaeger / Tempo (trace backend)
OTEL_COLLECTOR_ENDPOINT_LOGS → Loki / OpenSearch (log backend)

If you use a single OpenTelemetry Collector that handles both, set both variables to the same URL:

OTEL_COLLECTOR_ENDPOINT_TRACES=http://otel-collector:4318/v1/traces
OTEL_COLLECTOR_ENDPOINT_LOGS=http://otel-collector:4318/v1/logs

FastAPIInstrumentor.instrument_app(app) is called once after the FastAPI app is created. It automatically creates an OTEL span for every HTTP request:

  • Span name: {HTTP_METHOD} {route_template} (e.g., POST /event/infer/{assistant_id})
  • Span attributes: HTTP method, URL, status code, route template
  • Span lifecycle: starts when the request enters the ASGI stack, ends when the response is sent

No configuration is required. All routers — public and internal — are instrumented automatically.

CloudEvent spans (finalize_span middleware)

Internal CloudEvent requests (to /controller/process, /models/response, /tools/process, etc.) get an additional span that covers the full handler execution, not just the HTTP response time.

The finalize_span middleware is registered as the outermost middleware in the stack (Starlette reverses registration order, so the last-registered middleware runs first). This is a correctness constraint — do not reorder the middleware without understanding the span lifecycle.

# Execution order (outermost → innermost):
# finalize_span → request_id_middleware → telemetry_metadata_middleware → CORSMiddleware → handler

The middleware calls cloudevent_interceptor(request) to start a span for CloudEvent requests, then calls span.end() after the full response is returned. For non-CloudEvent requests, it is a pass-through.

Trace context is propagated via the W3C traceparent header using TraceContextTextMapPropagator. When the master publishes a CloudEvent to Kafka, it injects the current trace context into the CloudEvent headers. The worker propagates it through the dispatch chain.

This means a single inference run — which involves multiple CloudEvent hops — produces a connected trace tree in your backend, with parent-child relationships between spans.
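To make the propagation concrete, here is a minimal stdlib-only sketch of what a version 00 W3C traceparent value carries and how a child hop keeps the same trace ID. The runtime itself uses TraceContextTextMapPropagator; `TraceParent`, `parse_traceparent`, and `child_traceparent` are hypothetical helpers written for illustration only.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" trace-flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: int


def parse_traceparent(value: str) -> Optional[TraceParent]:
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec rejects version "ff" and all-zero trace or parent IDs.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


def child_traceparent(parent: TraceParent, new_span_id: str) -> str:
    # A downstream hop keeps the trace ID and flags, and substitutes its own span ID.
    return f"{parent.version}-{parent.trace_id}-{new_span_id}-{parent.trace_flags:02x}"
```

Because every hop reuses the trace ID and records the previous span ID as its parent, the backend can assemble the hops into one tree.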


When OTEL_COLLECTOR_ENDPOINT_LOGS is set, setup_loguru_otel_sink() installs a custom loguru sink that forwards every log record to the OTEL LoggerProvider.

Each OTEL log record carries:

Source location:

| Attribute | Value |
| --- | --- |
| log.source.name | Logger name (Python module) |
| code.filepath | Absolute path to the source file |
| code.lineno | Line number |
| code.function | Function name |

Span context (when a span is active):

| Attribute | Value |
| --- | --- |
| trace_id | Active span’s trace ID |
| span_id | Active span’s span ID |
| trace_flags | Active span’s trace flags |

This enables log-to-trace correlation in backends like Grafana — click a log line to jump to the trace, or click a span to see its logs.

CommonAttributes (when a request is in flight):

All five CommonAttributes fields (assistant_id, agentspace_id, user_id, session_id, channel_id) are attached to every log record emitted during a request. This enables filtering logs by assistant_id or session_id without any manual enrichment.

Exception info (when the log record includes an exception):

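| Attribute | Value |
| --- | --- |
| exception.type | Exception class name |
| exception.message | Exception message string |

Loguru levels map to OTEL severity as follows:

| Loguru level | OTEL severity number | OTEL severity text |
| --- | --- | --- |
| TRACE | 1 | TRACE |
| DEBUG | 5 | DEBUG |
| INFO | 9 | INFO |
| SUCCESS | 9 | INFO |
| WARNING | 13 | WARN |
| ERROR | 17 | ERROR |
| CRITICAL | 21 | FATAL |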
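The level mapping can be expressed as a simple lookup table. This is a sketch, not the bridge's real code: `LOGURU_TO_OTEL` and `to_otel_severity` are hypothetical names, and falling back to INFO for unknown or custom levels is an assumption.

```python
from typing import Dict, Tuple

# Loguru level name -> (OTEL severity number, OTEL severity text)
LOGURU_TO_OTEL: Dict[str, Tuple[int, str]] = {
    "TRACE": (1, "TRACE"),
    "DEBUG": (5, "DEBUG"),
    "INFO": (9, "INFO"),
    "SUCCESS": (9, "INFO"),  # OTEL has no SUCCESS severity, so it shares INFO
    "WARNING": (13, "WARN"),
    "ERROR": (17, "ERROR"),
    "CRITICAL": (21, "FATAL"),
}


def to_otel_severity(level_name: str) -> Tuple[int, str]:
    # Assumption: unknown or custom levels are reported as INFO.
    return LOGURU_TO_OTEL.get(level_name.upper(), (9, "INFO"))
```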

Regardless of whether OTEL logs are enabled, the runtime always writes structured logs to stdout:

2025-07-15 12:00:00 | INFO | routers.event:infer:42 - Inference request received

Format: {time} | {level} | {name}:{function}:{line} - {message}

Log level is controlled by LOGGING_LEVEL (default 20 = INFO). Set DEBUG=true to switch to DEBUG level.
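A minimal sketch of how these two variables might combine, assuming LOGGING_LEVEL holds a numeric loguru level (DEBUG is 10, INFO is 20) and that DEBUG takes precedence when both are set. `resolve_log_level` is a hypothetical helper, and the accepted truthy spellings are an assumption.

```python
import os
from typing import Mapping


def resolve_log_level(env: Mapping[str, str] = os.environ) -> int:
    # Assumption: DEBUG overrides LOGGING_LEVEL, and these spellings count as true.
    if env.get("DEBUG", "").strip().lower() in {"1", "true", "yes"}:
        return 10  # loguru DEBUG
    raw = env.get("LOGGING_LEVEL", "").strip()
    # Fall back to 20 (INFO) when the variable is unset or not a number.
    return int(raw) if raw.isdigit() else 20
```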


The telemetry_metadata_middleware extracts CommonAttributes from incoming request headers and stores them in a ContextVar for the duration of the request:

async def telemetry_metadata_middleware(request: Request, call_next):
    clear_runtime_telemetry_metadata()
    context_metadata = CommonAttributes.model_validate(request.headers)
    if context_metadata:
        set_runtime_telemetry_metadata(context_metadata)
    try:
        return await call_next(request)
    finally:
        clear_runtime_telemetry_metadata()

The stored CommonAttributes are used by:

  • The loguru→OTEL bridge — attached to every log record
  • alquimia-core’s observe() — used as metric dimensions

The ContextVar is cleared after every request, so there is no cross-request leakage.

from telemetry import get_runtime_telemetry_metadata

context_metadata = get_runtime_telemetry_metadata()
if context_metadata:
    print(context_metadata.assistant_id)
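The set/get/clear pattern and its isolation guarantee can be illustrated with the standard library alone. The sketch below is not the runtime's implementation: `Attrs` is a hypothetical stand-in for CommonAttributes, and `set_metadata`, `get_metadata`, and `clear_metadata` mirror the real helpers only in spirit.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attrs:
    """Hypothetical stand-in for CommonAttributes."""
    assistant_id: Optional[str] = None
    session_id: Optional[str] = None


_metadata: ContextVar[Optional[Attrs]] = ContextVar(
    "runtime_telemetry_metadata", default=None
)


def set_metadata(attrs: Attrs) -> None:
    _metadata.set(attrs)


def get_metadata() -> Optional[Attrs]:
    return _metadata.get()


def clear_metadata() -> None:
    _metadata.set(None)


async def handle(assistant_id: str) -> Optional[str]:
    # Same shape as the middleware: set, run the handler, always clear.
    set_metadata(Attrs(assistant_id=assistant_id))
    try:
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        meta = get_metadata()
        return meta.assistant_id if meta else None
    finally:
        clear_metadata()


async def main() -> List[Optional[str]]:
    # gather() runs each coroutine in its own task, and each task gets its own
    # copy of the context, so one request never sees another's metadata.
    return await asyncio.gather(handle("a-1"), handle("a-2"))
```

Running `asyncio.run(main())` returns each request's own assistant ID even though the two handlers interleave, which is the property the middleware relies on.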

Every request gets an X-Request-ID header (OBS-001):

import uuid

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Reuse the client's ID if present, otherwise generate a UUID v4.
    request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
    request.state.request_id = request_id
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id
    return response

  • If the client provides X-Request-ID, it is propagated unchanged.
  • If not, a UUID v4 is generated.
  • The ID is always echoed back in the response header.
  • It is attached to request.state.request_id for use in handlers.

X-Request-ID is per-HTTP-request. Use it to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run (multiple HTTP calls + CloudEvent hops).


The runtime emits non-fatal warnings at startup for common misconfigurations (OBS-002, OBS-003). These appear in logs as [STARTUP CONFIG] entries and do not prevent the service from starting.

| Code | Condition | Log message |
| --- | --- | --- |
| OBS-002 | VAULT_TOKEN is empty | OBS-002: VAULT_TOKEN is not set. Registry secret resolution via Vault will fail. |
| OBS-002 | API_TOKEN is a default/test value | OBS-002: API_TOKEN appears to be a default/test value. |
| OBS-003 | Any S3 blob setting is empty | OBS-003: Blob S3 storage is not fully configured. Missing/empty: [fields]. |

These are intentionally non-fatal. The goal is to surface misconfiguration early in logs rather than at first request time.
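A collect-and-log check of this kind might look roughly like the sketch below. Only the message texts come from the table above. The placeholder token values in `DEFAULT_TOKENS`, the S3 setting names in `S3_FIELDS`, and the function name are all hypothetical, since this page does not list them.

```python
from typing import List, Mapping

# Hypothetical placeholder values; the real list is not given in this document.
DEFAULT_TOKENS = {"changeme", "test"}
# Hypothetical S3 setting names, used only for illustration.
S3_FIELDS = ("BLOB_S3_ENDPOINT", "BLOB_S3_BUCKET", "BLOB_S3_ACCESS_KEY", "BLOB_S3_SECRET_KEY")


def startup_config_warnings(settings: Mapping[str, str]) -> List[str]:
    """Return warning messages instead of raising, so startup can continue."""
    warnings: List[str] = []
    if not settings.get("VAULT_TOKEN"):
        warnings.append(
            "OBS-002: VAULT_TOKEN is not set. "
            "Registry secret resolution via Vault will fail."
        )
    if settings.get("API_TOKEN", "") in DEFAULT_TOKENS:
        warnings.append("OBS-002: API_TOKEN appears to be a default/test value.")
    missing = [name for name in S3_FIELDS if not settings.get(name)]
    if missing:
        warnings.append(
            "OBS-003: Blob S3 storage is not fully configured. "
            f"Missing/empty: {missing}."
        )
    return warnings
```

Returning a list rather than raising keeps the checks non-fatal: the caller can log each entry with the [STARTUP CONFIG] prefix and proceed.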


Starlette reverses middleware registration order. The actual execution order (outermost → innermost) is:

1. finalize_span — starts/ends CloudEvent spans
2. request_id_middleware — generates/propagates X-Request-ID
3. telemetry_metadata_middleware — extracts CommonAttributes into ContextVar
4. CORSMiddleware — handles CORS preflight
5. Route handler — the actual endpoint logic

This order is a correctness constraint. finalize_span must be outermost so it covers the full request lifecycle including all other middleware. telemetry_metadata_middleware must run before the handler so CommonAttributes are available when the handler calls alquimia-core.
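The reversal is easier to see in a plain-Python model of the "onion", where each newly registered middleware wraps everything registered before it. This is not Starlette code: `make_middleware` and `build_stack` are hypothetical helpers, and the registration order shown is inferred from the execution order listed above.

```python
from typing import Callable, List

Handler = Callable[[List[str]], None]


def make_middleware(name: str) -> Callable[[Handler], Handler]:
    def wrap(next_handler: Handler) -> Handler:
        def run(trace: List[str]) -> None:
            trace.append(f"enter {name}")
            next_handler(trace)
            trace.append(f"exit {name}")
        return run
    return wrap


def build_stack(registered: List[str], endpoint: Handler) -> Handler:
    # Each registration wraps the app built so far, so the last one is outermost.
    app = endpoint
    for name in registered:
        app = make_middleware(name)(app)
    return app


def endpoint(trace: List[str]) -> None:
    trace.append("handler")


# Registered innermost first; finalize_span is registered last.
registration_order = [
    "CORSMiddleware",
    "telemetry_metadata_middleware",
    "request_id_middleware",
    "finalize_span",
]
```

Calling `build_stack(registration_order, endpoint)` with an empty list records `enter finalize_span` first and `exit finalize_span` last, which is why a span started there covers every other middleware and the handler.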