Core SDK Observability
alquimia-core emits OpenTelemetry metrics for every significant event in the agent execution loop. Metrics are disabled by default — call setup_observability() once at application startup to enable them.
setup_observability()
    from alquimia.core.observability import setup_observability

    setup_observability(
        endpoint: str | None = None,
        interval_millis: int | None = None,
        meter_name: str | None = None,
        service_name: str | None = None,
    ) -> None

Initialises the global OTEL MeterProvider. Call this once at application startup, not at import time.

- Idempotent: subsequent calls are no-ops if the meter is already configured.
- No-op if disabled: if endpoint is None and OTEL_COLLECTOR_ENDPOINT is not set, the function returns immediately without configuring anything.
- No import-time side effects: importing alquimia.core.observability does not connect to any collector or mutate global state.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | str \| None | OTEL_COLLECTOR_ENDPOINT env var | OTLP HTTP endpoint for metrics (e.g., http://otel-collector:4318/v1/metrics) |
| interval_millis | int \| None | OTEL_EXPORTER_INTERVAL_MILLIS env var (default 5000) | Export interval in milliseconds |
| meter_name | str \| None | OTEL_ALQUIMIA_METER_NAME env var (default alquimia-metrics) | Meter name in the collector |
| service_name | str \| None | OTEL_ALQUIMIA_SERVICE_NAME env var (default alquimia) | service.name resource attribute |
Environment variables
All parameters default to environment variables read at module import time:

| Variable | Default | Description |
|---|---|---|
| OTEL_COLLECTOR_ENDPOINT | (none) | Metrics endpoint. Metrics are disabled if this is not set. |
| OTEL_EXPORTER_INTERVAL_MILLIS | 5000 | Export interval in milliseconds |
| OTEL_ALQUIMIA_METER_NAME | alquimia-metrics | Meter name |
| OTEL_ALQUIMIA_SERVICE_NAME | alquimia | service.name resource attribute |
| OTEL_EXCLUDED_ATTRIBUTES | "" | Comma-separated dimension keys to strip before export |
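The precedence implied by the two tables (explicit argument, then environment variable, then built-in default) can be sketched as follows. resolve_settings is a hypothetical helper, not part of alquimia-core, and it reads the environment at call time for simplicity, whereas the text above says the library reads it at import time.

```python
import os
from typing import Optional

# Defaults copied from the table above; OTEL_COLLECTOR_ENDPOINT has none.
DEFAULTS = {
    "OTEL_EXPORTER_INTERVAL_MILLIS": "5000",
    "OTEL_ALQUIMIA_METER_NAME": "alquimia-metrics",
    "OTEL_ALQUIMIA_SERVICE_NAME": "alquimia",
    "OTEL_EXCLUDED_ATTRIBUTES": "",
}


def resolve_settings(
    endpoint: Optional[str] = None,
    interval_millis: Optional[int] = None,
    env: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: explicit argument > env var > default."""
    env = os.environ if env is None else env

    def lookup(name):
        return env.get(name, DEFAULTS.get(name))

    excluded = lookup("OTEL_EXCLUDED_ATTRIBUTES")
    return {
        "endpoint": endpoint or lookup("OTEL_COLLECTOR_ENDPOINT"),
        "interval_millis": (
            interval_millis
            if interval_millis is not None
            else int(lookup("OTEL_EXPORTER_INTERVAL_MILLIS"))
        ),
        # Split the comma-separated list, ignoring blanks and whitespace.
        "excluded": {key.strip() for key in excluded.split(",") if key.strip()},
    }
```

Passing env explicitly makes the helper easy to exercise without mutating the process environment.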
    from alquimia.core.observability import setup_observability

    # Minimal: reads all config from environment variables
    setup_observability()

    # Explicit: override specific parameters
    setup_observability(
        endpoint="http://otel-collector:4318/v1/metrics",
        interval_millis=10_000,
        service_name="my-alquimia-app",
    )

When running via alquimia-runtime, setup_telemetry() is called automatically in the FastAPI lifespan. You still need to call setup_observability() separately to enable metrics, because the runtime's setup_telemetry() only configures traces and logs.
AlquimiaObserver
AlquimiaObserver is the class that holds all 18 metric instruments and dispatches observations based on event type. It is instantiated lazily on the first call to observe() after setup_observability() has been called.

You do not need to interact with AlquimiaObserver directly. The module-level observe() function handles dispatch:

    from alquimia.core.observability import observe
    from alquimia.core.base import CommonAttributes

    # Called internally by alquimia-core; you do not call this manually
    observe(event, context_metadata)

observe() is called automatically by the evaluate() loop for every command and response event. No additional wiring is required.
Metrics reference
All metrics are exported under the alquimia-metrics meter (configurable via OTEL_ALQUIMIA_METER_NAME).
Standard dimensions
Every metric carries a subset of the following dimensions, derived from CommonAttributes:

| Dimension key | Source | Description |
|---|---|---|
| assistant_id | CommonAttributes.assistant_id | The agent being invoked |
| agentspace_id | CommonAttributes.agentspace_id | The agentspace |
| user_id | CommonAttributes.user_id | The end user |
| session_id | CommonAttributes.session_id | The conversation session |
| channel_id | CommonAttributes.channel_id | The channel (optional) |
Token and latency metrics additionally carry:
| Dimension key | Source | Description |
|---|---|---|
| model_name | ResponseMetadata.model_name | The LLM model that produced the response |
Tool and shield metrics additionally carry:
| Dimension key | Source | Description |
|---|---|---|
| event_type | `type(event).__name__` | The command class name (e.g., ServerToolExecution) |
| tool_name | event.name or event.tool_name | The tool or shield name |
Empathy metrics additionally carry:
| Dimension key | Source | Description |
|---|---|---|
| rule_id | event.control_id | The empathy rule that matched |
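As an illustration of how the standard dimensions combine with metric-specific ones, here is a minimal sketch. AttributesSketch is a hypothetical stand-in for CommonAttributes that carries only the fields listed in the first table, and dropping unset values is an assumption rather than documented behaviour.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AttributesSketch:
    """Hypothetical stand-in with the fields listed in the first table."""

    assistant_id: str
    agentspace_id: str
    user_id: str
    session_id: str
    channel_id: Optional[str] = None  # documented as optional


def dimensions(attrs: AttributesSketch, **extra: str) -> dict:
    """Standard dimensions plus metric-specific ones such as model_name."""
    merged = {**asdict(attrs), **extra}
    # Assumption: unset values are omitted rather than exported as None.
    return {key: value for key, value in merged.items() if value is not None}
```

For a token metric, dimensions(attrs, model_name="some-model") would yield the standard keys plus model_name; for an empathy metric the extra key would be rule_id.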
LLM token counters
These counters increment on every successful ResponseInferenceResponse or ShieldInferenceResponse that carries token usage metadata.

| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.llm.completion_tokens | Counter | 1 | Tokens in the LLM's response |
| alquimia.llm.prompt_tokens | Counter | 1 | Tokens in the prompt sent to the LLM |
| alquimia.llm.total_tokens | Counter | 1 | Total tokens (prompt + completion) |
Dimensions: standard + model_name
Note: If the LLM response does not include token usage metadata (some providers omit it), these counters are not incremented for that call. A DEBUG log is emitted: "Usage metadata not found. Skipping token consumption metrics."
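The skip-on-missing-metadata behaviour described in the note can be sketched like this. The usage dictionary keys, record_tokens, and the in-memory token_totals store are assumptions for illustration; the real observer writes to OTEL counters and its metadata shape may differ.

```python
import logging
from collections import Counter
from typing import Optional

logger = logging.getLogger("observability-sketch")
token_totals = Counter()  # stand-in for the three OTEL counters


def record_tokens(usage: Optional[dict], model_name: str) -> bool:
    """Add one response's token usage; return False if it was skipped."""
    if not usage:
        # Same message the note above quotes, emitted at DEBUG level.
        logger.debug("Usage metadata not found. Skipping token consumption metrics.")
        return False
    for field in ("prompt_tokens", "completion_tokens", "total_tokens"):
        token_totals[(f"alquimia.llm.{field}", model_name)] += usage.get(field, 0)
    return True
```

A response without usage metadata leaves every counter untouched, so dashboards built on these metrics undercount providers that omit the metadata.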
LLM latency histograms
These histograms record timing data from ResponseMetadata.token_usage on every successful LLM response.

| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.llm.completion_time | Histogram | s | Time to generate the completion |
| alquimia.llm.prompt_time | Histogram | s | Time to process the prompt |
| alquimia.llm.queue_time | Histogram | s | Time the request spent waiting in the LLM provider's queue |
| alquimia.llm.total_time | Histogram | s | End-to-end request time (queue + prompt + completion) |
Dimensions: standard + model_name
Note: These are populated from provider-supplied metadata. Not all LLM providers return all timing fields. Fields that are None are skipped.
Tool invocation counters
| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.tool_invocations_total | Counter | 1 | Incremented when a tool execution command is emitted |
| alquimia.tool_errors_total | Counter | 1 | Incremented when a ToolExecutionResponse has status="error" |
Dimensions: standard + event_type + tool_name
event_type values for tool invocations:
| Value | Meaning |
|---|---|
| ServerToolExecution | MCP or Llama Stack tool call |
| BuiltinToolExecution | Built-in tool (plan_mode, etc.) |
| ClientToolExecution | Client-side tool call |
| A2AInference | Agent-to-agent delegation (counted as a tool invocation) |
Shield invocation counters
| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.shield_invocations_total | Counter | 1 | Incremented when a ShieldInference command is emitted |
| alquimia.shield_errors_total | Counter | 1 | Incremented when a ShieldInferenceResponse has status="error" |
Dimensions: standard
Response invocation counters
| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.response_invocations_total | Counter | 1 | Incremented when a ResponseInference command is emitted (i.e., each LLM call) |
| alquimia.response_errors_total | Counter | 1 | Incremented when a ResponseInferenceResponse has status="error" |
Dimensions: standard
Empathy rule counter
| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.empathy_rules_matched_by_rule_total | Counter | 1 | Incremented when an EmpathyRuleMatchedResponse is produced |
Dimensions: standard + rule_id
Use this metric to track which empathy rules fire most frequently. Group by rule_id and assistant_id to understand per-agent empathy behaviour.
Agent lifecycle counters
| Metric | Type | Unit | Description |
|---|---|---|---|
| alquimia.assistant_executions_started_total | Counter | 1 | Incremented when an AssistantInference command is received |
| alquimia.assistant_executions_ended_total | Counter | 1 | Incremented when an AssistantInferenceResponse is produced |
Dimensions: standard
The difference started - ended gives the number of in-flight inference runs at any point in time. A persistent gap indicates stuck or timed-out executions.
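The started-minus-ended calculation can be sketched over a series of sampled counter values. Both helpers are hypothetical, and the threshold used to call a gap "persistent" is an arbitrary illustration, not a recommended alerting rule.

```python
def in_flight(started_total: int, ended_total: int) -> int:
    """Runs started but not yet ended, from two cumulative counter values."""
    # Clamp at zero: the two counters may be sampled at slightly different times.
    return max(started_total - ended_total, 0)


def looks_stuck(gaps: list, threshold: int = 1) -> bool:
    """Hypothetical check: the gap never dropped below threshold in the window."""
    return bool(gaps) and min(gaps) >= threshold


# Three samples of (started_total, ended_total) taken over some window.
samples = [(10, 8), (14, 12), (20, 18)]
gaps = [in_flight(started, ended) for started, ended in samples]
```

In practice a gap that never returns to zero is only suspicious over a window longer than a typical execution, so the window length matters as much as the threshold.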
OTEL_EXCLUDED_ATTRIBUTES
Use this variable to strip specific dimension keys from all metrics before export. This is useful for privacy or compliance requirements where certain identifiers must not leave the application boundary.

    # Strip user_id and session_id from all metric dimensions
    OTEL_EXCLUDED_ATTRIBUTES=user_id,session_id

The filter is applied at the FilteredCounter and FilteredHistogram wrapper level, so the stripped keys never reach the OTEL exporter. The filter applies to all metrics uniformly; per-metric filtering is not supported.
Note: OTEL_EXCLUDED_ATTRIBUTES applies only to metrics. It does not affect traces or logs. Use OTEL collector-level processors to filter those signals.
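To make the wrapper-level filtering concrete, here is a minimal sketch of the idea. FilteredCounterSketch, FakeCounter, and parse_excluded are hypothetical names; the actual FilteredCounter in alquimia-core may be structured differently. The only real API assumed is the OTEL counter's add(amount, attributes) call shape.

```python
class FakeCounter:
    """Stand-in for an OTEL counter; records what would be exported."""

    def __init__(self):
        self.calls = []

    def add(self, amount, attributes=None):
        self.calls.append((amount, dict(attributes or {})))


class FilteredCounterSketch:
    """Hypothetical wrapper that drops excluded keys before delegating."""

    def __init__(self, counter, excluded):
        self._counter = counter
        self._excluded = frozenset(excluded)

    def add(self, amount, attributes=None):
        kept = {
            key: value
            for key, value in (attributes or {}).items()
            if key not in self._excluded
        }
        self._counter.add(amount, kept)


def parse_excluded(raw: str) -> set:
    """Split the comma-separated variable value, ignoring blanks."""
    return {key.strip() for key in raw.split(",") if key.strip()}
```

Because the filtering happens before the wrapped instrument is called, the excluded keys are never handed to the underlying counter at all.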
Dashboard recommendations
Token cost tracking
Group alquimia.llm.total_tokens by assistant_id and model_name. This gives per-agent token consumption, which maps directly to LLM API cost.
Error rate
    alquimia.tool_errors_total / alquimia.tool_invocations_total

Group by tool_name to identify unreliable tools.
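Most metrics backends compute this ratio in their own query language; the arithmetic itself is sketched below for counter values already grouped by tool_name. error_rates and the sample tool names are hypothetical.

```python
def error_rates(invocations: dict, errors: dict) -> dict:
    """Per-tool error rate from two {tool_name: count} mappings."""
    return {
        tool: errors.get(tool, 0) / count
        for tool, count in invocations.items()
        if count > 0  # skip tools that were never invoked to avoid dividing by zero
    }


rates = error_rates(
    {"search": 200, "calc": 50, "idle": 0},  # alquimia.tool_invocations_total
    {"search": 10},                          # alquimia.tool_errors_total
)
```

Tools with no recorded errors get a rate of 0.0, and tools with no invocations are left out rather than reported as zero.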
Agent throughput
Rate of alquimia.assistant_executions_ended_total grouped by assistant_id.
Latency percentiles
Use the alquimia.llm.total_time histogram with P50/P95/P99 percentiles grouped by model_name to compare LLM provider performance.
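For intuition about what those percentiles mean, the sketch below computes them from raw latency samples with the standard library. This is an illustration only: a metrics backend estimates percentiles from histogram buckets rather than raw samples, and latency_percentiles is a hypothetical helper.

```python
from statistics import quantiles


def latency_percentiles(samples: list) -> dict:
    """P50/P95/P99 of raw latency samples, in the samples' own unit."""
    # n=100 yields 99 cut points; index i - 1 holds the i-th percentile.
    cuts = quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Running the helper once per model_name gives directly comparable figures across providers, provided each group has enough samples for the tail percentiles to be meaningful.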
Related pages
- Observability: the full observability model
- Runtime Observability: traces and logs
- Configuration Reference: OTEL environment variables
- evaluate(): where observe() is called in the execution loop
Source
Alquimia-ai/alquimia-core: src/alquimia/core/observability.py