Inference Endpoints

The inference API is the primary interface for running agents. All endpoints are served by alquimia-runtime on port 8080.

Authentication

All endpoints (except health probes) require a bearer token in the Authorization header:

Authorization: Bearer <token>

The token format depends on AUTH_PROVIDER:

| AUTH_PROVIDER | Token format | Configuration |
| --------------------- | ------------------------------------------ | --------------------------------- |
| api_token (default) | Static string matching API_TOKEN env var | Set API_TOKEN in .env |
| jwt | Signed JWT (HMAC HS256 by default) | Set JWT_SECRET, JWT_ALGORITHM |
| keycloak | Keycloak OIDC access token | Set KEYCLOAK_* env vars |

See Configuration Reference for full auth settings.

Health endpoints

Health endpoints do not require authentication. Use them for Kubernetes liveness and readiness probes.

GET `/health/liveness`

Liveness probe — is the process alive?

Returns 200 OK if the process is running. Never checks external dependencies. Kubernetes restarts the pod if this fails.

curl http://localhost:8080/health/liveness
# "OK"

GET `/health/readiness`

Readiness probe — can the service handle traffic?

Returns 200 OK only if all critical dependencies are reachable: PostgreSQL (verified with SELECT 1) and Redis (verified with PING). Kubernetes stops routing traffic if this fails — the pod stays alive.

curl http://localhost:8080/health/readiness
# "OK"

Error responses:

| Status | Condition |
| ------ | ------------------------------- |
| 500 | PostgreSQL or Redis unreachable |
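Outside Kubernetes (for example in a test script), the readiness probe can be polled before sending traffic. This is a sketch under assumptions: `probe` and `wait_until_ready` are hypothetical helpers, and the retry count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a probe URL, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as 500 from the readiness probe) land here.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(
    check: Callable[[], int], attempts: int = 10, delay: float = 1.0
) -> bool:
    """Call check() until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if check() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_ready(lambda: probe("http://localhost:8080/health/readiness"))`. Passing the check as a callable keeps the retry loop independent of the network, which makes it easy to exercise with a stub.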

Request correlation headers

Every request to the runtime carries two sets of correlation identifiers.

X-Request-ID

The runtime generates or propagates an X-Request-ID header on every request:

| Direction | Behaviour |
| ------------ | ----------------------------------------------------------------------------------------------------- |
| Inbound | If the client provides X-Request-ID, it is used as-is. If omitted, the runtime generates a UUID v4. |
| Outbound | The ID is always echoed back in the response X-Request-ID header. |

X-Request-ID is per-HTTP-request. Use it to correlate logs for a single HTTP call. Use task_id to correlate everything across an entire inference run (multiple HTTP calls + CloudEvent hops).
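A client that wants to log the correlation ID before the response arrives can set X-Request-ID itself. A minimal sketch, assuming a hypothetical helper name `with_request_id`:

```python
import uuid
from typing import Optional


def with_request_id(headers: dict, request_id: Optional[str] = None) -> dict:
    """Return a copy of headers that always carries an X-Request-ID.

    Priority: explicit argument, then an ID already in headers,
    then a freshly generated UUID v4.
    """
    out = dict(headers)
    out["X-Request-ID"] = (
        request_id or out.get("X-Request-ID") or str(uuid.uuid4())
    )
    return out
```

Because the runtime echoes the ID back, the value set here should match the X-Request-ID on the response, which lets client and server logs for one HTTP call be joined.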

CommonAttributes headers

The inference endpoints accept the following headers, which are propagated as observability dimensions through the entire execution stack (metrics, traces, and logs):

| Header | Description |
| --------------- | ----------------------------------- |
| assistant-id | The agent being invoked |
| agentspace-id | The agentspace containing the agent |
| user-id | The end user |
| session-id | The conversation session |
| task-id | The inference task |
| channel-id | The channel (optional) |

These headers are extracted by telemetry_metadata_middleware and attached to every OTEL metric, trace span, and log record for the duration of the request. See Observability for details.
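The header set above can be assembled in one place so that every call carries the same dimensions. This is an illustrative sketch; `common_attribute_headers` is a hypothetical helper, not part of the runtime.

```python
from typing import Optional


def common_attribute_headers(
    assistant_id: str,
    agentspace_id: str,
    user_id: str,
    session_id: str,
    task_id: str,
    channel_id: Optional[str] = None,
) -> dict:
    """Build the CommonAttributes headers; channel-id is sent only if given."""
    headers = {
        "assistant-id": assistant_id,
        "agentspace-id": agentspace_id,
        "user-id": user_id,
        "session-id": session_id,
        "task-id": task_id,
    }
    if channel_id is not None:
        headers["channel-id"] = channel_id
    return headers
```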

Inference flow

Alquimia uses an async, event-driven inference model:

1. POST /event/infer/{assistant_id}: signs and publishes the request as a CloudEvent to Kafka, then returns a task_id immediately.
2. A worker instance consumes the event from Kafka, verifies its signature, and executes the agent in-process.
3. GET /event/stream/{task_id}: streams worklog records as SSE until the final AssistantInferenceResponse arrives.

This decoupling means your client never blocks waiting for the LLM. Long-running agents with multiple tool calls stream progress in real time.

POST `/event/infer/{assistant_id}`

Submit an inference request. Returns immediately with a task_id.

Path parameters

| Parameter | Type | Description |
| -------------- | -------- | --------------------------------------------------------- |
| assistant_id | string | The agent to invoke. Must exist in the target agentspace. |

Query parameters

| Parameter | Type | Default | Description |
| --------------- | -------- | ----------- | ----------------------------------- |
| agentspace_id | string | "default" | The agentspace containing the agent |

Request body

{
  "query": "What is the status of ticket INC-1234?",
  "user_id": "user-1",
  "session_id": "session-abc",
  "task_id": "task-xyz",
  "knowledge_base": [],
  "extra_instructions": {},
  "evaluation_strategy": null
}

| Field | Type | Required | Description |
| --------------------- | -------------------- | -------- | -------------------------------------------------------------------- |
| query | string | Yes | The user’s message |
| user_id | string | Yes | User identifier for session scoping |
| session_id | string | Yes | Session identifier for conversation continuity |
| task_id | string | No | Custom task ID. Auto-generated (UUID) if omitted. |
| knowledge_base | KnowledgeBase[] | No | Additional knowledge bases to inject for this request only |
| extra_instructions | dict[str, str] | No | Named prompt clauses to inject into the system prompt |
| evaluation_strategy | EvaluationStrategy | No | Override the agent’s configured evaluation strategy for this request |
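The required and optional fields above can be captured in a small client-side model that refuses to serialize an incomplete request. This is a sketch, not the runtime's own schema: the class name `InferenceRequest` is hypothetical, and KnowledgeBase and EvaluationStrategy are treated as opaque dicts because their shapes are not described in this section.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InferenceRequest:
    query: str
    user_id: str
    session_id: str
    task_id: Optional[str] = None  # server generates a UUID if omitted
    knowledge_base: list = field(default_factory=list)
    extra_instructions: dict = field(default_factory=dict)
    evaluation_strategy: Optional[dict] = None

    def to_payload(self) -> dict:
        """Return the JSON body, leaving out optional fields that are unset."""
        for name in ("query", "user_id", "session_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        payload = {
            "query": self.query,
            "user_id": self.user_id,
            "session_id": self.session_id,
        }
        if self.task_id:
            payload["task_id"] = self.task_id
        if self.knowledge_base:
            payload["knowledge_base"] = self.knowledge_base
        if self.extra_instructions:
            payload["extra_instructions"] = self.extra_instructions
        if self.evaluation_strategy is not None:
            payload["evaluation_strategy"] = self.evaluation_strategy
        return payload
```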

Response — `200 OK`

{
  "task_id": "task-abc123",
  "session_id": "session-abc",
  "user_id": "user-1",
  "assistant_id": "my-agent",
  "agentspace_id": "default"
}

Response headers

| Header | Description |
| -------------- | ---------------------------------------------------- |
| X-Request-ID | The request correlation ID (generated or propagated) |

Error responses

| Status | Condition |
| ------ | -------------------------------------------- |
| 401 | Missing or invalid Authorization header |
| 404 | Agent not found in the agentspace |
| 500 | Connection error to Redis or CloudEvent sink |

Example

curl -X POST "http://localhost:8080/event/infer/my-agent?agentspace_id=default" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the status of ticket INC-1234?",
    "user_id": "user-1",
    "session_id": "session-abc"
  }'
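The same call can be made from Python with only the standard library. This is a rough equivalent of the curl command above, assuming hypothetical helper names `build_infer_request` and `submit`. Building the request separately from sending it keeps the URL and header logic easy to inspect.

```python
import json
import urllib.parse
import urllib.request


def build_infer_request(
    base_url: str,
    assistant_id: str,
    token: str,
    body: dict,
    agentspace_id: str = "default",
) -> urllib.request.Request:
    """Prepare POST /event/infer/{assistant_id} without sending it."""
    path = "/event/infer/" + urllib.parse.quote(assistant_id, safe="")
    query = urllib.parse.urlencode({"agentspace_id": agentspace_id})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (contains task_id)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`submit` raises `urllib.error.HTTPError` for the 401, 404, and 500 cases listed above, so callers can branch on `err.code`.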

GET `/event/stream/{task_id}`

Stream inference progress as Server-Sent Events (SSE). Emits worklog records until AssistantInferenceResponse is received.

Path parameters

| Parameter | Type | Description |
| --------- | -------- | ------------------------------------------- |
| task_id | string | The task ID returned by POST /event/infer |

Response

A stream of SSE events. Each event is a JSON-encoded WorklogRecord:

data: {"event_class": "ResponseInferenceResponse", "result": {...}, "status": "success"}

data: {"event_class": "ToolExecutionResponse", "tool_name": "search_tickets", "result": "INC-1234: Open", "status": "success"}

data: {"event_class": "AssistantInferenceResponse", "result": {"content": "Ticket INC-1234 is currently Open."}, "status": "success"}

The stream ends when event_class == "AssistantInferenceResponse".
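A client has to split the stream into events, decode each `data:` payload, and stop at the final record. The sketch below does that for any iterable of text lines. `iter_worklog` is a hypothetical name, and it assumes each event's data is a single JSON document, as in the samples above.

```python
import json
from typing import Iterable, Iterator

FINAL_EVENT = "AssistantInferenceResponse"


def iter_worklog(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded worklog records from SSE lines, stopping after the final one."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line terminates the event.
            record = json.loads("\n".join(data_parts))
            data_parts = []
            yield record
            if record.get("event_class") == FINAL_EVENT:
                return
        # Comment lines (":...") and other SSE fields are ignored.
```

With `urllib.request.urlopen`, the response object iterates over byte lines, so `iter_worklog(line.decode("utf-8") for line in resp)` is one way to feed it.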

Special event classes to watch for:

| event_class | Action required |
| ---------------------------- | ------------------------------------------------------------------- |
| HumanApprovalRequired | Call POST /event/tool-approval to approve or reject |
| ClientToolExecution | Execute the tool client-side and call POST /event/tool-completion |
| AssistantInferenceResponse | Final answer; close the stream |
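The table translates naturally into a dispatch loop over decoded records. This is a sketch with hypothetical names (`consume`, `on_approval`, `on_client_tool`); the callbacks are where a real client would call the approval and completion endpoints.

```python
from typing import Callable, Iterable, Optional


def consume(
    records: Iterable[dict],
    on_approval: Callable[[dict], None],
    on_client_tool: Callable[[dict], None],
) -> Optional[dict]:
    """Route special events to callbacks and return the final answer record.

    Returns None if the stream ends without an AssistantInferenceResponse.
    """
    for record in records:
        kind = record.get("event_class")
        if kind == "HumanApprovalRequired":
            on_approval(record)
        elif kind == "ClientToolExecution":
            on_client_tool(record)
        elif kind == "AssistantInferenceResponse":
            return record
        # Any other event class is progress information; keep reading.
    return None
```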

Error responses

| Status | Condition |
| ------ | ----------------------------------------- |
| 401 | Missing or invalid Authorization header |
| 404 | No ongoing task with the given task_id |

Example

curl -N "http://localhost:8080/event/stream/task-abc123" \
  -H "Authorization: Bearer $API_TOKEN"

POST `/event/tool-approval`

Approve or reject a tool execution that requires human approval.

Call this endpoint when the SSE stream emits a HumanApprovalRequired event. The agent blocks until a response is received.

Request body

{
  "control_id": "tool-call-xyz",
  "result": true,
  "message": ""
}

| Field | Type | Description |
| ------------ | -------- | ------------------------------------------------------- |
| control_id | string | The control_id from the HumanApprovalRequired event |
| result | bool | true to approve, false to reject |
| message | string | Rejection reason (required when result is false) |

Required context headers

These headers identify the active inference session. Copy them from the original POST /event/infer response:

| Header | Description |
| --------------- | -------------------------------------- |
| task-id | The task ID from the inference request |
| session-id | The session ID |
| user-id | The user ID |
| assistant-id | The assistant ID |
| agentspace-id | The agentspace ID |

Example

curl -X POST http://localhost:8080/event/tool-approval \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "task-id: task-abc123" \
  -H "session-id: session-abc" \
  -H "user-id: user-1" \
  -H "assistant-id: my-agent" \
  -H "agentspace-id: default" \
  -d '{"control_id": "tool-call-xyz", "result": true, "message": ""}'
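Since the context headers mirror the fields of the POST /event/infer response, they can be derived from that response instead of being typed by hand. The sketch below assumes a hypothetical helper `build_tool_approval_request` and enforces the rule that a rejection needs a message.

```python
import json
import urllib.request

CONTEXT_KEYS = ("task_id", "session_id", "user_id", "assistant_id", "agentspace_id")


def build_tool_approval_request(
    base_url: str,
    token: str,
    infer_response: dict,
    control_id: str,
    approve: bool,
    message: str = "",
) -> urllib.request.Request:
    """Prepare POST /event/tool-approval from the original infer response."""
    if not approve and not message:
        raise ValueError("message is required when rejecting")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # task_id -> task-id, session_id -> session-id, and so on.
    for key in CONTEXT_KEYS:
        headers[key.replace("_", "-")] = str(infer_response[key])
    body = {"control_id": control_id, "result": approve, "message": message}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/event/tool-approval",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned request can be sent with `urllib.request.urlopen`. Note that `urllib` normalizes header capitalization (for example `Task-id`), which is harmless because HTTP header names are case-insensitive.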

POST `/event/tool-completion`

Submit the result of a client-side tool execution.

Call this endpoint when the SSE stream emits a ClientToolExecution event and your application has executed the tool locally.

Request body

{
  "control_id": "tool-call-xyz",
  "tool_name": "my_tool",
  "result": "Tool output here",
  "status": "success"
}

| Field | Type | Description |
| ------------ | ---------------------- | ----------------------------------------------------- |
| control_id | string | The control_id from the ClientToolExecution event |
| tool_name | string | The tool name |
| result | any | The tool’s output |
| status | "success" \| "error" | Execution status |
| message | string | Error message (when status is "error") |

Required context headers

Same as /event/tool-approval — task-id, session-id, user-id, assistant-id, agentspace-id.
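On the client side, the work is to run the named tool and turn its outcome into the body described above. The following sketch makes two assumptions that this section does not confirm: that the ClientToolExecution event carries `tool_name`, and that tool inputs arrive in a field called `arguments`. `run_client_tool` and the `tools` registry are hypothetical.

```python
from typing import Callable, Dict


def run_client_tool(event: dict, tools: Dict[str, Callable]) -> dict:
    """Execute a local tool and build the /event/tool-completion body.

    Assumes (hypothetically) that the event has "tool_name" and an
    optional "arguments" dict of keyword arguments.
    """
    payload = {
        "control_id": event["control_id"],
        "tool_name": event["tool_name"],
    }
    try:
        tool = tools[event["tool_name"]]
        payload["result"] = tool(**event.get("arguments", {}))
        payload["status"] = "success"
    except Exception as exc:
        # Unknown tool or tool failure: report it instead of hanging the agent.
        payload["result"] = None
        payload["status"] = "error"
        payload["message"] = f"{type(exc).__name__}: {exc}"
    return payload
```

The returned dict is the JSON body to POST to /event/tool-completion, together with the same context headers used for tool approval.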

Channel inference

Agents can receive messages from external channels (WhatsApp, Slack, Email). Channel endpoints normalize provider-specific payloads into the standard inference pipeline.

Channels are declared in the agent’s channels array in its AssistantConfig. Each channel has a channel_id that maps to the {channel_id} path segment in the URL. The runtime looks up the channel by this ID and calls its adapter.

{
  "assistant_id": "support-agent",
  "channels": [
    {
      "provider_id": "whatsapp",
      "channel_id": "whatsapp-prod",
      "whatsapp_access_token": { "$secretRef": "WHATSAPP_ACCESS_TOKEN" },
      "whatsapp_api_base_url": { "$secretRef": "WHATSAPP_API_BASE_URL" },
      "whatsapp_assistant_phone_number_id": {
        "$secretRef": "WHATSAPP_PHONE_NUMBER_ID"
      },
      "whatsapp_verify_token": { "$secretRef": "WHATSAPP_VERIFY_TOKEN" }
    }
  ]
}

See Channels for full configuration details for WhatsApp, Slack, and Email.
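To make the mapping between the URL and the configuration concrete, the sketch below resolves a `channel_id` against an AssistantConfig-shaped dict and builds the matching webhook URL. It illustrates the lookup described above and is not the runtime's actual code; `find_channel` and `channel_webhook_url` are hypothetical names.

```python
import urllib.parse
from typing import Optional


def find_channel(assistant_config: dict, channel_id: str) -> Optional[dict]:
    """Return the channel entry whose channel_id matches, or None."""
    for channel in assistant_config.get("channels", []):
        if channel.get("channel_id") == channel_id:
            return channel
    return None


def channel_webhook_url(base_url: str, assistant_id: str, channel_id: str) -> str:
    """Build the URL to register with the channel provider."""
    parts = [urllib.parse.quote(p, safe="") for p in (assistant_id, channel_id)]
    return f"{base_url.rstrip('/')}/event/infer/{parts[0]}/{parts[1]}"
```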

GET `/event/infer/{assistant_id}/{channel_id}`

Webhook validation endpoint. Used by channel providers (e.g., WhatsApp Business API) to verify the webhook URL during setup. Does not require authentication.

| Parameter | Type | Description |
| -------------- | -------- | --------------------------------------------------------- |
| assistant_id | string | The agent to validate for |
| channel_id | string | Must match a channel_id in the agent’s channels array |

POST `/event/infer/{assistant_id}/{channel_id}`

Receive a message from a channel and dispatch it as an inference request. Does not require authentication — the channel provider’s own verification mechanism is used instead.

The request body format is provider-specific (WhatsApp JSON webhook, Slack URL-encoded form, etc.). The channel adapter normalizes it into an AssistantInference command. The response is a CommonAttributes object with the same shape as the response of POST /event/infer/{assistant_id}.

Error responses:

| Status | Condition |
| ------ | --------------------------------------------- |
| 400 | channel_id not found in agent config |
| 400 | Channel produced no commands from the payload |

Complete inference example

End-to-end: submit a request, stream the response, handle a tool approval.

# 1. Submit inference
RESPONSE=$(curl -s -X POST "http://localhost:8080/event/infer/support-agent" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Close ticket INC-1234 with resolution: resolved by restart",
    "user_id": "user-1",
    "session_id": "session-abc"
  }')

TASK_ID=$(echo "$RESPONSE" | jq -r '.task_id')

# 2. Stream results (in background or separate terminal)
curl -N "http://localhost:8080/event/stream/$TASK_ID" \
  -H "Authorization: Bearer $API_TOKEN" &

# 3. If HumanApprovalRequired event arrives, approve it:
curl -X POST http://localhost:8080/event/tool-approval \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "task-id: $TASK_ID" \
  -H "session-id: session-abc" \
  -H "user-id: user-1" \
  -H "assistant-id: support-agent" \
  -H "agentspace-id: default" \
  -d '{"control_id": "<control_id from event>", "result": true, "message": ""}'

Related pages

- Context & Knowledge API: session context, blob uploads, knowledge base management
- Registry API: manage agents and agentspaces
- Configuration Reference: auth and environment variables
- Event Model: the typed CloudEvents that drive inference
- Observability: X-Request-ID, CommonAttributes, and correlation
