Kubernetes & OpenShift
alquimia-runtime ships with production-ready Kubernetes manifests in k8s/ structured as Kustomize overlays. This guide covers deploying to vanilla Kubernetes and OpenShift / ROSA.
For local development with Docker Compose, see Docker Compose.
Prerequisites
| Tool | Version |
|---|---|
| kubectl / oc | ≥ 1.27 / ≥ 4.13 |
| kustomize | ≥ 5.0 (bundled in kubectl apply -k) |
The manifests target a namespace called alquimia-runtime. Create it first:
    kubectl create namespace alquimia-runtime

On OpenShift, create a project instead:

    oc new-project alquimia-runtime

Infrastructure dependencies
The runtime does not provision its own backing services. These must already exist and be reachable from the cluster before applying the manifests:

| Service | Used by | Notes |
|---|---|---|
| Apache Kafka 3.x | Event bus | Master publishes, workers consume |
| PostgreSQL 16 | Knowledge base, worklog, webhooks | |
| Redis 7 | Conversation context, distributed locks | |
| Qdrant | Vector store (master only) | |
| HashiCorp Vault | Agent secret resolution | |
| S3-compatible store | Blob storage (MinIO, AWS S3, Ceph) | |
| OCI registry | Agent package publish/pull | |
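Because every service in the table has to be reachable before the manifests are applied, a quick TCP pre-flight check can catch connectivity problems early. The following is a minimal sketch and not part of alquimia-runtime: the helper names and the host names in the usage comment are hypothetical. A successful TCP connect only shows that the port is open, not that credentials or TLS are correct, and the check should run from somewhere with the same network view as the cluster.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to whether its endpoint accepted a TCP connection."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in endpoints.items()
    }


# Example usage (hypothetical endpoints, replace with your own):
#
#   results = check_endpoints({
#       "kafka": ("kafka.example.internal", 9092),
#       "postgres": ("postgres.example.internal", 5432),
#       "redis": ("redis.example.internal", 6379),
#   })
#   for name, ok in results.items():
#       print(name, "ok" if ok else "UNREACHABLE")
```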
Repository layout
    k8s/
    ├── base/
    │   ├── kustomization.yaml        # base resource list
    │   ├── master-deployment.yaml    # master Deployment (API + s3-sync sidecar)
    │   ├── worker-deployment.yaml    # worker Deployment (Kafka consumer + s3-sync sidecar)
    │   └── service.yaml              # ClusterIP + headless Services
    └── overlays/
        ├── dev/
        │   ├── kustomization.yaml    # image tag, resource patches, generators
        │   ├── .secrets/             # gitignored plaintext files (never commit these)
        │   │   ├── config            # app ConfigMap (env vars)
        │   │   ├── postgres          # POSTGRES_* credentials
        │   │   ├── redis             # REDIS_URL
        │   │   ├── s3                # BLOB_S3_* credentials
        │   │   ├── vault             # VAULT_TOKEN
        │   │   ├── auth              # API_TOKEN
        │   │   ├── kafka-signing     # KAFKA_SIGNING_KEY
        │   │   ├── qdrant            # QDRANT_URL, QDRANT_API_KEY
        │   │   └── .registryconfigjson   # OCI pull credentials
        │   └── patches/
        │       ├── master-resource-limits.yaml
        │       └── worker-resource-limits.yaml

Prepare secret files
The overlay reads plaintext key=value files from .secrets/. These files are gitignored and must be created manually.
.secrets/config — ConfigMap
Non-secret runtime environment variables:

    ALQUIMIA_REGISTRY_SECRET_RESOLVER=vault
    ALQUIMIA_OCI_REGISTRY_DEFAULT=<registry-host>:<port>
    ORAS_INSECURE=true
    ORAS_PLAIN_HTTP=true
    VAULT_ADDR=http://<vault-host>:<port>
    VAULT_MOUNT_POINT=secret
    ALQUIMIA_REGISTRY_DIR=/var/lib/alquimia/registry
    KAFKA_BOOTSTRAP_SERVERS=<broker-host>:<port>
    DEBUG=false
    IS_ALLOWED_CREDENTIALS=true
    OTEL_ALQUIMIA_SERVICE_NAME=alquimia-runtime

.secrets/redis

    REDIS_URL=redis://<host>:<port>/0

.secrets/postgres

    POSTGRES_HOST=<host>
    POSTGRES_PORT=5432
    POSTGRES_DB=alquimia
    POSTGRES_USERNAME=<username>
    POSTGRES_PASSWORD=<password>
    POSTGRES_SCHEMA=postgresql

.secrets/s3

    BLOB_S3_ENDPOINT_URL=http://<minio-or-s3-endpoint>
    BLOB_S3_ACCESS_KEY=<access-key>
    BLOB_S3_SECRET_KEY=<secret-key>
    BLOB_S3_BUCKET_NAME=alquimia
    BLOB_S3_REGION_NAME=us-east-1
    BLOB_S3_SECURE=true
    # See https://rclone.org/s3/ for supported providers
    BLOB_S3_PROVIDER=Minio

.secrets/auth

    API_TOKEN=<strong-random-token>

Generate a token:

    python -c "import secrets; print(secrets.token_urlsafe(32))"
.secrets/kafka-signing
    KAFKA_SIGNING_KEY=<64-char-hex>

This key must be identical on the master and all workers because it authenticates every Kafka event. Generate one with:

    python -c "import secrets; print(secrets.token_hex(32))"

.secrets/vault

    VAULT_TOKEN=<scoped-vault-token>

Create a scoped token:

    vault token create -policy=alquimia-runtime -ttl=720h -renewable=true

.secrets/qdrant

    QDRANT_URL=http://<qdrant-host>:<port>
    QDRANT_API_KEY=

Leave QDRANT_API_KEY empty for unauthenticated Qdrant.
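Since the key=value files are written by hand, it can help to sanity-check them before applying the overlay. The sketch below is illustrative only: `validate_secrets` is a hypothetical helper, and the choice of which keys count as required is an assumption based on the files described above (QDRANT_API_KEY is left out because it may be empty). It flags missing files, empty values, leftover `<placeholder>` values, and a KAFKA_SIGNING_KEY that is not 64 hex characters.

```python
import re
from pathlib import Path

# Keys treated as required per file (an assumption for this sketch).
REQUIRED_KEYS = {
    "redis": ["REDIS_URL"],
    "postgres": ["POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_DB",
                 "POSTGRES_USERNAME", "POSTGRES_PASSWORD"],
    "s3": ["BLOB_S3_ENDPOINT_URL", "BLOB_S3_ACCESS_KEY",
           "BLOB_S3_SECRET_KEY", "BLOB_S3_BUCKET_NAME"],
    "auth": ["API_TOKEN"],
    "kafka-signing": ["KAFKA_SIGNING_KEY"],
    "vault": ["VAULT_TOKEN"],
    "qdrant": ["QDRANT_URL"],
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate_secrets(secrets_dir: Path) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for name, keys in REQUIRED_KEYS.items():
        path = secrets_dir / name
        if not path.is_file():
            problems.append(f"{name}: file is missing")
            continue
        values = parse_env_file(path.read_text())
        for key in keys:
            value = values.get(key, "")
            if not value or (value.startswith("<") and value.endswith(">")):
                problems.append(f"{name}: {key} is empty or still a placeholder")
            elif key == "KAFKA_SIGNING_KEY" and not re.fullmatch(r"[0-9a-fA-F]{64}", value):
                problems.append(f"{name}: {key} is not 64 hex characters")
    return problems
```

Calling `validate_secrets(Path("k8s/overlays/dev/.secrets"))` from the repository root returns an empty list when every check passes.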
.secrets/.registryconfigjson
Standard Docker config.json for your OCI registry:

    {
      "auths": {
        "<registry-host>": {
          "username": "<username>",
          "password": "<password>",
          "auth": "<base64(username:password)>"
        }
      }
    }

Generate the auth value:

    echo -n "username:password" | base64
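As an alternative to assembling the JSON by hand, the same structure can be generated programmatically. This is a small sketch under the assumption that a single registry with basic-auth credentials is enough; `docker_config` is a hypothetical helper name, not something shipped with the runtime.

```python
import base64
import json


def docker_config(registry: str, username: str, password: str) -> str:
    """Build a Docker config.json document with one basic-auth registry entry."""
    # The auth field is base64("username:password"), the same value that
    # `echo -n "username:password" | base64` produces.
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {
                "username": username,
                "password": password,
                "auth": auth,
            }
        }
    }
    return json.dumps(config, indent=2)
```

The returned string can be written to k8s/overlays/dev/.secrets/.registryconfigjson.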
Deploy
Kubernetes

1. Verify secret files are in place

       ls -A k8s/overlays/dev/.secrets/
       # expected: auth  config  kafka-signing  postgres  qdrant  redis  s3  vault  .registryconfigjson

2. Preview the generated manifests (dry run)

       kubectl kustomize k8s/overlays/dev/

3. Apply to the cluster

       kubectl apply -k k8s/overlays/dev/ -n alquimia-runtime

4. Watch pods come up

       kubectl get pods -n alquimia-runtime -w

   Expected steady state: each pod shows 2/2 because both the alquimia-runtime container and the s3-sync sidecar must be ready.

       alquimia-runtime-master-<hash>   2/2   Running   0   60s
       alquimia-runtime-worker-<hash>   2/2   Running   0   60s
       alquimia-runtime-worker-<hash>   2/2   Running   0   60s

5. Verify health

       kubectl exec -n alquimia-runtime \
         $(kubectl get pod -l role=master -n alquimia-runtime -o jsonpath='{.items[0].metadata.name}') \
         -- curl -sf http://localhost:8080/health/readiness
       # → "OK"
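In a script or CI job, the manual health check can be replaced by polling the readiness endpoint until it answers. The sketch below is one possible approach, not the project's own tooling: `http_ok` and `wait_until` are hypothetical helpers, and the usage comment assumes the master has been made reachable on localhost, for example with `kubectl port-forward`.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError (e.g. a 500) are OSError subclasses.
        return False


def wait_until(probe: Callable[[], bool], timeout: float = 120.0,
               interval: float = 2.0) -> bool:
    """Call probe repeatedly until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example usage, assuming a port-forward to local port 8080 is running:
#
#   ready = wait_until(lambda: http_ok("http://localhost:8080/health/readiness"))
#   print("ready" if ready else "timed out")
```

Separating the probe from the retry loop keeps the loop reusable for other checks, such as the liveness endpoint.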
OpenShift

1. Verify secret files are in place (same as Kubernetes)

2. Apply with oc

       oc apply -k k8s/overlays/dev/ -n alquimia-runtime

3. Watch pods come up

       oc get pods -n alquimia-runtime -w

4. Create a Route for external access

       # HTTP
       oc expose svc/alquimia-runtime -n alquimia-runtime
       # TLS edge termination (recommended)
       oc create route edge alquimia-runtime-tls \
         --service=alquimia-runtime \
         --hostname=api.alquimia.<your-domain> \
         -n alquimia-runtime
Expose the master externally
Kubernetes: Ingress

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: alquimia-runtime
      namespace: alquimia-runtime
      annotations:
        # SSE streams need long-lived connections
        nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    spec:
      rules:
        - host: api.alquimia.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: alquimia-runtime
                    port:
                      number: 80
      tls:
        - hosts:
            - api.alquimia.example.com
          secretName: alquimia-tls

Key resource details
| Resource | Value |
|---|---|
| Master replicas | 1 — serialised registry writes via TinyDB + Redlock |
| Worker replicas | 2 (dev) — scale to match Kafka partition count |
| Container port | 8080 |
| Readiness probe | GET /health/readiness |
| Liveness probe | GET /health/liveness |
| Startup probe | GET /health/liveness (90 retries × 10 s, about 15 minutes at most) |
| CPU request / limit | 500m / 1 (dev overlay) |
| Memory request / limit | 1Gi / 2Gi |
| Worker registry | emptyDir synced from S3 via s3-sync sidecar |
Scaling workers
Workers are stateless Kafka consumers. All replicas share the same consumer group, so Kafka distributes partitions across them automatically.

    kubectl scale deployment alquimia-runtime-worker --replicas=4 -n alquimia-runtime

To make the change permanent, add replicas: 4 to k8s/overlays/dev/patches/worker-resource-limits.yaml. The upper bound for throughput is one replica per Kafka partition; replicas beyond that will be idle consumers.
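The partition limit can be expressed as simple arithmetic. The following sketch assumes a single topic consumed by one consumer group, with each partition assigned to exactly one consumer; `consumer_utilisation` is a hypothetical helper for illustration.

```python
def consumer_utilisation(replicas: int, partitions: int) -> tuple[int, int]:
    """Return (active, idle) consumer counts for one group on one topic."""
    if replicas < 0 or partitions < 0:
        raise ValueError("replicas and partitions must be non-negative")
    # A partition is consumed by at most one member of the group, so no
    # more than `partitions` replicas can receive work at the same time.
    active = min(replicas, partitions)
    idle = replicas - active
    return active, idle
```

For example, `consumer_utilisation(6, 4)` gives `(4, 2)`: with four partitions, two of six workers would sit idle, so raising the partition count has to come before scaling further.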
Upgrading
1. Build and push the new image

       docker build -t alquimiaai/runtime:<new-tag> runtime/
       docker push alquimiaai/runtime:<new-tag>

2. Update newTag in the overlay and re-apply

       kubectl apply -k k8s/overlays/dev/ -n alquimia-runtime

3. Run database migrations once the master pod is healthy

       kubectl exec -n alquimia-runtime \
         $(kubectl get pod -l role=master -n alquimia-runtime -o jsonpath='{.items[0].metadata.name}') \
         -c alquimia-runtime \
         -- uv run alembic upgrade head
Tear down
    kubectl delete -k k8s/overlays/dev/ -n alquimia-runtime --ignore-not-found=true

    # Delete the namespace entirely (removes PVCs too)
    kubectl delete namespace alquimia-runtime

Troubleshooting

Workers log missing or invalid alquimiasignature and drop all events
KAFKA_SIGNING_KEY differs between master and workers. Regenerate and redeploy:
    NEW_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")
    # Update .secrets/kafka-signing with the new key
    echo "KAFKA_SIGNING_KEY=$NEW_KEY" > k8s/overlays/dev/.secrets/kafka-signing
    kubectl apply -k k8s/overlays/dev/ -n alquimia-runtime
    kubectl rollout restart deployment/alquimia-runtime-master deployment/alquimia-runtime-worker -n alquimia-runtime

GET /health/readiness returns 500

PostgreSQL or Redis is unreachable. Check that POSTGRES_HOST in the alquimia-postgres Secret and REDIS_URL in the alquimia-redis Secret point to the correct in-cluster DNS names, and that network policies allow traffic from alquimia-runtime to those services.
Vault token expired — registry secret resolution fails
    vault token renew <token>
    # or create a new one:
    NEW_TOKEN=$(vault token create -policy=alquimia-runtime -ttl=720h -field=token)
    kubectl create secret generic alquimia-vault \
      --from-literal=VAULT_TOKEN="$NEW_TOKEN" \
      -n alquimia-runtime \
      --dry-run=client -o yaml | kubectl apply -f -
    kubectl rollout restart deployment/alquimia-runtime-master -n alquimia-runtime

Related pages
- Docker Compose: local development setup
- Configuration Reference: all environment variables
- Kafka & Event Bus: how the Kafka event pipeline works
- Inference Endpoints: the public HTTP API
Source
Alquimia-ai/alquimia-runtime: k8s/ manifests and Kustomize overlays