Skip to content

Evaluation Strategies

An evaluation strategy determines how the LLM’s response is interpreted after each inference call. It controls tool binding, response parsing, and the iteration loop.

Strategy IDClassUse case
one-shootOneShootEvaluationStrategySingle LLM call, no tools
nativeNativeToolsEvaluationStrategyNative function calling (OpenAI, Anthropic)
rawRawToolsEvaluationStrategyJSON extraction from text (models without native tool calling)

The simplest strategy. Makes a single LLM call and returns the response. No tools are bound.

{
"evaluation_strategy": {
"evaluation_strategy_id": "one-shoot"
}
}

Use this for:

  • Simple Q&A agents
  • Summarization
  • Classification (when using shields)
  • Any agent that doesn’t need tools

Uses the LLM’s native function-calling API (e.g., OpenAI tool_calls). Supports multi-step tool execution loops.

{
"evaluation_strategy": {
"evaluation_strategy_id": "native",
"max_steps": 10,
"max_concurrent_tools": 5,
"tool_choice": "auto"
}
}
FieldTypeDefaultDescription
evaluation_strategy_id"native"requiredStrategy identifier
max_stepsint10Maximum tool execution iterations
max_concurrent_toolsint5Maximum parallel tool calls per step
tool_choicestring | object"auto"Tool selection mode
decoratorsDecorator[]nullPluggable behavior extensions
ValueBehavior
"auto"LLM decides whether to call a tool
"required"LLM must call at least one tool
"none"LLM must not call any tools
ToolChoiceAllowedToolsRestrict to a specific set of tools
ToolChoiceForcedTool[]Force specific tool calls

The native strategy automatically injects into the system prompt:

# System limitations
- Max steps allowed: 10
- Max concurrent tool executions: 5
- Current step: 0

Decorators extend the native strategy with additional capabilities:

Adds create_plan and patch_plan tools. The agent creates and incrementally updates a structured plan as it works.

{
"evaluation_strategy": {
"evaluation_strategy_id": "native",
"decorators": [
{
"decorator_id": "plan_mode",
"plan_format": "markdown"
}
]
}
}

Extracts tool calls from the LLM’s text output using JSON pattern matching. Use this with models that don’t support native function calling.

{
"evaluation_strategy": {
"evaluation_strategy_id": "raw",
"max_steps": 10,
"tool_schemas": [
{
"name": "search_web",
"description": "Search the web for information.",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string", "description": "Search query" }
},
"required": ["query"]
}
}
]
}
}
FieldTypeDefaultDescription
evaluation_strategy_id"raw"requiredStrategy identifier
max_stepsint10Maximum tool execution iterations
max_concurrent_toolsint5Maximum parallel tool calls per step
tool_schemasToolSchema[]requiredTool definitions injected into the system prompt
parse_regex_patternstringJSON patternRegex for extracting JSON from text

The raw strategy injects tool instructions and the available tool list into the system prompt automatically. The LLM is instructed to respond with:

{
"name": "search_web",
"parameters": {
"query": "capital of France"
}
}

The strategy tries multiple parsing approaches in order:

  1. Direct json.loads()
  2. Single-quote replacement
  3. Brace block extraction
  4. Regex pattern matching

All strategies support structured output via with_structured_output():

{
"evaluation_strategy": {
"evaluation_strategy_id": "one-shoot",
"structured_output": {
"method": "json_schema",
"include_raw": false,
"json_schema": {
"type": "object",
"properties": {
"answer": { "type": "string" },
"confidence": { "type": "number" }
},
"required": ["answer", "confidence"]
}
}
}
}
FieldTypeDefaultDescription
methodstring"json_schema"Structured output method
include_rawboolfalseInclude the raw LLM response alongside the parsed output
json_schemadict{}JSON Schema for the expected output