Thinking
Thinking (or reasoning) is the process by which a model works through a problem step-by-step before providing its final answer.
The simplest way to enable thinking across supported providers is the Thinking capability.
Provider-specific settings are available for advanced usage when you need direct access to a provider's native thinking controls.
Unified thinking settings
Use the Thinking capability to enable thinking:
Via Pydantic AI Gateway:
from pydantic_ai import Agent
from pydantic_ai.capabilities import Thinking
agent = Agent('gateway/anthropic:claude-opus-4-7', capabilities=[Thinking(effort='high')])
Directly via the provider API:
from pydantic_ai import Agent
from pydantic_ai.capabilities import Thinking
agent = Agent('anthropic:claude-opus-4-7', capabilities=[Thinking(effort='high')])
You can also set the underlying thinking field in ModelSettings directly:
Via Pydantic AI Gateway:
from pydantic_ai import Agent
agent = Agent('gateway/anthropic:claude-opus-4-7', model_settings={'thinking': 'high'})
Directly via the provider API:
from pydantic_ai import Agent
agent = Agent('anthropic:claude-opus-4-7', model_settings={'thinking': 'high'})
The Thinking.effort value accepts:
- True: enable thinking with the provider's default effort level
- False: disable thinking (silently ignored on always-on models)
- 'minimal' / 'low' / 'medium' / 'high' / 'xhigh': enable thinking at a specific effort level (unsupported levels map to the closest available value)
These are the same values accepted by the underlying thinking model setting.
When omitted, the model uses its default behavior. Provider-specific settings (documented in the sections below) take precedence when both are set.
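To illustrate what "map to the closest available value" can mean, here is a minimal sketch that picks the nearest supported level. It assumes the levels are ordered from 'minimal' to 'xhigh' and that ties resolve toward the higher level; the closest_effort helper and the tie-breaking rule are hypothetical and not taken from Pydantic AI's implementation.

```python
EFFORT_ORDER = ['minimal', 'low', 'medium', 'high', 'xhigh']


def closest_effort(requested: str, supported: list[str]) -> str:
    """Return the supported effort level nearest to `requested` (hypothetical helper)."""
    if requested in supported:
        return requested
    target = EFFORT_ORDER.index(requested)
    # Rank by distance from the requested level; on a tie, prefer the higher level
    # (an assumption made for this sketch).
    return min(
        supported,
        key=lambda level: (abs(EFFORT_ORDER.index(level) - target), -EFFORT_ORDER.index(level)),
    )


# xAI only accepts 'low' and 'high' (see the provider table).
print(closest_effort('xhigh', ['low', 'high']))    # high
print(closest_effort('minimal', ['low', 'high']))  # low
print(closest_effort('medium', ['low', 'high']))   # high (tie broken upward)
```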
Provider translation
The Thinking capability maps each effort value to the selected provider's native format:
| Provider | Thinking() / Thinking(effort=True) | Thinking(effort='high') | Notes |
|---|---|---|---|
| Anthropic (Opus 4.6+) | anthropic_thinking={'type': 'adaptive'} | {'type': 'adaptive'} + effort='high' | Claude Opus 4.7 also supports effort='xhigh' |
| Anthropic (older) | anthropic_thinking={'type': 'enabled', 'budget_tokens': 10000} | budget_tokens=16384 | Budget-based; 'low' → 2048 tokens |
| OpenAI | reasoning_effort='medium' | reasoning_effort='high' | |
| Google (Gemini 3+) | include_thoughts=True | thinking_level='HIGH' | |
| Google (Gemini 2.5) | include_thoughts=True | thinking_budget=24576 | |
| Groq | reasoning_format='parsed' | reasoning_format='parsed' | thinking=False → 'hidden' (no true disable) |
| OpenRouter | reasoning.effort='medium' | reasoning.effort='high' | Via extra_body |
| Cerebras | disable_reasoning=False | disable_reasoning=False | thinking=False → disable_reasoning=True |
| xAI | reasoning_effort='high' | reasoning_effort='high' | Only 'low' and 'high' |
| Bedrock (Claude) | thinking.type='enabled' | budget_tokens=16384 | No adaptive support |
| Bedrock (OpenAI) | reasoning_effort='medium' | reasoning_effort='high' | |
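The table can be read as a lookup from a (provider, effort) pair to native settings. The following minimal sketch encodes only the values listed in the table for three rows (Groq, Cerebras, and budget-based Anthropic models). The translate function, the 'anthropic-legacy' provider key, and the returned dict shapes are hypothetical and do not reflect how Pydantic AI structures this internally.

```python
from typing import Literal, Union

Effort = Union[bool, Literal['minimal', 'low', 'medium', 'high', 'xhigh']]


def translate(provider: str, effort: Effort) -> dict:
    """Translate a unified effort value into native settings (hypothetical sketch)."""
    if provider == 'groq':
        # Groq has no true "off": False only hides the reasoning output.
        return {'reasoning_format': 'hidden' if effort is False else 'parsed'}
    if provider == 'cerebras':
        # Any enabled effort keeps reasoning on; False turns it off.
        return {'disable_reasoning': effort is False}
    if provider == 'anthropic-legacy':
        # Budget-based Claude models: only the budgets listed in the table are encoded.
        if effort is True:
            budget = 10000
        elif effort in ('low', 'high'):
            budget = {'low': 2048, 'high': 16384}[effort]
        else:
            raise ValueError(f'no budget listed for effort={effort!r}')
        return {'anthropic_thinking': {'type': 'enabled', 'budget_tokens': budget}}
    raise ValueError(f'provider not covered by this sketch: {provider!r}')


print(translate('groq', False))             # {'reasoning_format': 'hidden'}
print(translate('anthropic-legacy', 'low'))
```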
OpenAI
When using the OpenAIChatModel, text output inside <think> tags is converted to ThinkingPart objects.
You can customize the tags using the thinking_tags field on the model profile.
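Conceptually, this conversion splits the text on a pair of start and end tags. Here is a minimal standard-library sketch that assumes the tags default to <think> and </think>. The ThinkingPartSketch and TextPartSketch classes and the split_thinking helper are stand-ins for illustration only. They are not Pydantic AI's ThinkingPart or its actual parser.

```python
import re
from dataclasses import dataclass


@dataclass
class ThinkingPartSketch:
    content: str


@dataclass
class TextPartSketch:
    content: str


def split_thinking(text: str, tags: tuple[str, str] = ('<think>', '</think>')) -> list:
    """Split model text into thinking and text parts using start/end tags."""
    start, end = (re.escape(t) for t in tags)
    parts: list = []
    pos = 0
    # Non-greedy match so that multiple tagged sections are handled separately.
    for match in re.finditer(f'{start}(.*?){end}', text, flags=re.DOTALL):
        before = text[pos:match.start()].strip()
        if before:
            parts.append(TextPartSketch(before))
        parts.append(ThinkingPartSketch(match.group(1).strip()))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        parts.append(TextPartSketch(rest))
    return parts


print(split_thinking('<think>2 + 2 is 4.</think>The answer is 4.'))
```

Passing a different tags tuple mirrors the role of customizing the tags through thinking_tags, but in this sketch it is just a function argument.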
Some OpenAI-compatible model providers also support native thinking parts that are not delimited by tags; instead, they are sent and received as separate, custom fields in the API. If you call the model via the <provider>:<model> shorthand, Pydantic AI typically handles this for you. You can still configure the field explicitly with openai_chat_thinking_field.
If your provider recommends sending these custom fields back unchanged (for caching or interleaved thinking benefits), enable openai_chat_send_back_thinking_parts.
OpenAI Responses
The OpenAIResponsesModel can generate native thinking parts.
To enable this functionality, set the OpenAIResponsesModelSettings.openai_reasoning_effort and OpenAIResponsesModelSettings.openai_reasoning_summary model settings.
By default, the unique IDs of reasoning, text, and function call parts from the message history are sent to the model. If the message history you send does not exactly match what the Responses API returned in a previous response (for example, because you're using a history processor), this can result in errors like "Item 'rs_123' of type 'reasoning' was provided without its required following item."
To avoid this, disable the OpenAIResponsesModelSettings.openai_send_reasoning_ids model setting.
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings
model = OpenAIResponsesModel('gpt-5.2')
settings = OpenAIResponsesModelSettings(
    openai_reasoning_effort='low',
    openai_reasoning_summary='detailed',
)
agent = Agent(model, model_settings=settings)
...
Raw reasoning without summaries
Some OpenAI-compatible APIs (such as LM Studio, vLLM, or OpenRouter with gpt-oss models) may return raw reasoning content without reasoning summaries. In this case, ThinkingPart.content will be empty, but the raw reasoning is available in provider_details['raw_content']. Following OpenAI's guidance that raw reasoning should not be shown directly to users, we store it in provider_details rather than in the main content field.
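Code that displays thinking to users can therefore prefer the summary in content and treat provider_details['raw_content'] as internal-only. The following is a minimal sketch. ThinkingPartStandIn is a stand-in dataclass that mimics only the two attributes named above, and both helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ThinkingPartStandIn:
    """Mimics only the content and provider_details attributes described above."""
    content: str
    provider_details: Optional[dict[str, Any]] = None


def user_visible_thinking(part: ThinkingPartStandIn) -> Optional[str]:
    """Return text that is appropriate to show users, or None if there is no summary."""
    if part.content:
        return part.content  # a reasoning summary
    # Raw reasoning may exist in provider_details, but it is deliberately not surfaced.
    return None


def has_raw_reasoning(part: ThinkingPartStandIn) -> bool:
    """Report whether raw reasoning is available (e.g. for internal logging)."""
    details = part.provider_details or {}
    return bool(details.get('raw_content'))


raw_only = ThinkingPartStandIn(content='', provider_details={'raw_content': 'step 1 ...'})
print(user_visible_thinking(raw_only), has_raw_reasoning(raw_only))  # None True
```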
Anthropic
To enable thinking, use the AnthropicModelSettings.anthropic_thinking model setting.
Note
Extended thinking (type: 'enabled' with budget_tokens) is deprecated on claude-opus-4-6 and removed on claude-opus-4-7+. For those models, use adaptive thinking instead.
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings
model = AnthropicModel('claude-sonnet-4-5')
settings = AnthropicModelSettings(
    anthropic_thinking={'type': 'enabled', 'budget_tokens': 1024},
)
agent = Agent(model, model_settings=settings)
...
Interleaved Thinking
To enable interleaved thinking, you need to include the beta header in your model settings:
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings
model = AnthropicModel('claude-sonnet-4-5')
settings = AnthropicModelSettings(
    anthropic_thinking={'type': 'enabled', 'budget_tokens': 10000},
    extra_headers={'anthropic-beta': 'interleaved-thinking-2025-05-14'},
)
agent = Agent(model, model_settings=settings)
...
Adaptive Thinking & Effort
Starting with claude-opus-4-6, Anthropic supports adaptive thinking, where the model dynamically decides when and how much to think based on the complexity of each request. This replaces extended thinking (type: 'enabled' with budget_tokens), which is deprecated on Opus 4.6 and removed on Opus 4.7. Claude Opus 4.7 also adds the xhigh effort level. Adaptive thinking also automatically enables interleaved thinking.
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings
model = AnthropicModel('claude-opus-4-7')
settings = AnthropicModelSettings(
    anthropic_thinking={'type': 'adaptive'},
    anthropic_effort='high',
)
agent = Agent(model, model_settings=settings)
...
The anthropic_effort setting controls how much effort the model puts into its response (independent of thinking). See the Anthropic effort docs for details.
Note
Older models (claude-sonnet-4-5, claude-opus-4-5, etc.) do not support adaptive thinking and require {'type': 'enabled', 'budget_tokens': N} as shown above.
Thinking tokens count against Anthropic's loop-wide task budgets, so adaptive thinking naturally scales down as the budget depletes.
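The model-specific rules above can be summarized as a small helper that chooses the anthropic_thinking value from a model name. This sketch encodes only what this page states: adaptive thinking on Claude Opus 4.6 and later, and budget-based thinking otherwise. The anthropic_thinking_for helper and its name parsing are hypothetical. Other model families may behave differently, so check Anthropic's model documentation before relying on this.

```python
import re


def anthropic_thinking_for(model_name: str, budget_tokens: int = 10000) -> dict:
    """Choose adaptive or budget-based thinking for a Claude model (hypothetical helper)."""
    # Matches names like 'claude-opus-4-7' or 'claude-opus-4-1-20250805'
    # (an optional 8-digit date suffix is allowed).
    match = re.fullmatch(r'claude-opus-(\d+)-(\d{1,2})(?:-\d{8})?', model_name)
    if match and (int(match.group(1)), int(match.group(2))) >= (4, 6):
        # Opus 4.6+: adaptive thinking (budget_tokens is deprecated on 4.6, removed on 4.7+).
        return {'type': 'adaptive'}
    # Everything else falls back to extended thinking with an explicit budget.
    return {'type': 'enabled', 'budget_tokens': budget_tokens}


print(anthropic_thinking_for('claude-opus-4-7'))          # {'type': 'adaptive'}
print(anthropic_thinking_for('claude-sonnet-4-5', 1024))
```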
Google
For advanced usage, use the GoogleModelSettings.google_thinking_config model setting.
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
model = GoogleModel('gemini-3.5-flash')
settings = GoogleModelSettings(google_thinking_config={'include_thoughts': True, 'thinking_level': 'MEDIUM'})
agent = Agent(model, model_settings=settings)
...
See the Google model docs for more details.
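According to the provider table, Gemini 3+ models take a thinking_level, while Gemini 2.5 models take a token thinking_budget. The following minimal sketch builds a google_thinking_config dict for either generation. The gemini_thinking_config helper, its parsing of the model name, and its defaults (which are taken from the 'high' column of the table) are assumptions made for illustration.

```python
import re


def gemini_thinking_config(model_name: str, level: str = 'HIGH', budget: int = 24576) -> dict:
    """Build a google_thinking_config dict based on the Gemini generation (hypothetical)."""
    match = re.match(r'gemini-(\d+(?:\.\d+)?)', model_name)
    if match is None:
        raise ValueError(f'unrecognized Gemini model name: {model_name!r}')
    # A float is a simplification that is good enough for names like '2.5' or '3.5'.
    version = float(match.group(1))
    if version >= 3:
        return {'include_thoughts': True, 'thinking_level': level}
    return {'include_thoughts': True, 'thinking_budget': budget}


print(gemini_thinking_config('gemini-3.5-flash', 'MEDIUM'))
print(gemini_thinking_config('gemini-2.5-pro'))
```

You can pass the resulting dict as google_thinking_config, in the same way as the literal dict in the example above.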
xAI
xAI reasoning models (Grok) support native thinking. To preserve the thinking content for multi-turn conversations, enable XaiModelSettings.xai_include_encrypted_content.
from pydantic_ai import Agent
from pydantic_ai.models.xai import XaiModel, XaiModelSettings
model = XaiModel('grok-4-fast-reasoning')
settings = XaiModelSettings(xai_include_encrypted_content=True)
agent = Agent(model, model_settings=settings)
...
Bedrock
Although Bedrock Converse doesn't provide a unified API to enable thinking, you can still use the BedrockModelSettings.bedrock_additional_model_requests_fields model setting to pass provider-specific configuration:
For Anthropic Claude models:
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings
model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
model_settings = BedrockModelSettings(
    bedrock_additional_model_requests_fields={
        'thinking': {'type': 'enabled', 'budget_tokens': 1024}
    }
)
agent = Agent(model=model, model_settings=model_settings)
For OpenAI gpt-oss models:
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings
model = BedrockConverseModel('openai.gpt-oss-120b-1:0')
model_settings = BedrockModelSettings(
    bedrock_additional_model_requests_fields={'reasoning_effort': 'low'}
)
agent = Agent(model=model, model_settings=model_settings)
For Qwen models:
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings
model = BedrockConverseModel('qwen.qwen3-32b-v1:0')
model_settings = BedrockModelSettings(
    bedrock_additional_model_requests_fields={'reasoning_config': 'high'}
)
agent = Agent(model=model, model_settings=model_settings)
For DeepSeek models, reasoning is always enabled, so no extra fields are needed:
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
model = BedrockConverseModel('us.deepseek.r1-v1:0')
agent = Agent(model=model)
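Because the extra fields differ for each model family, one option is to choose them based on the Bedrock model ID. The following minimal sketch reuses the field shapes from the examples above. The bedrock_reasoning_fields helper, and the assumption that the provider is the second-to-last dot-separated segment of the ID, are hypothetical.

```python
import copy

# Field shapes copied from the Bedrock examples above, keyed by model family.
REASONING_FIELDS_BY_FAMILY = {
    'anthropic': {'thinking': {'type': 'enabled', 'budget_tokens': 1024}},
    'openai': {'reasoning_effort': 'low'},
    'qwen': {'reasoning_config': 'high'},
    'deepseek': {},  # reasoning is always enabled; nothing extra to send
}


def bedrock_reasoning_fields(model_id: str) -> dict:
    """Return extra request fields for a Bedrock model ID (hypothetical helper)."""
    # Assumes IDs look like '[region.]provider.model', e.g. 'us.anthropic.claude-...'.
    segments = model_id.split('.')
    if len(segments) < 2:
        raise ValueError(f'unexpected Bedrock model ID: {model_id!r}')
    family = segments[-2]
    if family not in REASONING_FIELDS_BY_FAMILY:
        raise ValueError(f'no reasoning fields known for family {family!r}')
    # Deep-copy so callers can adjust the result without mutating the table.
    return copy.deepcopy(REASONING_FIELDS_BY_FAMILY[family])


print(bedrock_reasoning_fields('us.anthropic.claude-sonnet-4-5-20250929-v1:0'))
```

You would then pass the result as BedrockModelSettings(bedrock_additional_model_requests_fields=bedrock_reasoning_fields(model_id)).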
Groq
Groq supports different formats to receive thinking parts:
- "raw": The thinking part is included in the text content inside <think> tags, which are automatically converted to ThinkingPart objects.
- "hidden": The thinking part is not included in the text content.
- "parsed": The thinking part has its own structured part in the response, which is converted into a ThinkingPart object.
To enable thinking, use the GroqModelSettings.groq_reasoning_format model setting:
from pydantic_ai import Agent
from pydantic_ai.models.groq import GroqModel, GroqModelSettings
model = GroqModel('qwen/qwen3-32b')
settings = GroqModelSettings(groq_reasoning_format='parsed')
agent = Agent(model, model_settings=settings)
...
Note
Groq does not support truly disabling thinking. When thinking=False is set via the unified setting, Pydantic AI sends reasoning_format='hidden', which suppresses reasoning output but the model may still reason internally.
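The three reasoning formats differ only in where the reasoning ends up. The following minimal sketch normalizes a Groq-style message into a (thinking, text) pair for each format. The message dict shape, including the separate 'reasoning' key used for "parsed", is an assumption for illustration, and normalize_groq_message is a hypothetical helper.

```python
import re
from typing import Optional


def normalize_groq_message(message: dict, reasoning_format: str) -> tuple[Optional[str], str]:
    """Return (thinking, text) for a message in the given reasoning format (hypothetical)."""
    content = message.get('content') or ''
    if reasoning_format == 'raw':
        # Reasoning is embedded in the text between <think> tags.
        match = re.search(r'<think>(.*?)</think>', content, flags=re.DOTALL)
        if match is None:
            return None, content.strip()
        text = (content[:match.start()] + content[match.end():]).strip()
        return match.group(1).strip(), text
    if reasoning_format == 'parsed':
        # Reasoning arrives in its own field (assumed here to be named 'reasoning').
        return message.get('reasoning'), content.strip()
    if reasoning_format == 'hidden':
        # No reasoning is returned, although the model may still reason internally.
        return None, content.strip()
    raise ValueError(f'unknown reasoning_format: {reasoning_format!r}')


print(normalize_groq_message({'content': '<think>check units</think> 42'}, 'raw'))
```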
OpenRouter
To enable thinking, use the OpenRouterModelSettings.openrouter_reasoning model setting.
from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings
model = OpenRouterModel('openai/gpt-5.2')
settings = OpenRouterModelSettings(openrouter_reasoning={'effort': 'high'})
agent = Agent(model, model_settings=settings)
...
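According to the provider table, this setting is sent as a reasoning object in the request body via extra_body. The following minimal sketch shows the merge for a plain JSON request body, assuming that extra_body keys are merged into the top level of the body (as the OpenAI Python SDK does). build_request_body is hypothetical and is not how Pydantic AI builds requests internally.

```python
import json
from typing import Any, Optional


def build_request_body(
    model: str,
    messages: list[dict[str, Any]],
    reasoning: Optional[dict[str, Any]] = None,
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge a reasoning config into an OpenAI-compatible request body (hypothetical)."""
    body: dict[str, Any] = {'model': model, 'messages': messages}
    merged_extra = dict(extra_body or {})
    if reasoning is not None:
        # The explicit reasoning argument wins over any 'reasoning' key in extra_body.
        merged_extra['reasoning'] = reasoning
    # extra_body keys end up at the top level of the JSON body.
    body.update(merged_extra)
    return body


body = build_request_body(
    'openai/gpt-5.2', [{'role': 'user', 'content': 'Hi'}], reasoning={'effort': 'high'}
)
print(json.dumps(body['reasoning']))  # {"effort": "high"}
```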
Mistral
Thinking is supported by the magistral family of models. It does not need to be specifically enabled.
Cohere
Thinking is supported by the command-a-reasoning-08-2025 model. It does not need to be specifically enabled.
Hugging Face
Text output inside <think> tags is automatically converted to ThinkingPart objects.
You can customize the tags using the thinking_tags field on the model profile.