`pydantic_ai.models.anthropic`

Setup

For details on how to set up authentication with this model, see model configuration for Anthropic.

LatestAnthropicModelNames `module-attribute`

LatestAnthropicModelNames = ModelParam

Anthropic model names from the installed SDK.

AnthropicModelName `module-attribute`

AnthropicModelName = LatestAnthropicModelNames

Possible Anthropic model names.

The installed Anthropic SDK exposes the current literal set and still allows arbitrary string model names. See the Anthropic docs for a full list.

AnthropicTaskBudget `module-attribute`

AnthropicTaskBudget: TypeAlias = BetaTokenTaskBudgetParam

Anthropic task budget payload for output_config.task_budget.

AnthropicModelSettings

Bases: ModelSettings

Settings used for an Anthropic model request.

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

class AnthropicModelSettings(ModelSettings, total=False):
    """Settings used for an Anthropic model request."""

    # ALL FIELDS MUST BE `anthropic_` PREFIXED SO YOU CAN MERGE THEM WITH OTHER MODELS.

    anthropic_metadata: BetaMetadataParam
    """An object describing metadata about the request.

    Contains `user_id`, an external identifier for the user who is associated with the request.
    """

    anthropic_thinking: BetaThinkingConfigParam
    """Determine whether the model should generate a thinking block.

    See [the Anthropic docs](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking) for more information.
    """

    anthropic_cache_tool_definitions: bool | Literal['5m', '1h']
    """Whether to add `cache_control` to the last tool definition.

    When enabled, the last tool in the `tools` array will have `cache_control` set,
    allowing Anthropic to cache tool definitions and reduce costs.
    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
    See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
    """

    anthropic_service_tier: Literal['auto', 'standard_only']
    """The service tier to use for the model request.

    See https://docs.anthropic.com/en/docs/build-with-claude/latency-and-throughput for more information.
    """

    anthropic_cache_instructions: bool | Literal['5m', '1h']
    """Whether to add `cache_control` to the last system prompt block.

    When enabled, the last system prompt will have `cache_control` set,
    allowing Anthropic to cache system instructions and reduce costs.
    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
    See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
    """

    anthropic_cache_messages: bool | Literal['5m', '1h']
    """Whether to add `cache_control` to the last message content block.

    This is an alternative to `anthropic_cache` for Anthropic-compatible gateways and
    proxies that accept the Anthropic message format but don't support the top-level
    automatic caching parameter.

    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
    Cannot be combined with `anthropic_cache`.
    """

    anthropic_cache: bool | Literal['5m', '1h']
    """Enable prompt caching for multi-turn conversations.

    Passes a top-level `cache_control` parameter so the server automatically applies a
    cache breakpoint to the last cacheable block and moves it forward as conversations grow.

    On Bedrock and Vertex, automatic caching is not yet supported, so this falls back to
    per-block caching on the last user message. If the last content block already has
    `cache_control` from an explicit `CachePoint`, it is preserved.

    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.

    This can be combined with explicit cache breakpoints (`anthropic_cache_instructions`,
    `anthropic_cache_tool_definitions`, `CachePoint`). The automatic breakpoint counts as
    1 of Anthropic's 4 cache point slots; we automatically trim excess explicit breakpoints.
    See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#automatic-caching
    for more information.
    """

    anthropic_effort: Literal['low', 'medium', 'high', 'xhigh', 'max'] | None
    """The effort level for the model to use when generating a response.

    See [the Anthropic docs](https://docs.anthropic.com/en/docs/build-with-claude/effort) for more information.
    """

    anthropic_task_budget: AnthropicTaskBudget
    """Task budget configuration for Claude Opus 4.7 beta requests.

    Maps to `output_config.task_budget`. This setting is currently only supported on
    `claude-opus-4-7`, and Pydantic AI automatically enables Anthropic's required
    task-budget beta when it is present.

    Omit `remaining` unless you are intentionally carrying a budget across compaction
    or other rewritten context.
    """

    anthropic_container: BetaContainerParams | str | Literal[False]
    """Container configuration for multi-turn conversations.

    By default, if previous messages contain a container_id (from a prior response),
    it will be reused automatically.

    Set to `False` to force a fresh container (ignore any `container_id` from history).
    Set to a container id string (e.g. `'container_xxx'`) to explicitly reuse a container,
    or to a `BetaContainerParams` dict (e.g. `{'skills': [...]}` or
    `{'id': 'container_xxx', 'skills': [...]}`) when passing Skills to the Anthropic
    Skills beta.
    """

    anthropic_code_execution_tool_version: AnthropicCodeExecutionToolVersion | Literal['auto']
    """Which Anthropic code execution tool version to send for `CodeExecutionTool`.

    Defaults to `'auto'`, which uses the default version from the model profile:
    `'20260120'` for Sonnet 4.5+ and Opus 4.5+, otherwise `'20250825'`.
    Set a concrete version to force that tool version; a `UserError` is raised if
    the selected model profile does not support that version.
    """

    anthropic_eager_input_streaming: bool
    """Whether to enable eager input streaming on tool definitions.

    When enabled, all tool definitions will have `eager_input_streaming` set to `True`,
    allowing Anthropic to stream tool call arguments incrementally instead of buffering
    the entire JSON before streaming. This reduces latency for tool calls with large inputs.
    See https://platform.claude.com/docs/en/agents-and-tools/tool-use/fine-grained-tool-streaming for more information.
    """

    anthropic_betas: list[AnthropicBetaParam]
    """List of Anthropic beta features to enable for API requests.

    Each item can be a known beta name (e.g. 'interleaved-thinking-2025-05-14') or a custom string.
    Merged with auto-added betas (e.g. builtin tools) and any betas from
    extra_headers['anthropic-beta']. See the Anthropic docs for available beta features.
    """

    anthropic_speed: Literal['standard', 'fast']
    """The inference speed mode for this request.

    `'fast'` enables high output-tokens-per-second inference for supported models (currently Claude Opus 4.6 only).
    On unsupported models or clients, `anthropic_speed='fast'` is ignored with a `UserWarning`.
    Fast mode is a research preview and only available on the direct Anthropic API (not Bedrock, Vertex, or Foundry);
    see [the Anthropic docs](https://platform.claude.com/docs/en/build-with-claude/fast-mode) for details.
    Note: switching between `'fast'` and `'standard'` invalidates the prompt cache.
    """

    anthropic_context_management: BetaContextManagementConfigParam
    """Context management configuration for automatic compaction.

    When configured, Anthropic will automatically compact older context when the
    input token count exceeds the configured threshold. The compaction produces
    a summary that replaces the compacted messages.

    See [the Anthropic docs](https://docs.anthropic.com/en/docs/build-with-claude/compaction) for more details.
    """

anthropic_metadata `instance-attribute`

anthropic_metadata: BetaMetadataParam

An object describing metadata about the request.

Contains user_id, an external identifier for the user who is associated with the request.

anthropic_thinking `instance-attribute`

anthropic_thinking: BetaThinkingConfigParam

Determine whether the model should generate a thinking block.

See the Anthropic docs for more information.

anthropic_cache_tool_definitions `instance-attribute`

anthropic_cache_tool_definitions: bool | Literal["5m", "1h"]

Whether to add cache_control to the last tool definition.

When enabled, the last tool in the tools array will have cache_control set, allowing Anthropic to cache tool definitions and reduce costs. If True, uses TTL='5m'. You can also specify '5m' or '1h' directly. See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.

anthropic_service_tier `instance-attribute`

anthropic_service_tier: Literal['auto', 'standard_only']

The service tier to use for the model request.

See https://docs.anthropic.com/en/docs/build-with-claude/latency-and-throughput for more information.

anthropic_cache_instructions `instance-attribute`

anthropic_cache_instructions: bool | Literal['5m', '1h']

Whether to add cache_control to the last system prompt block.

When enabled, the last system prompt will have cache_control set, allowing Anthropic to cache system instructions and reduce costs. If True, uses TTL='5m'. You can also specify '5m' or '1h' directly. See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.

anthropic_cache_messages `instance-attribute`

anthropic_cache_messages: bool | Literal['5m', '1h']

Whether to add cache_control to the last message content block.

This is an alternative to anthropic_cache for Anthropic-compatible gateways and proxies that accept the Anthropic message format but don't support the top-level automatic caching parameter.

If True, uses TTL='5m'. You can also specify '5m' or '1h' directly. Cannot be combined with anthropic_cache.

anthropic_cache `instance-attribute`

anthropic_cache: bool | Literal['5m', '1h']

Enable prompt caching for multi-turn conversations.

Passes a top-level cache_control parameter so the server automatically applies a cache breakpoint to the last cacheable block and moves it forward as conversations grow.

On Bedrock and Vertex, automatic caching is not yet supported, so this falls back to per-block caching on the last user message. If the last content block already has cache_control from an explicit CachePoint, it is preserved.

If True, uses TTL='5m'. You can also specify '5m' or '1h' directly.

This can be combined with explicit cache breakpoints (anthropic_cache_instructions, anthropic_cache_tool_definitions, CachePoint). The automatic breakpoint counts as 1 of Anthropic's 4 cache point slots; we automatically trim excess explicit breakpoints. See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#automatic-caching for more information.

anthropic_effort `instance-attribute`

anthropic_effort: (
    Literal["low", "medium", "high", "xhigh", "max"] | None
)

The effort level for the model to use when generating a response.

See the Anthropic docs for more information.

anthropic_task_budget `instance-attribute`

anthropic_task_budget: AnthropicTaskBudget

Task budget configuration for Claude Opus 4.7 beta requests.

Maps to output_config.task_budget. This setting is currently only supported on claude-opus-4-7, and Pydantic AI automatically enables Anthropic's required task-budget beta when it is present.

Omit remaining unless you are intentionally carrying a budget across compaction or other rewritten context.

anthropic_container `instance-attribute`

anthropic_container: (
    BetaContainerParams | str | Literal[False]
)

Container configuration for multi-turn conversations.

By default, if previous messages contain a container_id (from a prior response), it will be reused automatically.

Set to False to force a fresh container (ignore any container_id from history). Set to a container id string (e.g. 'container_xxx') to explicitly reuse a container, or to a BetaContainerParams dict (e.g. {'skills': [...]} or {'id': 'container_xxx', 'skills': [...]}) when passing Skills to the Anthropic Skills beta.

anthropic_code_execution_tool_version `instance-attribute`

anthropic_code_execution_tool_version: (
    AnthropicCodeExecutionToolVersion | Literal["auto"]
)

Which Anthropic code execution tool version to send for CodeExecutionTool.

Defaults to 'auto', which uses the default version from the model profile: '20260120' for Sonnet 4.5+ and Opus 4.5+, otherwise '20250825'. Set a concrete version to force that tool version; a UserError is raised if the selected model profile does not support that version.

anthropic_eager_input_streaming `instance-attribute`

anthropic_eager_input_streaming: bool

Whether to enable eager input streaming on tool definitions.

When enabled, all tool definitions will have eager_input_streaming set to True, allowing Anthropic to stream tool call arguments incrementally instead of buffering the entire JSON before streaming. This reduces latency for tool calls with large inputs. See https://platform.claude.com/docs/en/agents-and-tools/tool-use/fine-grained-tool-streaming for more information.

anthropic_betas `instance-attribute`

anthropic_betas: list[AnthropicBetaParam]

List of Anthropic beta features to enable for API requests.

Each item can be a known beta name (e.g. 'interleaved-thinking-2025-05-14') or a custom string. Merged with auto-added betas (e.g. builtin tools) and any betas from extra_headers['anthropic-beta']. See the Anthropic docs for available beta features.

anthropic_speed `instance-attribute`

anthropic_speed: Literal['standard', 'fast']

The inference speed mode for this request.

'fast' enables high output-tokens-per-second inference for supported models (currently Claude Opus 4.6 only). On unsupported models or clients, anthropic_speed='fast' is ignored with a UserWarning. Fast mode is a research preview and only available on the direct Anthropic API (not Bedrock, Vertex, or Foundry); see the Anthropic docs for details. Note: switching between 'fast' and 'standard' invalidates the prompt cache.

anthropic_context_management `instance-attribute`

anthropic_context_management: (
    BetaContextManagementConfigParam
)

Context management configuration for automatic compaction.

When configured, Anthropic will automatically compact older context when the input token count exceeds the configured threshold. The compaction produces a summary that replaces the compacted messages.

See the Anthropic docs for more details.

AnthropicModel `dataclass`

Bases: Model[AsyncAnthropicClient]

A model that uses the Anthropic API.

Internally, this uses the Anthropic Python client to interact with the API.

Apart from __init__, all methods are private or match those of the base class.

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

@dataclass(init=False)
class AnthropicModel(Model[AsyncAnthropicClient]):
    """A model that uses the Anthropic API.

    Internally, this uses the [Anthropic Python client](https://github.com/anthropics/anthropic-sdk-python) to interact with the API.

    Apart from `__init__`, all methods are private or match those of the base class.
    """

    _model_name: AnthropicModelName = field(repr=False)
    _provider: Provider[AsyncAnthropicClient] = field(repr=False)

    def __init__(
        self,
        model_name: AnthropicModelName,
        *,
        provider: Literal['anthropic', 'gateway'] | Provider[AsyncAnthropicClient] = 'anthropic',
        profile: ModelProfileSpec | None = None,
        settings: ModelSettings | None = None,
    ):
        """Initialize an Anthropic model.

        Args:
            model_name: The name of the Anthropic model to use. List of model names available
                [here](https://docs.anthropic.com/en/docs/about-claude/models).
            provider: The provider to use for the Anthropic API. Can be either the string 'anthropic' or an
                instance of `Provider[AsyncAnthropicClient]`. Defaults to 'anthropic'.
            profile: The model profile to use. Defaults to a profile picked by the provider based on the model name.
                The default 'anthropic' provider will use the default `..profiles.anthropic.anthropic_model_profile`.
            settings: Default model settings for this model instance.
        """
        self._model_name = model_name

        if isinstance(provider, str):
            provider = infer_provider('gateway/anthropic' if provider == 'gateway' else provider)
        self._provider = provider

        super().__init__(settings=settings, profile=profile)

    @property
    def client(self) -> AsyncAnthropicClient:
        return self._provider.client

    @property
    def base_url(self) -> str:
        return str(self.client.base_url)

    @property
    def model_name(self) -> AnthropicModelName:
        """The model name."""
        return self._model_name

    @property
    def system(self) -> str:
        """The model provider."""
        return self._provider.name

    @cached_property
    def profile(self) -> AnthropicModelProfile:
        return cast(AnthropicModelProfile, super().profile)

    @classmethod
    def supported_native_tools(cls) -> frozenset[type[AbstractNativeTool]]:
        """The set of builtin tool types this model can handle."""
        return frozenset({WebSearchTool, CodeExecutionTool, WebFetchTool, MemoryTool, MCPServerTool, ToolSearchTool})

    async def request(
        self,
        messages: list[ModelMessage],
        model_settings: ModelSettings | None,
        model_request_parameters: ModelRequestParameters,
    ) -> ModelResponse:
        check_allow_model_requests()
        model_settings, model_request_parameters = self.prepare_request(
            model_settings,
            model_request_parameters,
        )
        model_settings = cast(AnthropicModelSettings, model_settings or {})
        try:
            response = await self._messages_create(messages, False, model_settings, model_request_parameters)
            return self._process_response(response)
        except ValueError as e:
            if 'Streaming is required' in str(e):
                # Anthropic SDK requires streaming for high max_tokens; fall back transparently
                # https://github.com/anthropics/anthropic-sdk-python/blob/49d639a671cb0ac30c767e8e1e68fdd5925205d5/src/anthropic/_base_client.py#L726
                stream = await self._messages_create(messages, True, model_settings, model_request_parameters)
                async with stream:
                    streamed_response = await self._process_streamed_response(stream, model_request_parameters)
                    async for _ in streamed_response:
                        pass
                    return streamed_response.get()
            raise  # pragma: no cover

    async def count_tokens(
        self,
        messages: list[ModelMessage],
        model_settings: ModelSettings | None,
        model_request_parameters: ModelRequestParameters,
    ) -> usage.RequestUsage:
        model_settings, model_request_parameters = self.prepare_request(
            model_settings,
            model_request_parameters,
        )

        response = await self._messages_count_tokens(
            messages, cast(AnthropicModelSettings, model_settings or {}), model_request_parameters
        )

        return usage.RequestUsage(input_tokens=response.input_tokens)

    @asynccontextmanager
    async def request_stream(
        self,
        messages: list[ModelMessage],
        model_settings: ModelSettings | None,
        model_request_parameters: ModelRequestParameters,
        run_context: RunContext[Any] | None = None,
    ) -> AsyncIterator[StreamedResponse]:
        check_allow_model_requests()
        model_settings, model_request_parameters = self.prepare_request(
            model_settings,
            model_request_parameters,
        )
        response = await self._messages_create(
            messages, True, cast(AnthropicModelSettings, model_settings or {}), model_request_parameters
        )
        async with response:
            yield await self._process_streamed_response(response, model_request_parameters)

    def prepare_request(
        self, model_settings: ModelSettings | None, model_request_parameters: ModelRequestParameters
    ) -> tuple[ModelSettings | None, ModelRequestParameters]:
        profile = self.profile
        merged = merge_model_settings(self.settings, model_settings) or {}

        if (
            profile.get('anthropic_disallows_budget_thinking', False)
            and (anthropic_thinking := merged.get('anthropic_thinking'))
            and anthropic_thinking.get('type') == 'enabled'
        ):
            raise UserError(
                f'{self.model_name!r} does not support '
                "`anthropic_thinking={'type': 'enabled', 'budget_tokens': ...}`. "
                "Use `anthropic_thinking={'type': 'adaptive'}` and `anthropic_effort=...` instead."
            )

        thinking_enabled = False
        if anthropic_thinking := merged.get('anthropic_thinking'):
            thinking_enabled = anthropic_thinking.get('type') in ('enabled', 'adaptive')
        elif merged.get('thinking'):
            thinking_enabled = True

        if model_request_parameters.output_tools and thinking_enabled:
            output_mode = 'native' if self.profile.get('supports_json_schema_output', False) else 'prompted'
            model_request_parameters = model_request_parameters.with_default_output_mode(output_mode)
            if (
                model_request_parameters.output_mode == 'tool' and not model_request_parameters.allow_text_output
            ):  # pragma: no branch
                # This would result in `tool_choice=required`, which Anthropic does not support with thinking.
                suggested_output_type = (
                    'NativeOutput' if self.profile.get('supports_json_schema_output', False) else 'PromptedOutput'
                )
                raise UserError(
                    f'Anthropic does not support thinking and output tools at the same time. Use `output_type={suggested_output_type}(...)` instead.'
                )

        if model_request_parameters.output_mode == 'native':
            assert model_request_parameters.output_object is not None
            if model_request_parameters.output_object.strict is False:
                raise UserError(
                    'Setting `strict=False` on `output_type=NativeOutput(...)` is not allowed for Anthropic models.'
                )
            model_request_parameters = replace(
                model_request_parameters, output_object=replace(model_request_parameters.output_object, strict=True)
            )

        prepared_settings, model_request_parameters = super().prepare_request(model_settings, model_request_parameters)
        if profile.get('anthropic_disallows_sampling_settings', False) and prepared_settings:
            filtered: ModelSettings = {**prepared_settings}
            self._drop_unsupported_sampling_settings(filtered)
            prepared_settings = filtered or None
        return prepared_settings, model_request_parameters

    def _drop_unsupported_sampling_settings(self, model_settings: ModelSettings) -> None:
        dropped = {setting for setting in _ANTHROPIC_SAMPLING_PARAMS if setting in model_settings}
        extra_body = model_settings.get('extra_body')
        if is_str_dict(extra_body):
            dropped |= {setting for setting in _ANTHROPIC_SAMPLING_PARAMS if setting in extra_body}
            model_settings['extra_body'] = {
                key: value for key, value in extra_body.items() if key not in _ANTHROPIC_SAMPLING_PARAMS
            }

        for setting in _ANTHROPIC_SAMPLING_PARAMS:
            model_settings.pop(setting, None)

        if dropped:
            ordered = [setting for setting in _ANTHROPIC_SAMPLING_PARAMS if setting in dropped]
            warnings.warn(
                f'Sampling parameters {ordered} are not supported by {self.model_name!r}. These settings will be ignored.',
                UserWarning,
                stacklevel=2,
            )

    def _translate_thinking(
        self,
        model_settings: AnthropicModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> BetaThinkingConfigParam:
        """Get the thinking parameter, falling back to unified thinking."""
        if anthropic_thinking := model_settings.get('anthropic_thinking'):
            return anthropic_thinking
        thinking = model_request_parameters.thinking
        if thinking is None or thinking is False:
            return OMIT  # type: ignore[return-value]
        profile = self.profile
        if profile.get('anthropic_supports_adaptive_thinking', False):
            return {'type': 'adaptive'}
        return {'type': 'enabled', 'budget_tokens': ANTHROPIC_THINKING_BUDGET_MAP[thinking]}

    @overload
    async def _messages_create(
        self,
        messages: list[ModelMessage],
        stream: Literal[True],
        model_settings: AnthropicModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> AsyncStream[BetaRawMessageStreamEvent]:
        pass

    @overload
    async def _messages_create(
        self,
        messages: list[ModelMessage],
        stream: Literal[False],
        model_settings: AnthropicModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> BetaMessage:
        pass

    async def _messages_create(
        self,
        messages: list[ModelMessage],
        stream: bool,
        model_settings: AnthropicModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> BetaMessage | AsyncStream[BetaRawMessageStreamEvent]:
        """Calls the Anthropic API to create a message.

        This is the last step before sending the request to the API.
        Most preprocessing has happened in `prepare_request()`.
        """
        tools, tool_choice = self._prepare_tools_and_tool_choice(model_settings, model_request_parameters)
        tools, mcp_servers, native_tool_betas = self._add_native_tools(tools, model_request_parameters, model_settings)

        auto_cache_control, resolved_cache_ttl = self._build_automatic_cache_control(model_settings)
        system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
        self._apply_per_block_caching_fallback(resolved_cache_ttl, anthropic_messages)
        self._apply_explicit_message_caching(model_settings, anthropic_messages)
        self._limit_cache_points(
            system_prompt, anthropic_messages, tools, automatic_caching=auto_cache_control is not None
        )
        output_config = self._build_output_config(model_request_parameters, model_settings)
        anthropic_profile = self.profile
        betas, extra_headers = self._get_betas_and_extra_headers(model_settings, anthropic_profile)
        betas.update(native_tool_betas)
        context_management = self._add_compaction_params(messages, betas, model_settings)
        self._validate_task_budget_vs_context_management(model_settings, context_management)
        container = self._get_container(messages, model_settings)

        with _map_api_errors(self.model_name):
            return await self.client.beta.messages.create(
                max_tokens=model_settings.get('max_tokens', 4096),
                system=system_prompt or OMIT,
                messages=anthropic_messages,
                model=self._model_name,
                tools=tools or OMIT,
                tool_choice=tool_choice or OMIT,
                mcp_servers=mcp_servers or OMIT,
                output_config=output_config or OMIT,
                betas=sorted(betas) or OMIT,
                stream=stream,
                cache_control=auto_cache_control or OMIT,
                thinking=self._translate_thinking(model_settings, model_request_parameters),
                stop_sequences=model_settings.get('stop_sequences', OMIT),
                temperature=model_settings.get('temperature', OMIT),
                top_p=model_settings.get('top_p', OMIT),
                timeout=model_settings.get('timeout', NOT_GIVEN),
                metadata=model_settings.get('anthropic_metadata', OMIT),
                context_management=context_management or OMIT,
                container=container or OMIT,
                service_tier=_resolve_anthropic_service_tier(model_settings),
                speed=self._effective_speed(model_settings, anthropic_profile),
                extra_headers=extra_headers,
                extra_body=model_settings.get('extra_body'),
            )

    @staticmethod
    def _add_compaction_params(
        messages: list[ModelMessage],
        betas: set[str],
        model_settings: AnthropicModelSettings,
    ) -> BetaContextManagementConfigParam | None:
        """Add compaction beta and default context_management when messages contain CompactionParts.

        This ensures CompactionParts can be round-tripped even without AnthropicCompaction active.
        """
        has_compaction_parts = any(
            isinstance(part, CompactionPart) for msg in messages if isinstance(msg, ModelResponse) for part in msg.parts
        )
        if has_compaction_parts:
            betas.add('compact-2026-01-12')
        context_management = model_settings.get('anthropic_context_management')
        if has_compaction_parts and context_management is None:
            context_management = cast(
                BetaContextManagementConfigParam, {'edits': [{'type': _ANTHROPIC_COMPACT_EDIT_TYPE}]}
            )
        return context_management

    def _get_betas_and_extra_headers(
        self,
        model_settings: AnthropicModelSettings,
        anthropic_profile: AnthropicModelProfile,
    ) -> tuple[set[str], dict[str, str]]:
        """Prepare beta features list and extra headers for API request.

        Handles merging custom `anthropic-beta` header from `extra_headers` into betas set
        and ensuring `User-Agent` is set.
        """
        extra_headers = dict(model_settings.get('extra_headers', {}))
        extra_headers.setdefault('User-Agent', get_user_agent())

        betas: set[str] = set()

        if model_settings.get('anthropic_context_management'):
            betas.add('compact-2026-01-12')
        if self._get_task_budget(model_settings) is not None:
            betas.add(_ANTHROPIC_TASK_BUDGETS_BETA)

        if model_settings.get('anthropic_speed') == 'fast' and self._client_supports_fast_speed(anthropic_profile):
            betas.add('fast-mode-2026-02-01')

        if betas_from_setting := model_settings.get('anthropic_betas'):
            betas.update(str(b) for b in betas_from_setting)

        if beta_header := extra_headers.pop('anthropic-beta', None):
            betas.update({stripped_beta for beta in beta_header.split(',') if (stripped_beta := beta.strip())})

        return betas, extra_headers

    def _effective_speed(
        self, model_settings: AnthropicModelSettings, anthropic_profile: AnthropicModelProfile
    ) -> Literal['standard', 'fast'] | Omit:
        """Speed to send to the API, or OMIT when the model or client does not support the `speed` parameter."""
        s = model_settings.get('anthropic_speed')
        if s in ('standard', 'fast') and self._client_supports_fast_speed(anthropic_profile):
            return s
        if s == 'fast':
            warnings.warn(
                f"anthropic_speed='fast' is not supported by {self.model_name} on this client; the setting will be ignored.",
                UserWarning,
                stacklevel=2,
            )
        return OMIT

    def _client_supports_fast_speed(self, anthropic_profile: AnthropicModelProfile) -> bool:
        """Fast mode is only available on the direct Anthropic API (not Bedrock, Vertex, or Foundry)."""
        return anthropic_profile.get('anthropic_supports_fast_speed', False) and not isinstance(
            self.client, _FAST_MODE_UNSUPPORTED_CLIENTS
        )

    def _get_container(
        self, messages: list[ModelMessage], model_settings: AnthropicModelSettings
    ) -> BetaContainerParams | str | None:
        """Resolve the `container` request parameter.

        The Anthropic SDK types `container` as `BetaContainerParams | str`, and the
        live API accepts both forms *except* for one specific shape: a dict carrying
        only `id` and nothing else, which is rejected with
        `container: Input should be a valid string`. `{"skills": [...]}`,
        `{"id": x, "skills": [...]}`, and the raw `"x"` string all work — only
        `{"id": x}` alone is broken server-side.

        So when the user passes that only-broken shape, we transparently unwrap it to
        the string the server wants. Every other shape is passed through untouched so
        the Skills path (`{"skills": ...}` / `{"id": ..., "skills": ...}`) keeps
        working. History-based reuse is always sent as the raw id string since we
        never have skills to attach there.
        """
        if (container := model_settings.get('anthropic_container')) is not None:
            if container is False:
                return None
            if isinstance(container, dict) and set(container) == {'id'} and (cid := container.get('id')):
                return cid
            return container
        for m in reversed(messages):
            if isinstance(m, ModelResponse) and m.provider_name == self.system and m.provider_details:
                if cid := m.provider_details.get('container_id'):
                    return cid
        return None

    async def _messages_count_tokens(
        self,
        messages: list[ModelMessage],
        model_settings: AnthropicModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> BetaMessageTokensCount:
        if isinstance(self.client, AsyncAnthropicBedrock):
            raise UserError('AsyncAnthropicBedrock client does not support `count_tokens` api.')

        # standalone function to make it easier to override
        tools, tool_choice = self._prepare_tools_and_tool_choice(model_settings, model_request_parameters)
        tools, mcp_servers, native_tool_betas = self._add_native_tools(tools, model_request_parameters, model_settings)

        auto_cache_control, resolved_cache_ttl = self._build_automatic_cache_control(model_settings)
        system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
        self._apply_per_block_caching_fallback(resolved_cache_ttl, anthropic_messages)
        self._apply_explicit_message_caching(model_settings, anthropic_messages)
        self._limit_cache_points(
            system_prompt, anthropic_messages, tools, automatic_caching=auto_cache_control is not None
        )
        output_config = self._build_output_config(model_request_parameters, model_settings)
        anthropic_profile = self.profile
        betas, extra_headers = self._get_betas_and_extra_headers(model_settings, anthropic_profile)
        betas.update(native_tool_betas)
        context_management = self._add_compaction_params(messages, betas, model_settings)
        self._validate_task_budget_vs_context_management(model_settings, context_management)
        with _map_api_errors(self.model_name):
            return await self.client.beta.messages.count_tokens(
                system=system_prompt or OMIT,
                messages=anthropic_messages,
                model=self._model_name,
                tools=tools or OMIT,
                tool_choice=tool_choice or OMIT,
                mcp_servers=mcp_servers or OMIT,
                betas=sorted(betas) or OMIT,
                output_config=output_config or OMIT,
                cache_control=auto_cache_control or OMIT,
                thinking=self._translate_thinking(model_settings, model_request_parameters),
                context_management=context_management or OMIT,
                timeout=model_settings.get('timeout', NOT_GIVEN),
                speed=self._effective_speed(model_settings, anthropic_profile),
                extra_headers=extra_headers,
                extra_body=model_settings.get('extra_body'),
            )

    def _process_response(self, response: BetaMessage) -> ModelResponse:  # noqa: C901
        """Process a non-streamed response, and prepare a message to return."""
        items: list[ModelResponsePart] = []
        builtin_tool_calls: dict[str, NativeToolCallPart] = {}
        for item in response.content:
            if isinstance(item, BetaTextBlock):
                items.append(TextPart(content=item.text))
            elif isinstance(item, BetaServerToolUseBlock):
                call_part = _map_server_tool_use_block(item, self.system)
                builtin_tool_calls[call_part.tool_call_id] = call_part
                items.append(call_part)
            elif isinstance(item, BetaWebSearchToolResultBlock):
                items.append(_map_web_search_tool_result_block(item, self.system))
            elif isinstance(item, BetaToolSearchToolResultBlock):
                items.append(_map_tool_search_tool_result_block(item, self.system))
            elif isinstance(item, BetaCodeExecutionToolResultBlock):
                items.append(_map_code_execution_tool_result_block(item, self.system))
            elif isinstance(item, BetaBashCodeExecutionToolResultBlock):
                items.append(_map_bash_code_execution_tool_result_block(item, self.system))
            elif isinstance(item, BetaTextEditorCodeExecutionToolResultBlock):
                items.append(_map_text_editor_code_execution_tool_result_block(item, self.system))
            elif isinstance(item, BetaWebFetchToolResultBlock):
                items.append(_map_web_fetch_tool_result_block(item, self.system))
            elif isinstance(item, BetaRedactedThinkingBlock):
                items.append(
                    ThinkingPart(id='redacted_thinking', content='', signature=item.data, provider_name=self.system)
                )
            elif isinstance(item, BetaThinkingBlock):
                items.append(ThinkingPart(content=item.thinking, signature=item.signature, provider_name=self.system))
            elif isinstance(item, BetaMCPToolUseBlock):
                call_part = _map_mcp_server_use_block(item, self.system)
                builtin_tool_calls[call_part.tool_call_id] = call_part
                items.append(call_part)
            elif isinstance(item, BetaMCPToolResultBlock):
                call_part = builtin_tool_calls.get(item.tool_use_id)
                items.append(_map_mcp_server_result_block(item, call_part, self.system))
            elif isinstance(item, BetaCompactionBlock):
                items.append(CompactionPart(content=item.content, provider_name=self.system))
            else:
                assert isinstance(item, BetaToolUseBlock), f'unexpected item type {type(item)}'
                items.append(
                    ToolCallPart(
                        tool_name=item.name,
                        args=cast(dict[str, Any], item.input),
                        tool_call_id=item.id,
                    )
                )

        finish_reason: FinishReason | None = None
        provider_details: dict[str, Any] | None = None
        if raw_finish_reason := response.stop_reason:  # pragma: no branch
            provider_details = {'finish_reason': raw_finish_reason}
            finish_reason = _FINISH_REASON_MAP.get(raw_finish_reason)
        if response.container:
            provider_details = provider_details or {}
            provider_details['container_id'] = response.container.id

        return ModelResponse(
            parts=items,
            usage=_map_usage(response, self._provider.name, self._provider.base_url, self._model_name),
            model_name=response.model,
            provider_response_id=response.id,
            provider_name=self._provider.name,
            provider_url=self._provider.base_url,
            finish_reason=finish_reason,
            provider_details=provider_details,
        )

    async def _process_streamed_response(
        self, response: AsyncStream[BetaRawMessageStreamEvent], model_request_parameters: ModelRequestParameters
    ) -> StreamedResponse:
        peekable_response: _utils.PeekableAsyncStream[
            BetaRawMessageStreamEvent, AsyncStream[BetaRawMessageStreamEvent]
        ] = _utils.PeekableAsyncStream(response)
        with _map_api_errors(self.model_name):
            first_chunk = await peekable_response.peek()
        if isinstance(first_chunk, _utils.Unset):
            raise UnexpectedModelBehavior('Streamed response ended without content or tool calls')  # pragma: no cover

        assert isinstance(first_chunk, BetaRawMessageStartEvent)

        return AnthropicStreamedResponse(
            model_request_parameters=model_request_parameters,
            _model_name=first_chunk.message.model,
            _response=peekable_response,
            _provider_name=self._provider.name,
            _provider_url=self._provider.base_url,
        )

    def _get_code_execution_tool_version(
        self, model_settings: AnthropicModelSettings
    ) -> AnthropicCodeExecutionToolVersion:
        version = model_settings.get('anthropic_code_execution_tool_version', 'auto')
        profile = self.profile
        if version == 'auto':
            return profile.get('anthropic_default_code_execution_tool_version', '20250825')
        if version not in profile.get('anthropic_supported_code_execution_tool_versions', ('20250825',)):
            supported_versions = ', '.join(
                f'{supported_version!r}'
                for supported_version in profile.get('anthropic_supported_code_execution_tool_versions', ('20250825',))
            )
            raise UserError(
                f'`anthropic_code_execution_tool_version={version!r}` is not supported by model '
                f'{self.model_name!r}. Supported versions are: {supported_versions}.'
            )
        return version

    def _resolve_tool_search_strategy(self, strategy: Literal['bm25', 'regex'] | None) -> Literal['bm25', 'regex']:
        """Resolve which native tool-search variant to send.

        `bm25` is the default, except on clients that don't support it (the legacy Bedrock
        InvokeModel API), where `regex` is the default and an explicit `bm25` is rejected.
        """
        if isinstance(self.client, _BM25_TOOL_SEARCH_UNSUPPORTED_CLIENTS):
            if strategy == 'bm25':
                raise UserError(
                    "ToolSearch(strategy='bm25') is not supported by the `AsyncAnthropicBedrock` client; "
                    "use ToolSearch(strategy='regex') instead, or leave the strategy unset to use the default."
                )
            return 'regex'
        return strategy or 'bm25'

    def _add_native_tools(
        self,
        tools: list[BetaToolUnionParam],
        model_request_parameters: ModelRequestParameters,
        model_settings: AnthropicModelSettings,
    ) -> tuple[list[BetaToolUnionParam], list[BetaRequestMCPServerURLDefinitionParam], set[str]]:
        beta_features: set[str] = set()
        mcp_servers: list[BetaRequestMCPServerURLDefinitionParam] = []
        for tool in model_request_parameters.native_tools:
            if isinstance(tool, WebSearchTool):
                user_location = (
                    BetaUserLocationParam(type='approximate', **tool.user_location) if tool.user_location else None
                )
                tools.append(
                    BetaWebSearchTool20250305Param(
                        name='web_search',
                        type='web_search_20250305',
                        max_uses=tool.max_uses,
                        allowed_domains=tool.allowed_domains,
                        blocked_domains=tool.blocked_domains,
                        user_location=user_location,
                    )
                )
            elif isinstance(tool, CodeExecutionTool):  # pragma: no branch
                tool_version = self._get_code_execution_tool_version(model_settings)
                tools.append(_map_code_execution_tool(tool_version))
            elif isinstance(tool, WebFetchTool):  # pragma: no branch
                citations = BetaCitationsConfigParam(enabled=tool.enable_citations) if tool.enable_citations else None
                tools.append(
                    BetaWebFetchTool20250910Param(
                        name='web_fetch',
                        type='web_fetch_20250910',
                        max_uses=tool.max_uses,
                        allowed_domains=tool.allowed_domains,
                        blocked_domains=tool.blocked_domains,
                        citations=citations,
                        max_content_tokens=tool.max_content_tokens,
                    )
                )
                beta_features.add('web-fetch-2025-09-10')
            elif isinstance(tool, MemoryTool):  # pragma: no branch
                if 'memory' not in model_request_parameters.tool_defs:
                    raise UserError("Native `MemoryTool` requires a 'memory' tool to be defined.")
                # Replace the memory tool definition with the native memory tool
                tools = [tool for tool in tools if tool.get('name') != 'memory']
                tools.append(BetaMemoryTool20250818Param(name='memory', type='memory_20250818'))
                beta_features.add('context-management-2025-06-27')
            elif isinstance(tool, ToolSearchTool):  # pragma: no branch
                # Custom-callable strategies go through the regular `search_tools` function tool
                # (which is already in `function_tools`), so no server-side builtin is emitted.
                if tool.strategy != 'custom':
                    if self._resolve_tool_search_strategy(tool.strategy) == 'regex':
                        tools.append(
                            BetaToolSearchToolRegex20251119Param(
                                type='tool_search_tool_regex_20251119',
                                name='tool_search_tool_regex',
                            )
                        )
                    else:
                        tools.append(
                            BetaToolSearchToolBm25_20251119Param(
                                type='tool_search_tool_bm25_20251119',
                                name='tool_search_tool_bm25',
                            )
                        )
                # No `beta_features.add(...)`: tool search is GA on Sonnet/Opus/Haiku 4.5+ and the
                # provisional `tool-search-tool-2025-11-19` beta header is rejected by the API.
            elif isinstance(tool, MCPServerTool) and tool.url:
                mcp_server_url_definition_param = BetaRequestMCPServerURLDefinitionParam(
                    type='url',
                    name=tool.id,
                    url=tool.url,
                )
                if tool.allowed_tools is not None:  # pragma: no branch
                    mcp_server_url_definition_param['tool_configuration'] = BetaRequestMCPServerToolConfigurationParam(
                        enabled=bool(tool.allowed_tools),
                        allowed_tools=tool.allowed_tools,
                    )
                if tool.authorization_token:  # pragma: no cover
                    mcp_server_url_definition_param['authorization_token'] = tool.authorization_token
                mcp_servers.append(mcp_server_url_definition_param)
                beta_features.add('mcp-client-2025-04-04')
            else:
                raise UserError(  # pragma: no cover
                    f'`{tool.__class__.__name__}` is not supported by `AnthropicModel`. If it should be, please file an issue.'
                )
        return tools, mcp_servers, beta_features

    def _prepare_tools_and_tool_choice(
        self,
        model_settings: AnthropicModelSettings,
        model_request_parameters: ModelRequestParameters,
    ) -> tuple[list[BetaToolUnionParam], BetaToolChoiceParam | None]:
        """Determine which tools to send and the API tool_choice value.

        Returns:
            A tuple of (filtered_tools, tool_choice).
        """
        tool_defs = model_request_parameters.tool_defs

        resolved_tool_choice = resolve_tool_choice(model_settings, model_request_parameters)

        tool_choice: BetaToolChoiceParam

        if resolved_tool_choice in ('auto', 'none'):
            # tool_choice = {'type': resolved_tool_choice}`: pyright can't narrow this properly
            tool_choice = {'type': 'auto'} if resolved_tool_choice == 'auto' else {'type': 'none'}
        elif resolved_tool_choice == 'required':
            supports = _support_tool_forcing(
                model_settings, model_request_parameters, resolved_tool_choice, "tool_choice='required'"
            )
            tool_choice = {'type': 'any'} if supports else {'type': 'auto'}
        elif isinstance(resolved_tool_choice, tuple):
            tool_choice_mode, tool_names = resolved_tool_choice
            supports = _support_tool_forcing(model_settings, model_request_parameters, resolved_tool_choice)
            if tool_choice_mode == 'required' and len(tool_names) == 1:
                if supports:
                    tool_choice = {'type': 'tool', 'name': next(iter(tool_names))}
                else:
                    # Forcing not supported (e.g. thinking enabled): filter so the model can only
                    # see the requested tool, since `auto` alone wouldn't restrict the choice.
                    # Breaks caching, but Anthropic doesn't support limiting tools via API arg.
                    tool_defs = {k: v for k, v in tool_defs.items() if k in tool_names}
                    tool_choice = {'type': 'auto'}
            else:
                # Breaks caching, but Anthropic doesn't support limiting tools via API arg
                tool_defs = {k: v for k, v in tool_defs.items() if k in tool_names}
                tool_choice = {'type': 'auto'} if tool_choice_mode == 'auto' or not supports else {'type': 'any'}
        else:
            assert_never(resolved_tool_choice)

        if not tool_defs:
            return [], None

        # Map ToolDefinitions to Anthropic format
        tools: list[BetaToolUnionParam] = [self._map_tool_definition(t, model_settings) for t in tool_defs.values()]

        # Add cache_control to the last non-deferred tool if enabled. Anthropic rejects
        # `cache_control` on tools with `defer_loading=True` (`Tools with defer_loading
        # cannot use prompt caching`); they're hidden from the model until tool search
        # discovers them, so they aren't part of the cacheable prompt prefix anyway.
        if cache_tool_defs := model_settings.get('anthropic_cache_tool_definitions'):
            ttl: Literal['5m', '1h'] = '5m' if cache_tool_defs is True else cache_tool_defs
            for tool in reversed(tools):
                if tool.get('defer_loading') is not True:
                    tool['cache_control'] = self._build_cache_control(ttl)
                    break

        if 'parallel_tool_calls' in model_settings and tool_choice['type'] != 'none':
            tool_choice['disable_parallel_tool_use'] = not model_settings['parallel_tool_calls']

        return tools, tool_choice

    async def _map_message(  # noqa: C901
        self,
        messages: list[ModelMessage],
        model_request_parameters: ModelRequestParameters,
        model_settings: AnthropicModelSettings,
    ) -> tuple[str | list[BetaTextBlockParam], list[BetaMessageParam]]:
        """Just maps a `pydantic_ai.Message` to a `anthropic.types.MessageParam`."""
        system_prompt_parts: list[str] = []
        anthropic_messages: list[BetaMessageParam] = []
        # Whenever the current request carries any `ToolSearchTool` builtin, render any
        # local-shape `search_tools` history exchanges as Anthropic's "client-side"
        # tool-search wire — `tool_use` paired with a `tool_result` whose `content` is a
        # `tool_reference` array. This unlocks the deferred tools' schemas server-side
        # without forcing the model to re-search, and works regardless of the current
        # turn's strategy (default native, named native, or custom callable). The flag
        # also covers the no-history single-turn custom callable case (where the local
        # `search_tools` exchange is created on this turn).
        tool_search_active = any(isinstance(t, ToolSearchTool) for t in model_request_parameters.native_tools)
        # Filter `tool_reference` entries during replay against the tools the current turn
        # will actually send: Anthropic rejects references to tools not in the wire `tools`
        # list (e.g. an MCP that failed to register this turn). The previously-discovered
        # name still lives in history; it just isn't worth replaying as a tool reference.
        available_tool_names = {t.name for t in model_request_parameters.function_tools}
        orphan_tool_search_call_ids = _collect_orphan_tool_search_call_ids(messages)
        for m in messages:
            if isinstance(m, ModelRequest):
                user_content_params: list[BetaContentBlockParam] = []
                for request_part in m.parts:
                    if isinstance(request_part, SystemPromptPart):
                        system_prompt_parts.append(request_part.content)
                    elif isinstance(request_part, UserPromptPart):
                        async for content in self._map_user_prompt(request_part):
                            if isinstance(content, CachePoint):
                                self._add_cache_control_to_last_param(user_content_params, ttl=content.ttl)
                            else:
                                user_content_params.append(content)
                    elif isinstance(request_part, ToolReturnPart):
                        tool_result_content: list[beta_tool_result_block_param.Content] = []

                        custom_tool_refs, custom_empty_message = _build_custom_tool_search_replay_blocks(
                            request_part, tool_search_active, available_tool_names
                        )
                        if custom_tool_refs:
                            tool_result_block_param = beta_tool_result_block_param.BetaToolResultBlockParam(
                                tool_use_id=_guard_tool_call_id(t=request_part),
                                type='tool_result',
                                content=custom_tool_refs,
                                is_error=False,
                            )
                            user_content_params.append(tool_result_block_param)
                            continue
                        if custom_tool_refs is not None:
                            # Empty-results path on the custom-callable strategy. Anthropic
                            # rejects an empty `tool_result.content` list, so we send the
                            # `message` text from the typed return (set by the toolset's
                            # `_empty_return`) as a single text block instead.
                            empty_message = custom_empty_message or _NO_MATCHES_MESSAGE
                            tool_result_block_param = beta_tool_result_block_param.BetaToolResultBlockParam(
                                tool_use_id=_guard_tool_call_id(t=request_part),
                                type='tool_result',
                                content=[BetaTextBlockParam(text=empty_message, type='text')],
                                is_error=False,
                            )
                            user_content_params.append(tool_result_block_param)
                            continue

                        for item in request_part.content_items(mode='str'):
                            if isinstance(item, UploadedFile):
                                if item.provider_name != self.system:
                                    raise UserError(
                                        f'UploadedFile with `provider_name={item.provider_name!r}` cannot be used with AnthropicModel. '
                                        f'Expected `provider_name` to be `{self.system!r}`.'
                                    )
                                if item.media_type.startswith('image/'):
                                    tool_result_content.append(
                                        BetaImageBlockParam(
                                            source=BetaFileImageSourceParam(file_id=item.file_id, type='file'),
                                            type='image',
                                        )
                                    )
                                elif item.media_type.startswith(('text/', 'application/')):
                                    tool_result_content.append(
                                        BetaRequestDocumentBlockParam(
                                            source=BetaFileDocumentSourceParam(file_id=item.file_id, type='file'),
                                            type='document',
                                        )
                                    )
                                else:
                                    raise UserError(
                                        f'Unsupported media type {item.media_type!r} for Anthropic file upload. '
                                        'Only image and document (text/application) types are supported.'
                                    )
                            elif is_multi_modal_content(item):
                                tool_result_content.append(await self._map_file_to_content_block(item, 'tool returns'))  # pyright: ignore[reportArgumentType]
                            elif isinstance(item, str):  # pragma: no branch
                                tool_result_content.append(BetaTextBlockParam(text=item, type='text'))

                        tool_result_block_param = beta_tool_result_block_param.BetaToolResultBlockParam(
                            tool_use_id=_guard_tool_call_id(t=request_part),
                            type='tool_result',
                            content=tool_result_content or '',
                            is_error=False,
                        )
                        user_content_params.append(tool_result_block_param)
                    elif isinstance(request_part, RetryPromptPart):  # pragma: no branch
                        if request_part.tool_name is None:
                            text = request_part.model_response()  # pragma: no cover
                            retry_param = BetaTextBlockParam(type='text', text=text)  # pragma: no cover
                        else:
                            retry_param = beta_tool_result_block_param.BetaToolResultBlockParam(
                                tool_use_id=_guard_tool_call_id(t=request_part),
                                type='tool_result',
                                content=request_part.model_response(),
                                is_error=True,
                            )
                        user_content_params.append(retry_param)
                if len(user_content_params) > 0:
                    anthropic_messages.append(BetaMessageParam(role='user', content=user_content_params))
            elif isinstance(m, ModelResponse):
                assistant_content_params: list[
                    BetaTextBlockParam
                    | BetaToolUseBlockParam
                    | BetaServerToolUseBlockParam
                    | BetaWebSearchToolResultBlockParam
                    | BetaCodeExecutionToolResultBlockParam
                    | BetaBashCodeExecutionToolResultBlockParam
                    | BetaTextEditorCodeExecutionToolResultBlockParam
                    | BetaWebFetchToolResultBlockParam
                    | BetaToolSearchToolResultBlockParam
                    | BetaThinkingBlockParam
                    | BetaRedactedThinkingBlockParam
                    | BetaMCPToolUseBlockParam
                    | BetaMCPToolResultBlock
                    | BetaCompactionBlockParam
                ] = []
                for response_part in m.parts:
                    if isinstance(response_part, TextPart):
                        if response_part.content:
                            assistant_content_params.append(BetaTextBlockParam(text=response_part.content, type='text'))
                    elif isinstance(response_part, ToolCallPart):
                        tool_use_block_param = BetaToolUseBlockParam(
                            id=_guard_tool_call_id(t=response_part),
                            type='tool_use',
                            name=response_part.tool_name,
                            input=response_part.args_as_dict(),
                        )
                        assistant_content_params.append(tool_use_block_param)
                    elif isinstance(response_part, ThinkingPart):
                        if (
                            response_part.provider_name == self.system and response_part.signature is not None
                        ):  # pragma: no branch
                            if response_part.id == 'redacted_thinking':
                                assistant_content_params.append(
                                    BetaRedactedThinkingBlockParam(
                                        data=response_part.signature,
                                        type='redacted_thinking',
                                    )
                                )
                            else:
                                assistant_content_params.append(
                                    BetaThinkingBlockParam(
                                        thinking=response_part.content,
                                        signature=response_part.signature,
                                        type='thinking',
                                    )
                                )
                        elif response_part.content:  # pragma: no branch
                            start_tag, end_tag = self.profile.get('thinking_tags', DEFAULT_THINKING_TAGS)
                            assistant_content_params.append(
                                BetaTextBlockParam(
                                    text='\n'.join([start_tag, response_part.content, end_tag]), type='text'
                                )
                            )
                    elif isinstance(response_part, NativeToolCallPart):
                        if response_part.provider_name == self.system:
                            tool_use_id = _guard_tool_call_id(t=response_part)
                            if response_part.tool_name == WebSearchTool.kind:
                                server_tool_use_block_param = BetaServerToolUseBlockParam(
                                    id=tool_use_id,
                                    type='server_tool_use',
                                    name='web_search',
                                    input=response_part.args_as_dict(),
                                )
                                assistant_content_params.append(server_tool_use_block_param)
                            elif response_part.tool_name in (
                                CodeExecutionTool.kind,
                                'bash_code_execution',
                                'text_editor_code_execution',
                            ):
                                anthropic_tool_name = _get_anthropic_code_execution_tool_name(response_part)
                                server_tool_use_block_param = BetaServerToolUseBlockParam(
                                    id=tool_use_id,
                                    type='server_tool_use',
                                    name=anthropic_tool_name,
                                    input=response_part.args_as_dict(),
                                )
                                assistant_content_params.append(server_tool_use_block_param)
                            elif response_part.tool_name == WebFetchTool.kind:
                                server_tool_use_block_param = BetaServerToolUseBlockParam(
                                    id=tool_use_id,
                                    type='server_tool_use',
                                    name='web_fetch',
                                    input=response_part.args_as_dict(),
                                )
                                assistant_content_params.append(server_tool_use_block_param)
                            elif response_part.tool_name == ToolSearchTool.kind:
                                if tool_use_id in orphan_tool_search_call_ids:
                                    # Anthropic occasionally emits a `tool_search_tool_*` server tool use
                                    # in parallel with a client `tool_use` and ends the turn before
                                    # delivering the corresponding `tool_search_tool_*_tool_result` block
                                    # (see anthropics/anthropic-sdk-python#1325). Direct API tolerates
                                    # the unpaired call on resend (the result arrives in the next turn),
                                    # but Bedrock 400s with `tool use ... was found without a corresponding
                                    # tool_search_tool_*_tool_result block`. Drop the orphaned call from the
                                    # wire payload — the model will re-search if it still wants to. We don't
                                    # synthesize an empty result block because that would falsely tell the
                                    # model the search ran and returned nothing.
                                    continue
                                # Round-trip the native variant (bm25/regex) so we don't
                                # silently rewrite the algorithm. `_map_server_tool_use_block`
                                # stashes it in `provider_details['strategy']`. Clients that don't
                                # support `bm25` (legacy Bedrock InvokeModel) always replay as `regex`.
                                details = response_part.provider_details or {}
                                strategy: Literal['bm25', 'regex'] = (
                                    'regex'
                                    if details.get('strategy') == 'regex'
                                    or isinstance(self.client, _BM25_TOOL_SEARCH_UNSUPPORTED_CLIENTS)
                                    else 'bm25'
                                )
                                native_name = (
                                    'tool_search_tool_regex' if strategy == 'regex' else 'tool_search_tool_bm25'
                                )
                                # Rebuild the variant-specific wire shape from the
                                # cross-provider `queries` slot. bm25 expects `{"query": "..."}`,
                                # regex expects `{"pattern": "..."}`. Joining with a space
                                # matches Anthropic's single-string input on either variant.
                                args_dict = response_part.args_as_dict()
                                if 'queries' in args_dict:
                                    raw_queries = args_dict.get('queries')
                                    queries: list[str] = (
                                        [q for q in cast('list[Any]', raw_queries) if isinstance(q, str)]
                                        if isinstance(raw_queries, list)
                                        else []
                                    )
                                    wire_key = 'pattern' if strategy == 'regex' else 'query'
                                    wire_input: dict[str, Any] = {wire_key: ' '.join(queries)}
                                else:
                                    wire_input = args_dict
                                server_tool_use_block_param = BetaServerToolUseBlockParam(
                                    id=tool_use_id,
                                    type='server_tool_use',
                                    name=native_name,
                                    input=wire_input,
                                )
                                assistant_content_params.append(server_tool_use_block_param)
                            elif (
                                response_part.tool_name.startswith(MCPServerTool.kind)
                                and (server_id := response_part.tool_name.split(':', 1)[1])
                                and (args := response_part.args_as_dict())
                                and (tool_name := args.get('tool_name'))
                                and (tool_args := args.get('tool_args')) is not None
                            ):  # pragma: no branch
                                mcp_tool_use_block_param = BetaMCPToolUseBlockParam(
                                    id=tool_use_id,
                                    type='mcp_tool_use',
                                    server_name=server_id,
                                    name=tool_name,
                                    input=tool_args,
                                )
                                assistant_content_params.append(mcp_tool_use_block_param)
                    elif isinstance(response_part, NativeToolReturnPart):
                        if response_part.provider_name == self.system:
                            tool_use_id = _guard_tool_call_id(t=response_part)
                            if response_part.tool_name in (
                                WebSearchTool.kind,
                                'web_search_tool_result',  # Backward compatibility
                            ) and isinstance(response_part.content, dict | list):
                                assistant_content_params.append(
                                    BetaWebSearchToolResultBlockParam(
                                        tool_use_id=tool_use_id,
                                        type='web_search_tool_result',
                                        content=cast(
                                            BetaWebSearchToolResultBlockParamContentParam,
                                            response_part.content,  # pyright: ignore[reportUnknownMemberType]
                                        ),
                                    )
                                )
                            elif response_part.tool_name in (
                                CodeExecutionTool.kind,
                                'code_execution_tool_result',  # Backward compatibility
                                'bash_code_execution',
                                'text_editor_code_execution',
                            ) and isinstance(response_part.content, dict):
                                match _get_anthropic_code_execution_tool_name(response_part):
                                    case 'code_execution':
                                        assistant_content_params.append(
                                            BetaCodeExecutionToolResultBlockParam(
                                                tool_use_id=tool_use_id,
                                                type='code_execution_tool_result',
                                                content=cast(
                                                    BetaCodeExecutionToolResultBlockParamContentParam,
                                                    response_part.content,  # pyright: ignore[reportUnknownMemberType]
                                                ),
                                            )
                                        )
                                    case 'bash_code_execution':
                                        assistant_content_params.append(
                                            BetaBashCodeExecutionToolResultBlockParam(
                                                tool_use_id=tool_use_id,
                                                type='bash_code_execution_tool_result',
                                                content=cast(
                                                    BashCodeExecutionToolResultBlockParamContent,
                                                    response_part.content,  # pyright: ignore[reportUnknownMemberType]
                                                ),
                                            )
                                        )
                                    case 'text_editor_code_execution':  # pragma: no branch
                                        assistant_content_params.append(
                                            BetaTextEditorCodeExecutionToolResultBlockParam(
                                                tool_use_id=tool_use_id,
                                                type='text_editor_code_execution_tool_result',
                                                content=cast(
                                                    TextEditorCodeExecutionToolResultBlockParamContent,
                                                    response_part.content,  # pyright: ignore[reportUnknownMemberType]
                                                ),
                                            )
                                        )
                            elif response_part.tool_name == WebFetchTool.kind and isinstance(
                                response_part.content, dict
                            ):
                                assistant_content_params.append(
                                    BetaWebFetchToolResultBlockParam(
                                        tool_use_id=tool_use_id,
                                        type='web_fetch_tool_result',
                                        content=cast(
                                            WebFetchToolResultBlockParamContent,
                                            response_part.content,  # pyright: ignore[reportUnknownMemberType]
                                        ),
                                    )
                                )
                            elif isinstance(response_part, NativeToolSearchReturnPart):
                                assistant_content_params.append(
                                    _build_tool_search_replay_block(response_part, tool_use_id, available_tool_names)
                                )
                            elif response_part.tool_name.startswith(MCPServerTool.kind) and isinstance(
                                response_part.content, dict
                            ):  # pragma: no branch
                                mcp_content = cast(
                                    'dict[str, Any]',
                                    response_part.content,  # pyright: ignore[reportUnknownMemberType]
                                )
                                assistant_content_params.append(
                                    BetaMCPToolResultBlock(
                                        tool_use_id=tool_use_id,
                                        type='mcp_tool_result',
                                        **mcp_content,
                                    )
                                )
                    elif isinstance(response_part, CompactionPart):
                        if response_part.provider_name == self.system:  # pragma: no branch
                            assistant_content_params.append(
                                BetaCompactionBlockParam(content=response_part.content, type='compaction')
                            )
                    elif isinstance(response_part, FilePart):  # pragma: no cover
                        # Files generated by models are not sent back to models that don't themselves generate files.
                        pass
                    else:
                        assert_never(response_part)
                if len(assistant_content_params) > 0:
                    anthropic_messages.append(BetaMessageParam(role='assistant', content=assistant_content_params))
            else:
                assert_never(m)
        instruction_parts = self._get_instruction_parts(messages, model_request_parameters)
        system_prompt = '\n\n'.join(system_prompt_parts)

        # Build system prompt blocks: each instruction part becomes a separate text block.
        # When anthropic_cache_instructions is enabled, the cache point goes after the last
        # static instruction (or at the end if all instructions are static).
        cache_instructions = model_settings.get('anthropic_cache_instructions')

        if instruction_parts or cache_instructions:
            system_prompt_blocks: list[BetaTextBlockParam] = []

            if system_prompt:
                system_prompt_blocks.append(BetaTextBlockParam(type='text', text=system_prompt))

            if instruction_parts:
                for part in instruction_parts:
                    system_prompt_blocks.append(BetaTextBlockParam(type='text', text=part.content))

            if system_prompt_blocks and cache_instructions:
                ttl: Literal['5m', '1h'] = '5m' if cache_instructions is True else cache_instructions
                # Find the last block that corresponds to a static instruction.
                # system_prompt_blocks layout: [system_prompt_block?, ...instruction_blocks]
                # instruction_parts are sorted static-first, so find the boundary.
                if instruction_parts:
                    has_dynamic = any(p.dynamic for p in instruction_parts)
                    if has_dynamic:
                        # Cache after the last static instruction block
                        num_prefix_blocks = 1 if system_prompt else 0
                        num_static = sum(1 for p in instruction_parts if not p.dynamic)
                        if num_static > 0:
                            cache_block_idx = num_prefix_blocks + num_static - 1
                        else:
                            # All dynamic: cache the system prompt block if it exists
                            cache_block_idx = 0 if system_prompt else None
                    else:
                        # All static: cache the last block
                        cache_block_idx = len(system_prompt_blocks) - 1
                else:
                    # No instruction parts, just system prompt: cache it
                    cache_block_idx = 0

                if cache_block_idx is not None:
                    system_prompt_blocks[cache_block_idx]['cache_control'] = self._build_cache_control(ttl)

            if system_prompt_blocks:
                return system_prompt_blocks, anthropic_messages

        return system_prompt, anthropic_messages

    @staticmethod
    def _limit_cache_points(
        system_prompt: str | list[BetaTextBlockParam],
        anthropic_messages: list[BetaMessageParam],
        tools: list[BetaToolUnionParam],
        *,
        automatic_caching: bool = False,
    ) -> None:
        """Limit the number of cache points in the request to Anthropic's maximum.

        Anthropic enforces a maximum of 4 cache points per request. This method ensures
        compliance by counting existing cache points and removing excess ones from messages.

        When automatic_caching is enabled, the server-applied breakpoint uses 1 of the 4
        available slots, so the budget for explicit breakpoints is reduced to 3.

        Strategy:
        1. Count cache points in system_prompt (can be multiple if list of blocks)
        2. Count cache points in tools (can be in any position, not just last)
        3. Raise UserError if system + tools already exceed the budget
        4. Calculate remaining budget for message cache points
        5. Traverse messages from newest to oldest, keeping the most recent cache points
           within the remaining budget
        6. Remove excess cache points from older messages to stay within limit

        Cache point priority (always preserved):
        - System prompt cache points
        - Tool definition cache points
        - Message cache points (newest first, oldest removed if needed)

        Raises:
            UserError: If system_prompt and tools combined already exceed the budget.
                      This indicates a configuration error that cannot be auto-fixed.
        """
        MAX_CACHE_POINTS = 3 if automatic_caching else 4

        # Count existing cache points in system prompt
        used_cache_points = (
            sum(1 for block in system_prompt if 'cache_control' in cast(dict[str, Any], block))
            if isinstance(system_prompt, list)
            else 0
        )

        # Count existing cache points in tools (any tool may have cache_control)
        # Note: cache_control can be in the middle of tools list if builtin tools are added after
        for tool in tools:
            if 'cache_control' in tool:
                used_cache_points += 1

        # Calculate remaining cache points budget for messages
        remaining_budget = MAX_CACHE_POINTS - used_cache_points
        if remaining_budget < 0:  # pragma: no cover
            raise UserError(
                f'Too many cache points for Anthropic request. '
                f'System prompt and tool definitions already use {used_cache_points} cache points, '
                f'which exceeds the maximum of {MAX_CACHE_POINTS}.'
            )
        # Remove excess cache points from messages (newest to oldest)
        for message in reversed(anthropic_messages):
            content = message['content']
            if isinstance(content, str):  # pragma: no cover
                continue

            # Process content blocks in reverse order (newest first)
            for block in reversed(cast(list[BetaContentBlockParam], content)):
                block_dict = cast(dict[str, Any], block)

                if 'cache_control' in block_dict:
                    if remaining_budget > 0:
                        remaining_budget -= 1
                    else:
                        # Exceeded limit, remove this cache point
                        del block_dict['cache_control']

    def _build_cache_control(self, ttl: Literal['5m', '1h'] = '5m') -> BetaCacheControlEphemeralParam:
        """Build a cache control dict with the given TTL.

        Args:
            ttl: The cache time-to-live ('5m' or '1h').

        Returns:
            A cache control dict with the specified TTL.
        """
        return BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl)

    def _build_automatic_cache_control(
        self, model_settings: AnthropicModelSettings
    ) -> tuple[BetaCacheControlEphemeralParam | None, Literal['5m', '1h'] | None]:
        """Resolve cache settings and build the top-level cache_control parameter.

        Returns:
            A tuple of (top_level_param, resolved_ttl).
            top_level_param is the cache_control for the API (None on unsupported clients).
            resolved_ttl is the effective TTL (None if caching is not enabled), used by
            _apply_per_block_caching_fallback on clients that don't support automatic caching.
        """
        auto_cache = model_settings.get('anthropic_cache')
        cache_messages = model_settings.get('anthropic_cache_messages')

        if auto_cache and cache_messages:
            raise UserError('`anthropic_cache` and `anthropic_cache_messages` cannot both be enabled.')

        if not auto_cache:
            return None, None

        ttl: Literal['5m', '1h'] = '5m' if auto_cache is True else auto_cache
        # Bedrock and Vertex do not support the top-level cache_control parameter
        # (automatic caching). Per-block fallback is handled by _apply_per_block_caching_fallback.
        # Bedrock: https://github.com/anthropics/anthropic-sdk-python/issues/939
        # Vertex: https://github.com/anthropics/anthropic-sdk-python/issues/653
        # Foundry supports automatic caching: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#automatic-caching
        if isinstance(self.client, _NON_AUTOMATIC_CACHING_CLIENTS):
            return None, ttl
        return self._build_cache_control(ttl), ttl

    def _apply_per_block_caching_fallback(
        self,
        resolved_ttl: Literal['5m', '1h'] | None,
        anthropic_messages: list[BetaMessageParam],
    ) -> None:
        """Apply per-block message caching as a fallback for automatic caching on unsupported platforms.

        Bedrock and Vertex do not support the top-level `cache_control` parameter used by
        `anthropic_cache` for automatic caching. As a fallback, this applies per-block
        `cache_control` to the last content block of the last user message.

        Args:
            resolved_ttl: The resolved TTL from `_build_automatic_cache_control`, or None
                if caching is not enabled.
            anthropic_messages: The list of Anthropic message params to apply fallback to.
        """
        if resolved_ttl and isinstance(self.client, _NON_AUTOMATIC_CACHING_CLIENTS):
            self._apply_message_cache_control(anthropic_messages, resolved_ttl)

    def _apply_explicit_message_caching(
        self,
        model_settings: AnthropicModelSettings,
        anthropic_messages: list[BetaMessageParam],
    ) -> None:
        """Apply per-block message caching when `anthropic_cache_messages` is enabled.

        Mutually exclusive with `anthropic_cache` (enforced by `_build_automatic_cache_control`).
        """
        if cache_messages := model_settings.get('anthropic_cache_messages'):
            ttl: Literal['5m', '1h'] = '5m' if cache_messages is True else cache_messages
            self._apply_message_cache_control(anthropic_messages, ttl)

    def _apply_message_cache_control(
        self,
        anthropic_messages: list[BetaMessageParam],
        ttl: Literal['5m', '1h'],
    ) -> None:
        """Apply per-block `cache_control` to the last content block of the last message.

        If the last block already has `cache_control` (e.g. from an explicit `CachePoint`),
        it is left unchanged to preserve the user's chosen TTL.

        Assumes `anthropic_messages` is non-empty.
        """
        last_message = anthropic_messages[-1]
        content = last_message['content']
        if isinstance(content, str):  # pragma: no cover
            last_message['content'] = [
                BetaTextBlockParam(
                    type='text',
                    text=content,
                    cache_control=self._build_cache_control(ttl),
                )
            ]
        else:
            content_blocks = cast(list[BetaContentBlockParam], content)
            if content_blocks and 'cache_control' not in cast(dict[str, Any], content_blocks[-1]):
                self._add_cache_control_to_last_param(content_blocks, ttl)

    def _add_cache_control_to_last_param(
        self, params: list[BetaContentBlockParam], ttl: Literal['5m', '1h'] = '5m'
    ) -> None:
        """Add cache control to the last content block param.

        See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.

        Args:
            params: List of content block params to modify.
            ttl: The cache time-to-live ('5m' or '1h').
        """
        if not params:
            raise UserError(
                'CachePoint cannot be the first content in a user message - there must be previous content to attach the CachePoint to. '
                'To cache system instructions or tool definitions, use the `anthropic_cache_instructions` or `anthropic_cache_tool_definitions` settings instead.'
            )

        # Only certain types support cache_control
        # See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached
        cacheable_types = {'text', 'tool_use', 'server_tool_use', 'image', 'tool_result', 'document'}
        # Cast needed because BetaContentBlockParam is a union including response Block types (Pydantic models)
        # that don't support dict operations, even though at runtime we only have request Param types (TypedDicts).
        last_param = cast(dict[str, Any], params[-1])
        if last_param['type'] not in cacheable_types:
            raise UserError(f'Cache control not supported for param type: {last_param["type"]}')

        # Add cache_control to the last param
        last_param['cache_control'] = self._build_cache_control(ttl)

    @staticmethod
    def _map_binary_data(data: bytes, media_type: str) -> BetaImageBlockParam | BetaRequestDocumentBlockParam:
        if media_type.startswith('image/'):
            return BetaImageBlockParam(
                source={'data': io.BytesIO(data), 'media_type': media_type, 'type': 'base64'},  # pyright: ignore[reportArgumentType]
                type='image',
            )
        elif media_type == 'application/pdf':
            return BetaRequestDocumentBlockParam(
                source=BetaBase64PDFSourceParam(
                    data=io.BytesIO(data),
                    media_type='application/pdf',
                    type='base64',
                ),
                type='document',
            )
        elif media_type == 'text/plain':
            return BetaRequestDocumentBlockParam(
                source=BetaPlainTextSourceParam(data=data.decode('utf-8'), media_type=media_type, type='text'),
                type='document',
            )
        else:  # pragma: no cover
            raise RuntimeError(f'Unsupported binary content media type for Anthropic: {media_type}')

    @staticmethod
    async def _map_image_url(item: ImageUrl) -> BetaImageBlockParam:
        if item.force_download:
            downloaded = await download_item(item, data_format='bytes')
            return AnthropicModel._map_binary_data(downloaded['data'], item.media_type)  # pyright: ignore[reportReturnType]
        return BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')

    @staticmethod
    async def _map_document_url(item: DocumentUrl) -> BetaRequestDocumentBlockParam:
        if item.media_type == 'application/pdf':
            if item.force_download:
                downloaded = await download_item(item, data_format='bytes')
                return AnthropicModel._map_binary_data(downloaded['data'], item.media_type)  # pyright: ignore[reportReturnType]
            return BetaRequestDocumentBlockParam(source={'url': item.url, 'type': 'url'}, type='document')
        elif item.media_type == 'text/plain':
            downloaded_item = await download_item(item, data_format='text')
            return BetaRequestDocumentBlockParam(
                source=BetaPlainTextSourceParam(data=downloaded_item['data'], media_type=item.media_type, type='text'),
                type='document',
            )
        else:  # pragma: no cover
            raise RuntimeError(f'Unsupported document media type: {item.media_type}')

    @staticmethod
    async def _map_file_to_content_block(
        item: BinaryContent | ImageUrl | DocumentUrl | AudioUrl | VideoUrl,
        context: str,
    ) -> BetaImageBlockParam | BetaRequestDocumentBlockParam:
        """Map a multimodal file item to its Anthropic API content block."""
        if isinstance(item, BinaryContent):
            if item.is_image or item.is_document:
                return AnthropicModel._map_binary_data(item.data, item.media_type)
            raise NotImplementedError(f'Unsupported binary content type in Anthropic {context}: {item.media_type}')
        elif isinstance(item, ImageUrl):
            return await AnthropicModel._map_image_url(item)
        elif isinstance(item, DocumentUrl):
            return await AnthropicModel._map_document_url(item)
        elif isinstance(item, AudioUrl):
            raise NotImplementedError(f'AudioUrl is not supported in Anthropic {context}')
        else:
            raise NotImplementedError(f'VideoUrl is not supported in Anthropic {context}')

    async def _map_user_prompt(
        self,
        part: UserPromptPart,
    ) -> AsyncGenerator[BetaContentBlockParam | CachePoint]:
        if isinstance(part.content, str):
            if part.content:  # Only yield non-empty text
                yield BetaTextBlockParam(text=part.content, type='text')
        else:
            for item in part.content:
                if isinstance(item, str | TextContent):
                    text = item if isinstance(item, str) else item.content
                    if text:  # Only yield non-empty text
                        yield BetaTextBlockParam(text=text, type='text')
                elif isinstance(item, CachePoint):
                    yield item
                elif isinstance(item, UploadedFile):
                    if item.provider_name != self.system:
                        raise UserError(
                            f'UploadedFile with `provider_name={item.provider_name!r}` cannot be used with AnthropicModel. '
                            f'Expected `provider_name` to be `{self.system!r}`.'
                        )
                    if item.media_type.startswith('image/'):
                        yield BetaImageBlockParam(
                            source=BetaFileImageSourceParam(file_id=item.file_id, type='file'),
                            type='image',
                        )
                    elif item.media_type.startswith(('text/', 'application/')):
                        yield BetaRequestDocumentBlockParam(
                            source=BetaFileDocumentSourceParam(file_id=item.file_id, type='file'),
                            type='document',
                        )
                    else:
                        raise UserError(
                            f'Unsupported media type {item.media_type!r} for Anthropic file upload. '
                            'Only image and document (text/application) types are supported.'
                        )
                elif is_multi_modal_content(item):
                    yield await AnthropicModel._map_file_to_content_block(item, 'user prompts')  # pyright: ignore[reportArgumentType]
                else:
                    raise RuntimeError(f'Unsupported content type: {type(item)}')  # pragma: no cover

    def _map_tool_definition(self, f: ToolDefinition, model_settings: AnthropicModelSettings) -> BetaToolParam:
        """Maps a `ToolDefinition` dataclass to an Anthropic `BetaToolParam` dictionary."""
        tool_param: BetaToolParam = {
            'name': f.name,
            'description': f.description or '',
            'input_schema': f.parameters_json_schema,
        }
        if f.strict and self.profile.get('supports_json_schema_output', False):
            tool_param['strict'] = f.strict
        if model_settings.get('anthropic_eager_input_streaming'):
            tool_param['eager_input_streaming'] = True
        if f.with_native == ToolSearchTool.kind:
            # `defer_loading` on the wire controls Anthropic's native tool search
            # caching. `ToolDefinition.defer_loading` is the local discovery flag and
            # is unrelated to what the provider API sees — hence the separate check on
            # `with_native` here.
            tool_param['defer_loading'] = True
        return tool_param

    def _build_output_config(
        self, model_request_parameters: ModelRequestParameters, model_settings: AnthropicModelSettings
    ) -> BetaOutputConfigParam | None:
        output_format: BetaJSONOutputFormatParam | None = None
        if model_request_parameters.output_mode == 'native':
            assert model_request_parameters.output_object is not None
            output_format = {'type': 'json_schema', 'schema': model_request_parameters.output_object.json_schema}

        effort: Literal['low', 'medium', 'high', 'xhigh', 'max'] | None = model_settings.get('anthropic_effort')
        # Fall back to unified thinking effort level when anthropic_effort is not set
        # Only map effort level strings; bare True just enables thinking without a specific effort
        profile = self.profile
        if (
            effort is None
            and profile.get('anthropic_supports_effort', False)
            and isinstance(model_request_parameters.thinking, str)
        ):
            effort_map: dict[ThinkingEffort, Literal['low', 'medium', 'high', 'xhigh', 'max']] = {
                'minimal': 'low',
                'low': 'low',
                'medium': 'medium',
                'high': 'high',
                'xhigh': 'xhigh' if profile.get('anthropic_supports_xhigh_effort', False) else 'max',
            }
            effort = effort_map[model_request_parameters.thinking]

        task_budget = self._get_task_budget(model_settings)

        if output_format is None and effort is None and task_budget is None:
            return None

        config: BetaOutputConfigParam = {}
        if output_format is not None:
            config['format'] = output_format
        if effort is not None:
            config['effort'] = effort
        if task_budget is not None:
            config['task_budget'] = task_budget
        return config

    def _get_task_budget(self, model_settings: AnthropicModelSettings) -> AnthropicTaskBudget | None:
        task_budget = model_settings.get('anthropic_task_budget')
        if task_budget is None:
            return None

        profile = self.profile
        if not profile.get('anthropic_supports_task_budgets', False):
            raise UserError(
                f'Model {self.model_name!r} does not support `anthropic_task_budget`. '
                'Anthropic task budgets are currently only supported on `claude-opus-4-7`.'
            )

        return task_budget

    @staticmethod
    def _validate_task_budget_vs_context_management(
        model_settings: AnthropicModelSettings,
        context_management: BetaContextManagementConfigParam | None,
    ) -> None:
        # Anthropic rejects requests that combine `task_budget.remaining` with a
        # server-side compaction edit (the API tracks the budget itself). Fail fast with a
        # helpful message rather than letting the API return an opaque 400.
        task_budget = model_settings.get('anthropic_task_budget')
        if task_budget is None or 'remaining' not in task_budget:
            return
        if not isinstance(context_management, dict):
            return
        edits = context_management.get('edits') or ()
        if any(isinstance(e, dict) and e.get('type') == _ANTHROPIC_COMPACT_EDIT_TYPE for e in edits):
            raise UserError(
                '`anthropic_task_budget.remaining` cannot be combined with `AnthropicCompaction`: '
                'Anthropic rejects this combination because server-side compaction tracks the budget itself. '
                'Use `remaining` for client-side budget tracking, or `AnthropicCompaction` '
                'for server-side compaction — not both.'
            )

init

__init__(
    model_name: AnthropicModelName,
    *,
    provider: (
        Literal["anthropic", "gateway"]
        | Provider[AsyncAnthropicClient]
    ) = "anthropic",
    profile: ModelProfileSpec | None = None,
    settings: ModelSettings | None = None
)

Initialize an Anthropic model.

Parameters:

Name	Type	Description	Default
`model_name`	`AnthropicModelName`	The name of the Anthropic model to use. List of model names available here.	required
`provider`	`Literal['anthropic', 'gateway'] \| Provider[AsyncAnthropicClient]`	The provider to use for the Anthropic API. Can be either the string 'anthropic' or an instance of `Provider[AsyncAnthropicClient]`. Defaults to 'anthropic'.	`'anthropic'`
`profile`	`ModelProfileSpec \| None`	The model profile to use. Defaults to a profile picked by the provider based on the model name. The default 'anthropic' provider will use the default `..profiles.anthropic.anthropic_model_profile`.	`None`
`settings`	`ModelSettings \| None`	Default model settings for this model instance.	`None`

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

def __init__(
    self,
    model_name: AnthropicModelName,
    *,
    provider: Literal['anthropic', 'gateway'] | Provider[AsyncAnthropicClient] = 'anthropic',
    profile: ModelProfileSpec | None = None,
    settings: ModelSettings | None = None,
):
    """Initialize an Anthropic model.

    Args:
        model_name: The name of the Anthropic model to use. List of model names available
            [here](https://docs.anthropic.com/en/docs/about-claude/models).
        provider: The provider to use for the Anthropic API. Can be either the string 'anthropic' or an
            instance of `Provider[AsyncAnthropicClient]`. Defaults to 'anthropic'.
        profile: The model profile to use. Defaults to a profile picked by the provider based on the model name.
            The default 'anthropic' provider will use the default `..profiles.anthropic.anthropic_model_profile`.
        settings: Default model settings for this model instance.
    """
    self._model_name = model_name

    if isinstance(provider, str):
        provider = infer_provider('gateway/anthropic' if provider == 'gateway' else provider)
    self._provider = provider

    super().__init__(settings=settings, profile=profile)

model_name `property`

model_name: AnthropicModelName

The model name.

system `property`

system: str

The model provider.

supported_native_tools `classmethod`

supported_native_tools() -> (
    frozenset[type[AbstractNativeTool]]
)

The set of builtin tool types this model can handle.

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

@classmethod
def supported_native_tools(cls) -> frozenset[type[AbstractNativeTool]]:
    """The set of builtin tool types this model can handle."""
    return frozenset({WebSearchTool, CodeExecutionTool, WebFetchTool, MemoryTool, MCPServerTool, ToolSearchTool})

AnthropicCompaction

Bases: AbstractCapability[AgentDepsT]

Compaction capability for Anthropic models.

Configures automatic context management via Anthropic's context_management API parameter. Compaction triggers server-side when input tokens exceed the configured threshold.

Example usage:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicCompaction

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    capabilities=[AnthropicCompaction(token_threshold=100_000)],
)

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

class AnthropicCompaction(AbstractCapability[AgentDepsT]):
    """Compaction capability for Anthropic models.

    Configures automatic context management via Anthropic's `context_management`
    API parameter. Compaction triggers server-side when input tokens exceed
    the configured threshold.

    Example usage:

    ```python {test="skip"}
    from pydantic_ai import Agent
    from pydantic_ai.models.anthropic import AnthropicCompaction

    agent = Agent(
        'anthropic:claude-sonnet-4-6',
        capabilities=[AnthropicCompaction(token_threshold=100_000)],
    )
    ```
    """

    def __init__(
        self,
        *,
        token_threshold: int = 150_000,
        instructions: str | None = None,
        pause_after_compaction: bool = False,
    ) -> None:
        """Initialize the Anthropic compaction capability.

        Args:
            token_threshold: Compact when input tokens exceed this threshold. Minimum 50,000.
            instructions: Custom instructions for the compaction summarization.
            pause_after_compaction: If `True`, the response will stop after the compaction block
                with `stop_reason='compaction'`, allowing explicit handling.
        """
        self.token_threshold = token_threshold
        self.instructions = instructions
        self.pause_after_compaction = pause_after_compaction

    def get_model_settings(self) -> Callable[[RunContext[AgentDepsT]], ModelSettings]:
        edit: dict[str, Any] = {
            'type': _ANTHROPIC_COMPACT_EDIT_TYPE,
            'trigger': {'type': 'input_tokens', 'value': self.token_threshold},
        }
        if self.pause_after_compaction:
            edit['pause_after_compaction'] = True
        if self.instructions is not None:
            edit['instructions'] = self.instructions

        def resolve(ctx: RunContext[AgentDepsT]) -> ModelSettings:
            # Append our edit to any existing context_management the user may have configured,
            # preserving other fields (not just edits).
            existing_cm: dict[str, Any] = {}
            if ctx.model_settings:
                raw_cm = cast(dict[str, Any], ctx.model_settings).get('anthropic_context_management')
                if isinstance(raw_cm, dict):  # pragma: no branch
                    existing_cm = {**cast(dict[str, Any], raw_cm)}
            existing_edits = cast(list[dict[str, Any]], existing_cm.get('edits', []))
            existing_cm['edits'] = [*existing_edits, edit]
            return cast(ModelSettings, {'anthropic_context_management': existing_cm})

        return resolve

    @classmethod
    def get_serialization_name(cls) -> str | None:
        return 'AnthropicCompaction'

init

__init__(
    *,
    token_threshold: int = 150000,
    instructions: str | None = None,
    pause_after_compaction: bool = False
) -> None

Initialize the Anthropic compaction capability.

Parameters:

Name	Type	Description	Default
`token_threshold`	`int`	Compact when input tokens exceed this threshold. Minimum 50,000.	`150000`
`instructions`	`str \| None`	Custom instructions for the compaction summarization.	`None`
`pause_after_compaction`	`bool`	If `True`, the response will stop after the compaction block with `stop_reason='compaction'`, allowing explicit handling.	`False`

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

def __init__(
    self,
    *,
    token_threshold: int = 150_000,
    instructions: str | None = None,
    pause_after_compaction: bool = False,
) -> None:
    """Initialize the Anthropic compaction capability.

    Args:
        token_threshold: Compact when input tokens exceed this threshold. Minimum 50,000.
        instructions: Custom instructions for the compaction summarization.
        pause_after_compaction: If `True`, the response will stop after the compaction block
            with `stop_reason='compaction'`, allowing explicit handling.
    """
    self.token_threshold = token_threshold
    self.instructions = instructions
    self.pause_after_compaction = pause_after_compaction

AnthropicStreamedResponse `dataclass`

Bases: StreamedResponse

Implementation of StreamedResponse for Anthropic models.

Source code in pydantic_ai_slim/pydantic_ai/models/anthropic.py

@dataclass
class AnthropicStreamedResponse(StreamedResponse):
    """Implementation of `StreamedResponse` for Anthropic models."""

    _model_name: AnthropicModelName
    _response: _utils.PeekableAsyncStream[BetaRawMessageStreamEvent, AsyncStream[BetaRawMessageStreamEvent]]
    _provider_name: str
    _provider_url: str
    _timestamp: datetime = field(default_factory=_utils.now_utc)

    async def _get_event_iterator(self) -> AsyncIterator[ModelResponseStreamEvent]:  # noqa: C901
        with _map_api_errors(self._model_name):
            current_block: BetaContentBlock | None = None

            builtin_tool_calls: dict[str, NativeToolCallPart] = {}
            async for event in self._response:
                if isinstance(event, BetaRawMessageStartEvent):
                    self._usage = _map_usage(event, self._provider_name, self._provider_url, self._model_name)
                    self.provider_response_id = event.message.id
                    if event.message.container:
                        self.provider_details = self.provider_details or {}
                        self.provider_details['container_id'] = event.message.container.id

                elif isinstance(event, BetaRawContentBlockStartEvent):
                    current_block = event.content_block
                    if isinstance(current_block, BetaTextBlock) and current_block.text:
                        for event_ in self._parts_manager.handle_text_delta(
                            vendor_part_id=event.index, content=current_block.text
                        ):
                            yield event_
                    elif isinstance(current_block, BetaThinkingBlock):
                        for event_ in self._parts_manager.handle_thinking_delta(
                            vendor_part_id=event.index,
                            content=current_block.thinking,
                            signature=current_block.signature,
                            provider_name=self.provider_name,
                        ):
                            yield event_
                    elif isinstance(current_block, BetaRedactedThinkingBlock):
                        for event_ in self._parts_manager.handle_thinking_delta(
                            vendor_part_id=event.index,
                            id='redacted_thinking',
                            signature=current_block.data,
                            provider_name=self.provider_name,
                        ):
                            yield event_
                    elif isinstance(current_block, BetaToolUseBlock):
                        maybe_event = self._parts_manager.handle_tool_call_delta(
                            vendor_part_id=event.index,
                            tool_name=current_block.name,
                            args=cast(dict[str, Any], current_block.input) or None,
                            tool_call_id=current_block.id,
                        )
                        if maybe_event is not None:  # pragma: no branch
                            yield maybe_event
                    elif isinstance(current_block, BetaServerToolUseBlock):
                        call_part = _map_server_tool_use_block(current_block, self.provider_name)
                        builtin_tool_calls[call_part.tool_call_id] = call_part
                        # In streaming, the block's `input` is empty at start and arrives via
                        # subsequent `BetaInputJSONDelta` events. Emit with `args=None` so the
                        # accumulating JSON deltas can attach as a string; the
                        # `BetaRawContentBlockStopEvent` handler below normalizes the final
                        # value back to the canonical part shape (matching non-streaming).
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=replace(call_part, args=None),
                        )
                    elif isinstance(current_block, BetaWebSearchToolResultBlock):
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_web_search_tool_result_block(current_block, self.provider_name),
                        )
                    elif isinstance(current_block, BetaToolSearchToolResultBlock):
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_tool_search_tool_result_block(current_block, self.provider_name),
                        )
                    elif isinstance(current_block, BetaCodeExecutionToolResultBlock):  # pragma: no cover
                        # Legacy code execution responses used this bare `code_execution_tool_result` shape.
                        # Current code execution tool versions emit the named bash/text-editor blocks below.
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_code_execution_tool_result_block(current_block, self.provider_name),
                        )
                    elif isinstance(current_block, BetaBashCodeExecutionToolResultBlock):
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_bash_code_execution_tool_result_block(current_block, self.provider_name),
                        )
                    elif isinstance(current_block, BetaTextEditorCodeExecutionToolResultBlock):
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_text_editor_code_execution_tool_result_block(current_block, self.provider_name),
                        )
                    elif isinstance(current_block, BetaWebFetchToolResultBlock):  # pragma: lax no cover
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_web_fetch_tool_result_block(current_block, self.provider_name),
                        )
                    elif isinstance(current_block, BetaMCPToolUseBlock):
                        call_part = _map_mcp_server_use_block(current_block, self.provider_name)
                        builtin_tool_calls[call_part.tool_call_id] = call_part

                        args_json = call_part.args_as_json_str()
                        # Drop the final `{}}` so that we can add tool args deltas
                        args_json_delta = args_json[:-3]
                        assert args_json_delta.endswith('"tool_args":'), (
                            f'Expected {args_json_delta!r} to end in `"tool_args":`'
                        )

                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index, part=replace(call_part, args=None)
                        )
                        maybe_event = self._parts_manager.handle_tool_call_delta(
                            vendor_part_id=event.index,
                            args=args_json_delta,
                        )
                        if maybe_event is not None:  # pragma: no branch
                            yield maybe_event
                    elif isinstance(current_block, BetaMCPToolResultBlock):
                        call_part = builtin_tool_calls.get(current_block.tool_use_id)
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=_map_mcp_server_result_block(current_block, call_part, self.provider_name),
                        )
                    elif isinstance(current_block, BetaCompactionBlock):
                        yield self._parts_manager.handle_part(
                            vendor_part_id=event.index,
                            part=CompactionPart(content=current_block.content, provider_name=self.provider_name),
                        )

                elif isinstance(event, BetaRawContentBlockDeltaEvent):
                    if isinstance(event.delta, BetaTextDelta):
                        for event_ in self._parts_manager.handle_text_delta(
                            vendor_part_id=event.index, content=event.delta.text
                        ):
                            yield event_
                    elif isinstance(event.delta, BetaThinkingDelta):
                        for event_ in self._parts_manager.handle_thinking_delta(
                            vendor_part_id=event.index,
                            content=event.delta.thinking,
                            provider_name=self.provider_name,
                        ):
                            yield event_
                    elif isinstance(event.delta, BetaSignatureDelta):
                        for event_ in self._parts_manager.handle_thinking_delta(
                            vendor_part_id=event.index,
                            signature=event.delta.signature,
                            provider_name=self.provider_name,
                        ):
                            yield event_
                    elif isinstance(event.delta, BetaInputJSONDelta):
                        maybe_event = self._parts_manager.handle_tool_call_delta(
                            vendor_part_id=event.index,
                            args=event.delta.partial_json,
                        )
                        if maybe_event is not None:  # pragma: no branch
                            yield maybe_event
                    elif isinstance(event.delta, BetaCompactionContentBlockDelta):
                        if event.delta.content:  # pragma: no branch
                            # Re-emit part with updated content; replaces the initial block start part
                            yield self._parts_manager.handle_part(
                                vendor_part_id=event.index,
                                part=CompactionPart(content=event.delta.content, provider_name=self.provider_name),
                            )
                    # TODO(Marcelo): We need to handle citations.
                    elif isinstance(event.delta, BetaCitationsDelta):
                        pass

                elif isinstance(event, BetaRawMessageDeltaEvent):
                    self._usage = _map_usage(
                        event, self._provider_name, self._provider_url, self._model_name, self._usage
                    )
                    if raw_finish_reason := event.delta.stop_reason:  # pragma: no branch
                        self.provider_details = self.provider_details or {}
                        self.provider_details['finish_reason'] = raw_finish_reason
                        self.finish_reason = _FINISH_REASON_MAP.get(raw_finish_reason)
                    if event.delta.container:
                        self.provider_details = self.provider_details or {}
                        self.provider_details['container_id'] = event.delta.container.id

                elif isinstance(event, BetaRawContentBlockStopEvent):  # pragma: no branch
                    if isinstance(current_block, BetaMCPToolUseBlock):
                        maybe_event = self._parts_manager.handle_tool_call_delta(
                            vendor_part_id=event.index,
                            args='}',
                        )
                        if maybe_event is not None:  # pragma: no branch
                            yield maybe_event
                    elif isinstance(current_block, BetaServerToolUseBlock) and current_block.name in (
                        'tool_search_tool_regex',
                        'tool_search_tool_bm25',
                    ):
                        # The streaming start emitted the part with `args=None`; JSON deltas
                        # have since accumulated as a string. Re-emit with the normalized
                        # cross-provider `ToolSearchArgs` shape so downstream code (history
                        # replay, typed-part dispatch) sees the same structure as the
                        # non-streaming `_process_response` path produces.
                        existing = self._parts_manager.get_part_by_vendor_id(event.index)
                        if isinstance(existing, NativeToolSearchCallPart):  # pragma: no branch
                            yield self._parts_manager.handle_part(
                                vendor_part_id=event.index,
                                part=_finalize_streamed_tool_search_call_part(existing),
                            )
                    current_block = None
                elif isinstance(event, BetaRawMessageStopEvent):  # pragma: no branch
                    current_block = None

    async def close_stream(self) -> None:
        await self._response.source.close()

    @property
    def model_name(self) -> AnthropicModelName:
        """Get the model name of the response."""
        return self._model_name

    @property
    def provider_name(self) -> str:
        """Get the provider name."""
        return self._provider_name

    @property
    def provider_url(self) -> str:
        """Get the provider base URL."""
        return self._provider_url

    @property
    def timestamp(self) -> datetime:
        """Get the timestamp of the response."""
        return self._timestamp

model_name `property`

model_name: AnthropicModelName

Get the model name of the response.

provider_name `property`

provider_name: str

Get the provider name.

provider_url `property`

provider_url: str

Get the provider base URL.

timestamp `property`

timestamp: datetime

Get the timestamp of the response.

pydantic_ai.models.anthropic

Setup

LatestAnthropicModelNames module-attribute

AnthropicModelName module-attribute

AnthropicTaskBudget module-attribute

AnthropicModelSettings

anthropic_metadata instance-attribute

anthropic_thinking instance-attribute

anthropic_cache_tool_definitions instance-attribute

anthropic_service_tier instance-attribute

anthropic_cache_instructions instance-attribute

anthropic_cache_messages instance-attribute

anthropic_cache instance-attribute

anthropic_effort instance-attribute

anthropic_task_budget instance-attribute

anthropic_container instance-attribute

anthropic_code_execution_tool_version instance-attribute

anthropic_eager_input_streaming instance-attribute

anthropic_betas instance-attribute

anthropic_speed instance-attribute

anthropic_context_management instance-attribute

AnthropicModel dataclass

__init__

model_name property

system property

supported_native_tools classmethod

AnthropicCompaction

__init__

AnthropicStreamedResponse dataclass

model_name property

provider_name property

provider_url property

timestamp property

`pydantic_ai.models.anthropic`

LatestAnthropicModelNames `module-attribute`

AnthropicModelName `module-attribute`

AnthropicTaskBudget `module-attribute`

anthropic_metadata `instance-attribute`

anthropic_thinking `instance-attribute`

anthropic_cache_tool_definitions `instance-attribute`

anthropic_service_tier `instance-attribute`

anthropic_cache_instructions `instance-attribute`

anthropic_cache_messages `instance-attribute`

anthropic_cache `instance-attribute`

anthropic_effort `instance-attribute`

anthropic_task_budget `instance-attribute`

anthropic_container `instance-attribute`

anthropic_code_execution_tool_version `instance-attribute`

anthropic_eager_input_streaming `instance-attribute`

anthropic_betas `instance-attribute`

anthropic_speed `instance-attribute`

anthropic_context_management `instance-attribute`

AnthropicModel `dataclass`

init

model_name `property`

system `property`

supported_native_tools `classmethod`

init

AnthropicStreamedResponse `dataclass`

model_name `property`

provider_name `property`

provider_url `property`

timestamp `property`