Output

"Output" refers to the final value returned from running an agent. This can be either plain text, structured data, an image, or the result of a function called with arguments provided by the model.

The output is wrapped in AgentRunResult or StreamedRunResult so that you can access other data, like usage of the run and message history.

Both AgentRunResult and StreamedRunResult are generic in the data they wrap, so typing information about the data returned by the agent is preserved.

A run ends when the model responds with one of the output types, or, if no output type is specified or str is one of the allowed options, when a plain text response is received. A run can also be cancelled if usage limits are exceeded; see Usage Limits.

Here's an example using a Pydantic model as the output_type, forcing the model to respond with data matching our specification:

With Pydantic AI Gateway

olympics.py

from pydantic import BaseModel

from pydantic_ai import Agent


class CityLocation(BaseModel):
    city: str
    country: str


agent = Agent('gateway/google:gemini-3-flash-preview', output_type=CityLocation)
result = agent.run_sync('Where were the olympics held in 2012?')
print(result.output)
#> city='London' country='United Kingdom'
print(result.usage)
#> RunUsage(input_tokens=57, output_tokens=8, requests=1)

Directly to Provider API

olympics.py

from pydantic import BaseModel

from pydantic_ai import Agent


class CityLocation(BaseModel):
    city: str
    country: str


agent = Agent('google:gemini-3-flash-preview', output_type=CityLocation)
result = agent.run_sync('Where were the olympics held in 2012?')
print(result.output)
#> city='London' country='United Kingdom'
print(result.usage)
#> RunUsage(input_tokens=57, output_tokens=8, requests=1)

(This example is complete, it can be run "as is")

Structured output data

The Agent class constructor takes an output_type argument that takes one or more types or output functions. It supports simple scalar types, list and dict types (including TypedDicts and StructuredDicts), dataclasses and Pydantic models, as well as type unions -- generally everything supported as type hints in a Pydantic model. You can also pass a list of multiple choices.

By default, Pydantic AI leverages the model's tool calling capability to make it return structured data. When multiple output types are specified (in a union or list), each member is registered with the model as a separate output tool in order to reduce the complexity of the schema and maximise the chances a model will respond correctly. This has been shown to work well across a wide range of models. If you'd like to change the names of the output tools, use a model's native structured output feature, or pass the output schema to the model in its instructions, you can use an output mode marker class.

When no output type is specified, or when str is among the output types, any plain text response from the model will be used as the output data. If str is not among the output types, the model is forced to return structured data or call an output function.

If the output type schema is not of type "object" (e.g. it's int or list[int]), the output type is wrapped in a single-element object, so that the schemas of all tools registered with the model are object schemas.
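The wrapping step can be pictured with a small plain-Python sketch. It is not Pydantic AI's implementation: the helper name `wrap_non_object_schema` and the wrapper key `response` are assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import Any


def wrap_non_object_schema(schema: dict[str, Any], key: str = 'response') -> dict[str, Any]:
    """Return an object schema; a non-object schema is nested under a single key.

    Hypothetical helper: the key name 'response' is an assumption.
    """
    if schema.get('type') == 'object':
        # Already an object schema, so it can serve as tool parameters as-is.
        return schema
    return {
        'type': 'object',
        'properties': {key: schema},
        'required': [key],
    }


int_schema = {'type': 'integer'}
list_schema = {'type': 'array', 'items': {'type': 'integer'}}

print(wrap_non_object_schema(int_schema))
#> {'type': 'object', 'properties': {'response': {'type': 'integer'}}, 'required': ['response']}
print(wrap_non_object_schema(list_schema)['properties']['response'])
#> {'type': 'array', 'items': {'type': 'integer'}}
```

The model then supplies its answer under the single wrapper key, and the value is unwrapped before it reaches your code.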

Structured outputs (like tools) use Pydantic to build the JSON schema used for the tool, and to validate the data returned by the model.

Type checking considerations

The Agent class is generic in its output type, and this type is carried through to AgentRunResult.output and StreamedRunResult.output so that your IDE or static type checker can warn you when your code doesn't properly take into account all the possible values those outputs could have.

Static type checkers like pyright and mypy will do their best to infer the agent's output type from the output_type you've specified, but they're not always able to do so correctly when you provide functions or multiple types in a union or list, even though Pydantic AI will behave correctly. When this happens, your type checker will complain even when you're confident you've passed a valid output_type, and you'll need to help the type checker by explicitly specifying the generic parameters on the Agent constructor. This is shown in the second example below and the output functions example further down.

Specifically, there are three valid uses of output_type where you'll need to do this:

When using a union of types, e.g. output_type=Foo | Bar. Until PEP-747 "Annotating Type Forms" lands in Python 3.15, type checkers do not consider these a valid value for output_type. In addition to the generic parameters on the Agent constructor, you'll need to add # type: ignore to the line that passes the union to output_type. Alternatively, you can use a list: output_type=[Foo, Bar].
With mypy: when using a list, either as a functionally equivalent alternative to a union or because you're passing in output functions. Pyright handles this correctly, and we've filed an issue with mypy to try to get this fixed.
With mypy: when using an async output function. Pyright handles this correctly, and we've filed an issue with mypy to try to get this fixed.

Here's an example of returning either text or structured data:

With Pydantic AI Gateway

box_or_error.py

from pydantic import BaseModel

from pydantic_ai import Agent


class Box(BaseModel):
    width: int
    height: int
    depth: int
    units: str


agent = Agent(
    'gateway/openai:gpt-5-mini',
    output_type=[Box, str], # (1)!
    instructions=(
        "Extract me the dimensions of a box, "
        "if you can't extract all data, ask the user to try again."
    ),
)

result = agent.run_sync('The box is 10x20x30')
print(result.output)
#> Please provide the units for the dimensions (e.g., cm, in, m).

result = agent.run_sync('The box is 10x20x30 cm')
print(result.output)
#> width=10 height=20 depth=30 units='cm'

(1) This could also have been a union: output_type=Box | str. However, as explained in the "Type checking considerations" section above, that would've required explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

Directly to Provider API

box_or_error.py

from pydantic import BaseModel

from pydantic_ai import Agent


class Box(BaseModel):
    width: int
    height: int
    depth: int
    units: str


agent = Agent(
    'openai:gpt-5-mini',
    output_type=[Box, str], # (1)!
    instructions=(
        "Extract me the dimensions of a box, "
        "if you can't extract all data, ask the user to try again."
    ),
)

result = agent.run_sync('The box is 10x20x30')
print(result.output)
#> Please provide the units for the dimensions (e.g., cm, in, m).

result = agent.run_sync('The box is 10x20x30 cm')
print(result.output)
#> width=10 height=20 depth=30 units='cm'

(1) This could also have been a union: output_type=Box | str. However, as explained in the "Type checking considerations" section above, that would've required explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

(This example is complete, it can be run "as is")

Here's an example of using a union return type, which will register multiple output tools and wrap non-object schemas in an object:

colors_or_sizes.py

from pydantic_ai import Agent

agent = Agent[object, list[str] | list[int]](
    'openai:gpt-5-mini',
    output_type=list[str] | list[int],  # type: ignore # (1)!
    instructions='Extract either colors or sizes from the shapes provided.',
)

result = agent.run_sync('red square, blue circle, green triangle')
print(result.output)
#> ['red', 'blue', 'green']

result = agent.run_sync('square size 10, circle size 20, triangle size 30')
print(result.output)
#> [10, 20, 30]

(1) As explained in the "Type checking considerations" section above, using a union rather than a list requires explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

(This example is complete, it can be run "as is")

Output functions

Instead of plain text or structured data, you may want the output of your agent run to be the result of a function called with arguments provided by the model, for example to further process or validate the data provided through the arguments (with the option to tell the model to try again), or to hand off to another agent.

Output functions are similar to function tools, but the model is forced to call one of them, the call ends the agent run, and the result is not passed back to the model.

As with tool functions, output function arguments provided by the model are validated using Pydantic (with optional validation context), can optionally take RunContext as the first argument, and can raise ModelRetry to ask the model to try again with modified arguments (or with a different output type).

To specify output functions, you set the agent's output_type to either a single function (or bound instance method), or a list of functions. The list can also contain other output types like simple scalars or entire Pydantic models. You typically do not want to also register your output function as a tool (using the @agent.tool decorator or tools argument), as this could confuse the model about which it should be calling.

Here's an example of all of these features in action:

output_functions.py

import re

from pydantic import BaseModel

from pydantic_ai import Agent, ModelRetry, RunContext, UnexpectedModelBehavior


class Row(BaseModel):
    name: str
    country: str


tables = {
    'capital_cities': [
        Row(name='Amsterdam', country='Netherlands'),
        Row(name='Mexico City', country='Mexico'),
    ]
}


class SQLFailure(BaseModel):
    """An unrecoverable failure. Only use this when you can't change the query to make it work."""

    explanation: str


def run_sql_query(query: str) -> list[Row]:
    """Run a SQL query on the database."""

    select_table = re.match(r'SELECT (.+) FROM (\w+)', query)
    if select_table:
        column_names = select_table.group(1)
        if column_names != '*':
            raise ModelRetry("Only 'SELECT *' is supported, you'll have to do column filtering manually.")

        table_name = select_table.group(2)
        if table_name not in tables:
            raise ModelRetry(
                f"Unknown table '{table_name}' in query '{query}'. Available tables: {', '.join(tables.keys())}."
            )

        return tables[table_name]

    raise ModelRetry(f"Unsupported query: '{query}'.")


sql_agent = Agent[object, list[Row] | SQLFailure](
    'openai:gpt-5.2',
    output_type=[run_sql_query, SQLFailure],
    instructions='You are a SQL agent that can run SQL queries on a database.',
)


async def hand_off_to_sql_agent(ctx: RunContext, query: str) -> list[Row]:
    """I take natural language queries, turn them into SQL, and run them on a database."""

    # Drop the final message with the output tool call, as it shouldn't be passed on to the SQL agent
    messages = ctx.messages[:-1]
    try:
        result = await sql_agent.run(query, message_history=messages)
        output = result.output
        if isinstance(output, SQLFailure):
            raise ModelRetry(f'SQL agent failed: {output.explanation}')
        return output
    except UnexpectedModelBehavior as e:
        # Bubble up potentially retryable errors to the router agent
        if (cause := e.__cause__) and isinstance(cause, ModelRetry):
            raise ModelRetry(f'SQL agent failed: {cause.message}') from e
        else:
            raise


class RouterFailure(BaseModel):
    """Use me when no appropriate agent is found or the used agent failed."""

    explanation: str


router_agent = Agent[object, list[Row] | RouterFailure](
    'openai:gpt-5.2',
    output_type=[hand_off_to_sql_agent, RouterFailure],
    instructions='You are a router to other agents. Never try to solve a problem yourself, just pass it on.',
)

result = router_agent.run_sync('Select the names and countries of all capitals')
print(result.output)
"""
[
    Row(name='Amsterdam', country='Netherlands'),
    Row(name='Mexico City', country='Mexico'),
]
"""

result = router_agent.run_sync('Select all pets')
print(repr(result.output))
"""
RouterFailure(explanation="The requested table 'pets' does not exist in the database. The only available table is 'capital_cities', which does not contain data about pets.")
"""

result = router_agent.run_sync('How do I fly from Amsterdam to Mexico City?')
print(repr(result.output))
"""
RouterFailure(explanation='I am not equipped to provide travel information, such as flights from Amsterdam to Mexico City.')
"""

Text output

If you provide an output function that takes a string, Pydantic AI will by default create an output tool for it, as it does for any other output function. If you'd instead like the model to provide the string as plain text output, you can wrap the function in the TextOutput marker class.

If desired, this marker class can be used alongside one or more ToolOutput marker classes (or unmarked types or functions) in a list provided to output_type.

Like other output functions, text output functions can optionally take RunContext as the first argument, and can raise ModelRetry to ask the model to try again with modified arguments (or with a different output type).

With Pydantic AI Gateway

text_output_function.py

from pydantic_ai import Agent, TextOutput


def split_into_words(text: str) -> list[str]:
    return text.split()


agent = Agent(
    'gateway/openai:gpt-5.2',
    output_type=TextOutput(split_into_words),
)
result = agent.run_sync('Who was Albert Einstein?')
print(result.output)
#> ['Albert', 'Einstein', 'was', 'a', 'German-born', 'theoretical', 'physicist.']

Directly to Provider API

text_output_function.py

from pydantic_ai import Agent, TextOutput


def split_into_words(text: str) -> list[str]:
    return text.split()


agent = Agent(
    'openai:gpt-5.2',
    output_type=TextOutput(split_into_words),
)
result = agent.run_sync('Who was Albert Einstein?')
print(result.output)
#> ['Albert', 'Einstein', 'was', 'a', 'German-born', 'theoretical', 'physicist.']

(This example is complete, it can be run "as is")

Handling partial output in output functions

When streaming with run_stream() or run_stream_sync(), output functions are called multiple times — once for each partial output received from the model, and once for the final complete output.

You should check the RunContext.partial_output flag when your output function has side effects (e.g., sending notifications, logging, database updates) that should only execute on the final output.

When streaming, partial_output is True for each partial output and False for the final complete output. For all other run methods, partial_output is always False as the function is only called once with the complete output.

With Pydantic AI Gateway

output_function_with_side_effects.py

from pydantic import BaseModel

from pydantic_ai import Agent, RunContext


class DatabaseRecord(BaseModel):
    name: str
    value: int | None = None  # Make optional to allow partial output


def save_to_database(ctx: RunContext, record: DatabaseRecord) -> DatabaseRecord:
    """Output function with side effect - only save final output to database."""
    if ctx.partial_output:
        # Skip side effects for partial outputs
        return record

    # Only execute side effect for the final output
    print(f'Saving to database: {record.name} = {record.value}')
    #> Saving to database: test = 42
    return record


agent = Agent('gateway/openai:gpt-5.2', output_type=save_to_database)


async def main():
    async with agent.run_stream('Create a record with name "test" and value 42') as result:
        async for output in result.stream_output(debounce_by=None):
            print(output)
            #> name='test' value=None
            #> name='test' value=42

Directly to Provider API

output_function_with_side_effects.py

from pydantic import BaseModel

from pydantic_ai import Agent, RunContext


class DatabaseRecord(BaseModel):
    name: str
    value: int | None = None  # Make optional to allow partial output


def save_to_database(ctx: RunContext, record: DatabaseRecord) -> DatabaseRecord:
    """Output function with side effect - only save final output to database."""
    if ctx.partial_output:
        # Skip side effects for partial outputs
        return record

    # Only execute side effect for the final output
    print(f'Saving to database: {record.name} = {record.value}')
    #> Saving to database: test = 42
    return record


agent = Agent('openai:gpt-5.2', output_type=save_to_database)


async def main():
    async with agent.run_stream('Create a record with name "test" and value 42') as result:
        async for output in result.stream_output(debounce_by=None):
            print(output)
            #> name='test' value=None
            #> name='test' value=42

(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)

Output modes

Pydantic AI implements three different methods to get a model to output structured data:

Tool Output, where tool calls are used to produce the output.
Native Output, where the model is required to produce text content compliant with a provided JSON schema.
Prompted Output, where a prompt is injected into the model instructions including the desired JSON schema, and we attempt to parse the model's plain-text response as appropriate.

Tool Output

In the default Tool Output mode, the output JSON schema of each output type (or function) is provided to the model as the parameters schema of a special output tool. This is the default as it's supported by virtually all models and has been shown to work very well.

If you'd like to change the name of the output tool, pass a custom description to aid the model, or turn on or off strict mode, you can wrap the type(s) in the ToolOutput marker class and provide the appropriate arguments. Note that by default, the description is taken from the docstring specified on a Pydantic model or output function, so specifying it using the marker class is typically not necessary.

When using output tools, each tool gets its own retry counter — the output side of the agent retry budget (set with AgentRetries via Agent(retries={'output': N}), or per-run via agent.run(retries={'output': N})) is the default per-tool limit. To override the limit for an individual output tool, pass max_retries on ToolOutput: ToolOutput(Fruit, max_retries=2). See How output retries are enforced for the relationship to the text-output path's global budget.
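The bookkeeping described above (one counter per output tool, a shared default limit, and an optional per-tool override) can be modelled in a few lines of plain Python. This is a toy model rather than the library's code: `OutputRetryTracker` and `RetryLimitExceeded` are hypothetical names, and the rule that a tool may fail `limit` times before the next failure becomes an error is an assumption.

```python
from __future__ import annotations

from collections import Counter


class RetryLimitExceeded(Exception):
    """Raised when one output tool has failed more often than its limit allows."""


class OutputRetryTracker:
    """Toy per-tool retry bookkeeping (hypothetical; not a Pydantic AI class)."""

    def __init__(self, default_limit: int, overrides: dict[str, int] | None = None):
        self.default_limit = default_limit
        self.overrides = overrides or {}
        # One independent failure counter per tool name.
        self.failures = Counter()

    def limit_for(self, tool_name: str) -> int:
        # A per-tool override wins over the default limit.
        return self.overrides.get(tool_name, self.default_limit)

    def record_failure(self, tool_name: str) -> int:
        """Count a failure and return how many retries this tool has left."""
        self.failures[tool_name] += 1
        limit = self.limit_for(tool_name)
        if self.failures[tool_name] > limit:
            raise RetryLimitExceeded(f'{tool_name} exceeded {limit} retries')
        return limit - self.failures[tool_name]


tracker = OutputRetryTracker(default_limit=1, overrides={'return_fruit': 2})
print(tracker.record_failure('return_fruit'))
#> 1
print(tracker.record_failure('return_vehicle'))
#> 0
```

A failure of `return_fruit` does not consume any of `return_vehicle`'s budget, which is the point of keeping the counters separate.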

To dynamically modify or filter the available output tools during an agent run, you can define an agent-wide prepare_output_tools function that will be called ahead of each step of a run. This function should be of type ToolsPrepareFunc, which takes the RunContext and a list of ToolDefinition, and returns a new list of tool definitions (or None to disable all tools for that step). This is analogous to the prepare_tools function for non-output tools.
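The shape of such a prepare function can be sketched without the library. `ToolDef` and `Ctx` below are hypothetical stand-ins for ToolDefinition and RunContext that carry only the fields used here, and writing the function as a coroutine is an assumption about the expected call style.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolDef:
    """Stand-in for a tool definition (hypothetical; only the fields used here)."""

    name: str
    description: str = ''


@dataclass
class Ctx:
    """Stand-in for the run context (hypothetical); run_step counts steps from 1."""

    run_step: int


async def prepare_output_tools(ctx: Ctx, tool_defs: list[ToolDef]) -> list[ToolDef] | None:
    # Offer only the fruit tool on the first step, and every tool afterwards.
    if ctx.run_step == 1:
        tool_defs = [t for t in tool_defs if t.name == 'return_fruit']
    # Return new definitions instead of mutating the ones passed in.
    return [
        replace(t, description=t.description or f'Final answer via {t.name}.')
        for t in tool_defs
    ]


tools = [ToolDef('return_fruit'), ToolDef('return_vehicle', 'Return a vehicle.')]
first = asyncio.run(prepare_output_tools(Ctx(run_step=1), tools))
later = asyncio.run(prepare_output_tools(Ctx(run_step=2), tools))
print([t.name for t in first])
#> ['return_fruit']
print([t.name for t in later])
#> ['return_fruit', 'return_vehicle']
```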

With Pydantic AI Gateway

tool_output.py

from pydantic import BaseModel

from pydantic_ai import Agent, ToolOutput


class Fruit(BaseModel):
    name: str
    color: str


class Vehicle(BaseModel):
    name: str
    wheels: int


agent = Agent(
    'gateway/openai:gpt-5.2',
    output_type=[ # (1)!
        ToolOutput(Fruit, name='return_fruit'),
        ToolOutput(Vehicle, name='return_vehicle'),
    ],
)
result = agent.run_sync('What is a banana?')
print(repr(result.output))
#> Fruit(name='banana', color='yellow')

(1) If we were passing just Fruit and Vehicle without custom tool names, we could have used a union: output_type=Fruit | Vehicle. However, as ToolOutput is an object rather than a type, we have to use a list.

Directly to Provider API

tool_output.py

from pydantic import BaseModel

from pydantic_ai import Agent, ToolOutput


class Fruit(BaseModel):
    name: str
    color: str


class Vehicle(BaseModel):
    name: str
    wheels: int


agent = Agent(
    'openai:gpt-5.2',
    output_type=[ # (1)!
        ToolOutput(Fruit, name='return_fruit'),
        ToolOutput(Vehicle, name='return_vehicle'),
    ],
)
result = agent.run_sync('What is a banana?')
print(repr(result.output))
#> Fruit(name='banana', color='yellow')

(1) If we were passing just Fruit and Vehicle without custom tool names, we could have used a union: output_type=Fruit | Vehicle. However, as ToolOutput is an object rather than a type, we have to use a list.

(This example is complete, it can be run "as is")

Parallel Output Tool Calls

An output tool call is what ends a run and produces its final result. When a model emits one in the same response as other tool calls, the agent's end_strategy decides what happens to the rest. Most agents never need to think about this, since most responses don't mix an output tool with other tools — but when one does, end_strategy controls how those calls run and which one becomes the final result.

Priority of output and deferred tools in streaming methods

The run_stream() and run_stream_sync() methods will consider the first output that matches the output type (which could be text, an output tool call, or a deferred tool call) to be the final output of the agent run, even when the model generates (additional) tool calls after this "final" output.

This means that if the model calls deferred tools before output tools when using these methods, the deferred tool calls determine the agent run's final output, while the other run methods would have prioritized the tool output. Regardless of end_strategy, these methods commit the first matching output the instant it streams, so they behave like 'early': that result is locked in, and the retry-after-tool-failure behavior below does not apply.

| Strategy | Output tools | Function tools (output succeeded) | Function tools (every output failed) |
| --- | --- | --- | --- |
| `'graceful'` (default) | Run in emission order; first success is the final result, later output tools skipped | Run, in parallel where possible, in emission order | Run; the run continues |
| `'early'` | Run in emission order; the run ends at the first success | Skipped | Run; the run continues |
| `'exhaustive'` | All run, in parallel; first valid result by emission order wins | Run, in parallel | Run; the run continues |

'graceful' is the default and the right choice for most agents: function tools the model requested alongside an output tool still run, so their side effects happen and their results are available to the model if the run continues. Only the first successful output tool is used; later output tools are skipped so their side effects don't fire more than once.

Choose 'early' to end the run the instant an output tool succeeds — function tools requested in the same response are then skipped entirely. This is the fastest option when you never need those function tools to run once you have a result.

Choose 'exhaustive' to run every tool, including additional output tools whose results won't be used. This gives the model full visibility that each tool ran, at the cost of executing output-tool side effects that are ultimately discarded.

When every output tool fails, function tools run and the run continues under all three strategies: there is no result to end on, so the output failures go back to the model as retries and the function tools the model also asked for are run, letting it react to both on the next round.
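The rules for the three strategies can be written down as a toy decision function. This is a simplified model, not the library's scheduler: `Call` and `resolve` are hypothetical names, parallelism and timing are ignored (the returned list only says which tools run), and function-tool failures are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

STRATEGIES = ('graceful', 'early', 'exhaustive')


@dataclass
class Call:
    """One tool call from a model response, listed in emission order."""

    name: str
    kind: str  # 'output' or 'function'
    succeeds: bool = True


def resolve(strategy: str, calls: list[Call]) -> tuple[list[str], str | None]:
    """Return (tools that run, winning output tool or None if every output failed)."""
    if strategy not in STRATEGIES:
        raise ValueError(f'unknown end strategy: {strategy!r}')

    outputs = [c for c in calls if c.kind == 'output']
    functions = [c for c in calls if c.kind == 'function']

    # The first successful output tool by emission order provides the final result.
    winner = next((c.name for c in outputs if c.succeeds), None)

    ran: list[str] = []
    for call in outputs:
        ran.append(call.name)
        # Only 'exhaustive' keeps running output tools after the first success.
        if call.succeeds and strategy != 'exhaustive':
            break

    # Function tools are skipped only under 'early' once an output has succeeded.
    if winner is None or strategy != 'early':
        ran.extend(c.name for c in functions)
    return ran, winner


calls = [
    Call('lookup', 'function'),
    Call('return_fruit', 'output', succeeds=False),
    Call('return_vehicle', 'output'),
    Call('return_device', 'output'),
]
for strategy in STRATEGIES:
    print(strategy, *resolve(strategy, calls))
#> graceful ['return_fruit', 'return_vehicle', 'lookup'] return_vehicle
#> early ['return_fruit', 'return_vehicle'] return_vehicle
#> exhaustive ['return_fruit', 'return_vehicle', 'return_device', 'lookup'] return_vehicle
```

When no output tool succeeds, `resolve` returns `None` as the winner and includes the function tools under every strategy, matching the case where the run continues.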

Retrying after a tool failure

Under the 'graceful' and 'exhaustive' end strategies, function tools requested alongside an output tool still run. If one of them raises ModelRetry (or its arguments fail validation) in the same response as a successful output tool, the output result is not used as the final result. Instead, the retry is sent back to the model so it can correct the problem, since the output may have been based on the failed tool call. This does not apply under 'early', where function tools don't run once an output succeeds, nor when streaming, where the first matching output is committed immediately.

Controlling output tool parallelism

Like function tools, output tools run concurrently. Under the 'exhaustive' end strategy, where multiple output tools can run in parallel, you can make an output tool a barrier with ToolOutput(sequential=True) — useful when you want all of a response's function tools to finish before the output tool runs. This is the output-tool counterpart of the sequential=True flag for function tools; see Parallel tool calls & concurrency for how the barrier behaves and how to run an entire run's tools serially.

Native Output

Native Output mode uses a model's native "Structured Outputs" feature (aka "JSON Schema response format"), where the model is forced to only output text matching the provided JSON schema. Note that this is not supported by all models, and sometimes comes with restrictions. For example, Gemini cannot use tools at the same time as structured output, and attempting to do so will result in an error.

To use this mode, wrap the output type(s) in the NativeOutput marker class, which also lets you specify a name and description if the name and docstring of the type or function are not sufficient.

With Pydantic AI Gateway

native_output.py

from pydantic_ai import Agent, NativeOutput

from tool_output import Fruit, Vehicle

agent = Agent(
    'gateway/openai:gpt-5.2',
    output_type=NativeOutput(
        [Fruit, Vehicle], # (1)!
        name='Fruit_or_vehicle',
        description='Return a fruit or vehicle.'
    ),
)
result = agent.run_sync('What is a Ford Explorer?')
print(repr(result.output))
#> Vehicle(name='Ford Explorer', wheels=4)

(1) This could also have been a union: output_type=Fruit | Vehicle. However, as explained in the "Type checking considerations" section above, that would've required explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

Directly to Provider API

native_output.py

from pydantic_ai import Agent, NativeOutput

from tool_output import Fruit, Vehicle

agent = Agent(
    'openai:gpt-5.2',
    output_type=NativeOutput(
        [Fruit, Vehicle], # (1)!
        name='Fruit_or_vehicle',
        description='Return a fruit or vehicle.'
    ),
)
result = agent.run_sync('What is a Ford Explorer?')
print(repr(result.output))
#> Vehicle(name='Ford Explorer', wheels=4)

(1) This could also have been a union: output_type=Fruit | Vehicle. However, as explained in the "Type checking considerations" section above, that would've required explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

(This example is complete, it can be run "as is")

Prompted Output

In this mode, the model is prompted to output text matching the provided JSON schema through its instructions and it's up to the model to interpret those instructions correctly. This is usable with all models, but is often the least reliable approach as the model is not forced to match the schema.

While we would generally suggest starting with tool or native output, in some cases this mode may result in higher quality outputs, and for models without native tool calling or structured output support it is the only option for producing structured outputs.

If the model API supports the "JSON Mode" feature (aka "JSON Object response format") to force the model to output valid JSON, this is enabled, but it's still up to the model to abide by the schema. Pydantic AI will validate the returned structured data and tell the model to try again if validation fails, but if the model is not intelligent enough this may not be sufficient.

To use this mode, wrap the output type(s) in the PromptedOutput marker class, which also lets you specify a name and description if the name and docstring of the type or function are not sufficient. Additionally, the template argument lets you specify a custom instructions template to use instead of the default, or you can pass template=False to disable the schema prompt entirely.
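The general flow of this mode (put the schema into the instructions, then parse the plain-text reply and report problems back) can be sketched with the standard library alone. This is an illustration under stated assumptions: `build_instructions`, `parse_reply`, the template text, and the schema are hypothetical, and the check covers only required keys, whereas Pydantic AI validates the data with Pydantic.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical template; {schema} marks where the JSON schema is substituted.
TEMPLATE = 'Respond with JSON matching this schema: {schema}'

SCHEMA: dict[str, Any] = {
    'type': 'object',
    'properties': {'name': {'type': 'string'}, 'wheels': {'type': 'integer'}},
    'required': ['name', 'wheels'],
}


def build_instructions(template: str, schema: dict[str, Any]) -> str:
    """Substitute the serialized schema into the instructions template."""
    return template.format(schema=json.dumps(schema))


def parse_reply(text: str, schema: dict[str, Any]) -> tuple[dict[str, Any] | None, str | None]:
    """Return (data, None) on success, or (None, message to send back to the model)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f'Invalid JSON: {exc.msg}'
    if not isinstance(data, dict):
        return None, 'Expected a JSON object.'
    missing = [key for key in schema.get('required', []) if key not in data]
    if missing:
        return None, f'Missing required fields: {", ".join(missing)}'
    return data, None


print(parse_reply('{"name": "Ford Explorer", "wheels": 4}', SCHEMA))
#> ({'name': 'Ford Explorer', 'wheels': 4}, None)
print(parse_reply('{"name": "Ford Explorer"}', SCHEMA))
#> (None, 'Missing required fields: wheels')
```

Nothing in this flow forces the model to follow the schema, which is why this mode depends on the retry message and on the model's ability to act on it.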

With Pydantic AI Gateway

prompted_output.py

from pydantic import BaseModel

from pydantic_ai import Agent, PromptedOutput

from tool_output import Vehicle


class Device(BaseModel):
    name: str
    kind: str


agent = Agent(
    'gateway/openai:gpt-5.2',
    output_type=PromptedOutput(
        [Vehicle, Device], # (1)!
        name='Vehicle or device',
        description='Return a vehicle or device.'
    ),
)
result = agent.run_sync('What is a MacBook?')
print(repr(result.output))
#> Device(name='MacBook', kind='laptop')

agent = Agent(
    'gateway/openai:gpt-5.2',
    output_type=PromptedOutput(
        [Vehicle, Device],
        template='Gimme some JSON: {schema}'
    ),
)
result = agent.run_sync('What is a Ford Explorer?')
print(repr(result.output))
#> Vehicle(name='Ford Explorer', wheels=4)

(1) This could also have been a union: output_type=Vehicle | Device. However, as explained in the "Type checking considerations" section above, that would've required explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

Directly to Provider API

prompted_output.py

from pydantic import BaseModel

from pydantic_ai import Agent, PromptedOutput

from tool_output import Vehicle


class Device(BaseModel):
    name: str
    kind: str


agent = Agent(
    'openai:gpt-5.2',
    output_type=PromptedOutput(
        [Vehicle, Device], # (1)!
        name='Vehicle or device',
        description='Return a vehicle or device.'
    ),
)
result = agent.run_sync('What is a MacBook?')
print(repr(result.output))
#> Device(name='MacBook', kind='laptop')

agent = Agent(
    'openai:gpt-5.2',
    output_type=PromptedOutput(
        [Vehicle, Device],
        template='Gimme some JSON: {schema}'
    ),
)
result = agent.run_sync('What is a Ford Explorer?')
print(repr(result.output))
#> Vehicle(name='Ford Explorer', wheels=4)

(1) This could also have been a union: output_type=Vehicle | Device. However, as explained in the "Type checking considerations" section above, that would've required explicitly specifying the generic parameters on the Agent constructor and adding # type: ignore to this line in order to be type checked correctly.

(This example is complete, it can be run "as is")

Custom JSON schema

If it's not feasible to define your desired structured output object using a Pydantic BaseModel, dataclass, or TypedDict, for example when you get a JSON schema from an external source or generate it dynamically, you can use the StructuredDict() helper function to generate a dict[str, Any] subclass with a JSON schema attached that Pydantic AI will pass to the model.

Note that Pydantic AI will not perform any validation of the received JSON object and it's up to the model to correctly interpret the schema and any constraints expressed in it, like required fields or integer value ranges.

The output type will be a dict[str, Any] and it's up to your code to defensively read from it in case the model made a mistake. You can use an output validator to reflect validation errors back to the model and get it to try again.
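Reading defensively might look like the following sketch, which assumes a schema with a required string `name` and a required integer `age`. The helper `read_human` is hypothetical; inside an output validator, the same checks could raise ModelRetry instead of ValueError so that the model gets another attempt.

```python
from __future__ import annotations

from typing import Any


def read_human(data: dict[str, Any]) -> tuple[str, int]:
    """Check the fields the schema marks as required before using them."""
    name = data.get('name')
    age = data.get('age')
    if not isinstance(name, str) or not name:
        raise ValueError("'name' must be a non-empty string")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    return name, age


print(read_human({'name': 'John Doe', 'age': 30}))
#> ('John Doe', 30)

try:
    read_human({'name': 'John Doe', 'age': 'thirty'})
except ValueError as exc:
    print(exc)
    #> 'age' must be an integer
```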

Along with the JSON schema, you can optionally pass name and description arguments to provide additional context to the model:

With Pydantic AI Gateway

from pydantic_ai import Agent, StructuredDict

HumanDict = StructuredDict(
    {
        'type': 'object',
        'properties': {
            'name': {'type': 'string'},
            'age': {'type': 'integer'}
        },
        'required': ['name', 'age']
    },
    name='Human',
    description='A human with a name and age',
)

agent = Agent('gateway/openai:gpt-5.2', output_type=HumanDict)
result = agent.run_sync('Create a person')
print(result.output)
#> {'name': 'John Doe', 'age': 30}

from pydantic_ai import Agent, StructuredDict

HumanDict = StructuredDict(
    {
        'type': 'object',
        'properties': {
            'name': {'type': 'string'},
            'age': {'type': 'integer'}
        },
        'required': ['name', 'age']
    },
    name='Human',
    description='A human with a name and age',
)

agent = Agent('openai:gpt-5.2', output_type=HumanDict)
result = agent.run_sync('Create a person')
print(result.output)
#> {'name': 'John Doe', 'age': 30}

Validation context

Some validation relies on an extra Pydantic context object. You can pass such an object to an Agent at definition-time via its validation_context parameter. It will be used in the validation of both structured outputs and tool arguments.

This validation context can be either:

the context object itself (Any), used as-is to validate outputs, or
a function that takes the RunContext and returns a context object (Any). This function will be called automatically before each validation, allowing you to build a dynamic validation context.

Don't confuse this validation context with the LLM context

This Pydantic validation context object is only used internally by Pydantic AI for tool arg and output validation. In particular, it is not included in the prompts or messages sent to the language model.

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway validation_context.py

from dataclasses import dataclass

from pydantic import BaseModel, ValidationInfo, field_validator

from pydantic_ai import Agent


class Value(BaseModel):
    x: int

    @field_validator('x')
    def increment_value(cls, value: int, info: ValidationInfo):
        return value + (info.context or 0)


agent = Agent(
    'gateway/google:gemini-3-flash-preview',
    output_type=Value,
    validation_context=10,
)
result = agent.run_sync('Give me a value of 5.')
print(repr(result.output))  # 5 from the model + 10 from the validation context
#> Value(x=15)


@dataclass
class Deps:
    increment: int


agent = Agent(
    'gateway/google:gemini-3-flash-preview',
    output_type=Value,
    deps_type=Deps,
    validation_context=lambda ctx: ctx.deps.increment,
)
result = agent.run_sync('Give me a value of 5.', deps=Deps(increment=10))
print(repr(result.output))  # 5 from the model + 10 from the validation context
#> Value(x=15)

validation_context.py

from dataclasses import dataclass

from pydantic import BaseModel, ValidationInfo, field_validator

from pydantic_ai import Agent


class Value(BaseModel):
    x: int

    @field_validator('x')
    def increment_value(cls, value: int, info: ValidationInfo):
        return value + (info.context or 0)


agent = Agent(
    'google:gemini-3-flash-preview',
    output_type=Value,
    validation_context=10,
)
result = agent.run_sync('Give me a value of 5.')
print(repr(result.output))  # 5 from the model + 10 from the validation context
#> Value(x=15)


@dataclass
class Deps:
    increment: int


agent = Agent(
    'google:gemini-3-flash-preview',
    output_type=Value,
    deps_type=Deps,
    validation_context=lambda ctx: ctx.deps.increment,
)
result = agent.run_sync('Give me a value of 5.', deps=Deps(increment=10))
print(repr(result.output))  # 5 from the model + 10 from the validation context
#> Value(x=15)

(This example is complete, it can be run "as is")

Output validators

Some validation is inconvenient or impossible to do in Pydantic validators, in particular when the validation requires IO and is asynchronous. Pydantic AI provides a way to add validation functions via the agent.output_validator decorator.

Each ModelRetry raised here consumes one unit of the run's output retry budget. The budget defaults to 1 and can be set on the agent with AgentRetries via Agent(retries={'output': N}), on a single run via agent.run(retries={'output': N}), or per output tool via ToolOutput(max_retries=N). Inside the validator, ctx.max_retries reflects the limit that will actually stop you (the global budget on the text path, or the per-tool limit on the tool path) and ctx.retry is the global retry counter, so it stays consistent across output-tool switches within a single run. See How output retries are enforced for the full enforcement model.
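
As a rough mental model only, the budget behaves like the counter in the loop below. This is a plain-Python simulation, not Pydantic AI's implementation: ModelRetry here is a local stand-in exception, and run_with_output_retries, at_least_ten_chars, and the RuntimeError raised on exhaustion are hypothetical names. It assumes that a budget of N allows N retries after the first attempt.

```python
from collections.abc import Callable, Iterator


class ModelRetry(Exception):
    """Local stand-in for the exception of the same name."""


def run_with_output_retries(
    responses: Iterator[str],
    validator: Callable[[str], str],
    max_retries: int = 1,
) -> tuple[str, int]:
    """Validate responses until one passes or the budget is spent.

    Returns the accepted output and the number of retries consumed.
    """
    retry = 0
    while True:
        output = next(responses)
        try:
            return validator(output), retry
        except ModelRetry as exc:
            if retry >= max_retries:
                raise RuntimeError(f'Exceeded {max_retries} output retries') from exc
            retry += 1  # each ModelRetry consumes one unit of the budget


def at_least_ten_chars(output: str) -> str:
    if len(output) < 10:
        raise ModelRetry('Output is too short.')
    return output


print(run_with_output_retries(iter(['short', 'long enough now']), at_least_ten_chars))
#> ('long enough now', 1)
```

With the default budget of 1 in this sketch, a second consecutive rejection ends the run with an error instead of asking the model again.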

If you want to implement separate validation logic for different output types, it's recommended to use output functions instead, to save you from having to do isinstance checks inside the output validator. If you want the model to output plain text, do your own processing or validation, and then have the agent's final output be the result of your function, it's recommended to use an output function with the TextOutput marker class.

Here's a simplified variant of the SQL Generation example:

sql_gen.py

from fake_database import DatabaseConn, QueryError
from pydantic import BaseModel

from pydantic_ai import Agent, RunContext, ModelRetry


class Success(BaseModel):
    sql_query: str


class InvalidRequest(BaseModel):
    error_message: str


Output = Success | InvalidRequest
agent = Agent[DatabaseConn, Output](
    'google:gemini-3-flash-preview',
    output_type=Output,  # type: ignore
    deps_type=DatabaseConn,
    instructions='Generate PostgreSQL flavored SQL queries based on user input.',
)


@agent.output_validator
async def validate_sql(ctx: RunContext[DatabaseConn], output: Output) -> Output:
    if isinstance(output, InvalidRequest):
        return output
    try:
        await ctx.deps.execute(f'EXPLAIN {output.sql_query}')
    except QueryError as e:
        raise ModelRetry(f'Invalid query: {e}') from e
    else:
        return output


result = agent.run_sync(
    'get me users who were last active yesterday.', deps=DatabaseConn()
)
print(result.output)
#> sql_query='SELECT * FROM users WHERE last_active::date = today() - interval 1 day'

(This example is complete, it can be run "as is")

Handling partial output in output validators

When streaming with run_stream() or run_stream_sync(), output validators are called multiple times — once for each partial output received from the model, and once for the final complete output.

You should check the RunContext.partial_output flag when you want to validate only the complete result, not intermediate partial values.

When streaming, partial_output is True for each partial output and False for the final complete output. For all other run methods, partial_output is always False as the validator is only called once with the complete output.

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway partial_validation_streaming.py

from pydantic_ai import Agent, ModelRetry, RunContext

agent = Agent('gateway/openai:gpt-5.2')


@agent.output_validator
def validate_output(ctx: RunContext, output: str) -> str:
    if ctx.partial_output:
        return output

    if len(output) < 50:
        raise ModelRetry('Output is too short.')
    return output


async def main():
    async with agent.run_stream('Write a long story about a cat') as result:
        async for message in result.stream_text():
            print(message)
            #> Once upon a
            #> Once upon a time, there was
            #> Once upon a time, there was a curious cat
            #> Once upon a time, there was a curious cat named Whiskers who
            #> Once upon a time, there was a curious cat named Whiskers who loved to explore
            #> Once upon a time, there was a curious cat named Whiskers who loved to explore the world around
            #> Once upon a time, there was a curious cat named Whiskers who loved to explore the world around him...

partial_validation_streaming.py

from pydantic_ai import Agent, ModelRetry, RunContext

agent = Agent('openai:gpt-5.2')


@agent.output_validator
def validate_output(ctx: RunContext, output: str) -> str:
    if ctx.partial_output:
        return output

    if len(output) < 50:
        raise ModelRetry('Output is too short.')
    return output


async def main():
    async with agent.run_stream('Write a long story about a cat') as result:
        async for message in result.stream_text():
            print(message)
            #> Once upon a
            #> Once upon a time, there was
            #> Once upon a time, there was a curious cat
            #> Once upon a time, there was a curious cat named Whiskers who
            #> Once upon a time, there was a curious cat named Whiskers who loved to explore
            #> Once upon a time, there was a curious cat named Whiskers who loved to explore the world around
            #> Once upon a time, there was a curious cat named Whiskers who loved to explore the world around him...

(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)

Image output

Some models can generate images as part of their response, for example those that support the Image Generation native tool and OpenAI models using the Code Execution native tool when told to generate a chart.

To use the generated image as the output of the agent run, you can set output_type to BinaryImage. If no image-generating native tool is explicitly specified, the ImageGenerationTool will be enabled automatically.

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway image_output.py

from pydantic_ai import Agent, BinaryImage

agent = Agent('gateway/openai-responses:gpt-5.2', output_type=BinaryImage)

result = agent.run_sync('Generate an image of an axolotl.')
assert isinstance(result.output, BinaryImage)

image_output.py

from pydantic_ai import Agent, BinaryImage

agent = Agent('openai-responses:gpt-5.2', output_type=BinaryImage)

result = agent.run_sync('Generate an image of an axolotl.')
assert isinstance(result.output, BinaryImage)

(This example is complete, it can be run "as is")

If an agent does not always need to generate an image, you can use a union of BinaryImage and str. If the model generates both, the image will take precedence as output and the text will be available on ModelResponse.text:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway image_output_union.py

from pydantic_ai import Agent, BinaryImage

agent = Agent('gateway/openai-responses:gpt-5.2', output_type=BinaryImage | str)

result = agent.run_sync('Tell me a two-sentence story about an axolotl, no image please.')
print(result.output)
"""
Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
"""

result = agent.run_sync('Tell me a two-sentence story about an axolotl with an illustration.')
assert isinstance(result.output, BinaryImage)
print(result.response.text)
"""
Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
"""

image_output_union.py

from pydantic_ai import Agent, BinaryImage

agent = Agent('openai-responses:gpt-5.2', output_type=BinaryImage | str)

result = agent.run_sync('Tell me a two-sentence story about an axolotl, no image please.')
print(result.output)
"""
Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
"""

result = agent.run_sync('Tell me a two-sentence story about an axolotl with an illustration.')
assert isinstance(result.output, BinaryImage)
print(result.response.text)
"""
Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
"""

Optional output (allowing `None`)

Some agents perform their work entirely through tool calls and don't need to produce a final output — for example, an agent that updates a record via a tool and then stops. Certain models (notably Anthropic) will return an empty response in this case, which by default causes Pydantic AI to retry until the model produces content.

To instead treat an empty response as a successful run, include None in the output_type:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway optional_output.py

from pydantic_ai import Agent

agent = Agent('gateway/anthropic:claude-opus-4-6', output_type=str | None)


@agent.tool_plain
def mark_task_done(task_id: int) -> str:
    """Mark the task as done."""
    return f'Task {task_id} marked done.'


result = agent.run_sync('Mark task 1 as done, then stop without saying anything.')
print(result.output)
#> None

optional_output.py

from pydantic_ai import Agent

agent = Agent('anthropic:claude-opus-4-6', output_type=str | None)


@agent.tool_plain
def mark_task_done(task_id: int) -> str:
    """Mark the task as done."""
    return f'Task {task_id} marked done.'


result = agent.run_sync('Mark task 1 as done, then stop without saying anything.')
print(result.output)
#> None

When the model returns an empty response and None is an allowed output type, the agent will return None instead of retrying. Output validator functions still run with None as the argument, so you can raise ModelRetry to reject it if needed.

output_type=str | None is the canonical case: it's handled as regular text output, and the only way the model signals None is by returning an empty response — there's no output tool or structured schema involved. This mirrors how plain str is already treated specially as free-form text output rather than a structured tool call.

None is also supported in the other output modes, with an extra structured commit path in addition to (or in place of) the empty-response fallback:

Bare unions including None that use tool mode — e.g. output_type=int | None, output_type=[int, float, None], or output_type=[ToolOutput(Foo), None]: a dedicated final_result_NoneType output tool is exposed alongside the other output tools, so the model can commit to None through a tool call. An empty model response is still also treated as None, as with str | None.
Explicit output mode markers — e.g. output_type=ToolOutput(int | None), output_type=NativeOutput([int, None]), or output_type=PromptedOutput([int, None]): None is included as a branch of the structured schema the wrapper generates. The model commits by calling the tool with null (for ToolOutput) or by selecting the NoneType branch of the discriminated schema (for NativeOutput/PromptedOutput). An empty response is not accepted — once you've opted into an explicit structured output mode, the model is expected to commit through the schema.

Note

output_type=None on its own is not valid — at least one other output type must be provided alongside None.

Note

When using agent.run_stream() with an optional output type, an empty model response has no intermediate values to yield, so stream_output() produces an empty iterator in this case. Use get_output() to retrieve the final None value instead.

Streamed Results

There are two main challenges with streamed results:

Validating structured responses before they're complete. This is achieved by "partial validation", which was recently added to Pydantic in pydantic/pydantic#10748.
When receiving a response, we don't know if it's the final response without starting to stream it and peeking at the content. Pydantic AI streams just enough of the response to sniff out if it's a tool call or an output, then streams the whole thing and calls tools, or returns the stream as a StreamedRunResult.
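
The "peek, then keep streaming" idea in the second point can be sketched with plain asyncio. This is an illustration of the general technique only, not how Pydantic AI implements it: fake_events, peek_kind, and the dict-shaped events are all made up for the example, and an empty stream is not handled.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_events() -> AsyncIterator[dict[str, str]]:
    """Stand-in for streamed response parts; the dict shape is made up."""
    for event in [
        {'kind': 'text', 'content': 'Hello'},
        {'kind': 'text', 'content': ' world'},
    ]:
        yield event


async def peek_kind(
    events: AsyncIterator[dict[str, str]],
) -> tuple[str, AsyncIterator[dict[str, str]]]:
    """Read one event to learn its kind, then return a stream that replays it."""
    first = await events.__anext__()

    async def replay() -> AsyncIterator[dict[str, str]]:
        yield first  # hand back the event that was consumed while peeking
        async for event in events:
            yield event

    return first['kind'], replay()


async def demo() -> tuple[str, str]:
    kind, events = await peek_kind(fake_events())
    text = ''.join([event['content'] async for event in events])
    return kind, text


print(asyncio.run(demo()))
#> ('text', 'Hello world')
```

The caller learns the kind of response from the first event without losing it, so the rest of the stream can be handled as either a tool call or an output.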

Note

As the run_stream() method will consider the first output matching the output_type to be the final output, it will stop running the agent graph and will not execute any tool calls made by the model after this "final" output.

If you want to always run the agent graph to completion and stream all events from the model's streaming response and the agent's execution of tools, use agent.run_stream_events() (docs) or agent.iter() (docs) instead.

Streaming Text

Example of streamed text output:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway streamed_hello_world.py

from pydantic_ai import Agent

agent = Agent('gateway/google:gemini-3-flash-preview')  # (1)!


async def main():
    async with agent.run_stream('Where does "hello world" come from?') as result:  # (2)!
        async for message in result.stream_text():  # (3)!
            print(message)
            #> The first known
            #> The first known use of "hello,
            #> The first known use of "hello, world" was in
            #> The first known use of "hello, world" was in a 1974 textbook
            #> The first known use of "hello, world" was in a 1974 textbook about the C
            #> The first known use of "hello, world" was in a 1974 textbook about the C programming language.

Streaming works with the standard Agent class and doesn't require any special setup, just a model that supports streaming (currently all models support streaming).
The Agent.run_stream() method is used to start a streamed run. It returns a context manager so the connection can be closed when the stream completes.
Each item yielded by StreamedRunResult.stream_text() is the complete text response, extended as new data is received.

streamed_hello_world.py

from pydantic_ai import Agent

agent = Agent('google:gemini-3-flash-preview')  # (1)!


async def main():
    async with agent.run_stream('Where does "hello world" come from?') as result:  # (2)!
        async for message in result.stream_text():  # (3)!
            print(message)
            #> The first known
            #> The first known use of "hello,
            #> The first known use of "hello, world" was in
            #> The first known use of "hello, world" was in a 1974 textbook
            #> The first known use of "hello, world" was in a 1974 textbook about the C
            #> The first known use of "hello, world" was in a 1974 textbook about the C programming language.

Streaming works with the standard Agent class and doesn't require any special setup, just a model that supports streaming (currently all models support streaming).
The Agent.run_stream() method is used to start a streamed run. It returns a context manager so the connection can be closed when the stream completes.
Each item yielded by StreamedRunResult.stream_text() is the complete text response, extended as new data is received.

(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)

We can also stream text as deltas rather than the entire text in each item:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway streamed_delta_hello_world.py

from pydantic_ai import Agent

agent = Agent('gateway/google:gemini-3-flash-preview')


async def main():
    async with agent.run_stream('Where does "hello world" come from?') as result:
        async for message in result.stream_text(delta=True):  # (1)!
            print(message)
            #> The first known
            #> use of "hello,
            #> world" was in
            #> a 1974 textbook
            #> about the C
            #> programming language.

stream_text will error if the response is not text.

streamed_delta_hello_world.py

from pydantic_ai import Agent

agent = Agent('google:gemini-3-flash-preview')


async def main():
    async with agent.run_stream('Where does "hello world" come from?') as result:
        async for message in result.stream_text(delta=True):  # (1)!
            print(message)
            #> The first known
            #> use of "hello,
            #> world" was in
            #> a 1974 textbook
            #> about the C
            #> programming language.

stream_text will error if the response is not text.

(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
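
The two modes carry the same information: joining the deltas received so far reproduces the cumulative text. The sketch below shows that relationship with plain asyncio. The names fake_deltas, accumulate, and collect are hypothetical stand-ins, not Pydantic AI APIs, and the chunks are sample data.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_deltas() -> AsyncIterator[str]:
    """Stand-in for a delta stream such as stream_text(delta=True)."""
    for chunk in ['The first known ', 'use of "hello, ', 'world" was in ', 'a 1974 textbook']:
        yield chunk


async def accumulate(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Turn a stream of deltas into a stream of cumulative texts."""
    text = ''
    async for delta in deltas:
        text += delta
        yield text


async def collect() -> list[str]:
    return [text async for text in accumulate(fake_deltas())]


print(asyncio.run(collect())[-1])
#> The first known use of "hello, world" was in a 1974 textbook
```

Deltas suit UIs that append to what is already displayed, while cumulative text is simpler when each update replaces the previous one.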

Output message not included in messages

The final output message will NOT be added to result messages if you use .stream_text(delta=True), see Messages and chat history for more information.

Streaming Structured Output

Here's an example of streaming a user profile as it's built:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway streamed_user_profile.py

from datetime import date

from typing_extensions import NotRequired, TypedDict

from pydantic_ai import Agent


class UserProfile(TypedDict):
    name: str
    dob: NotRequired[date]
    bio: NotRequired[str]


agent = Agent(
    'gateway/openai:gpt-5.2',
    output_type=UserProfile,
    instructions='Extract a user profile from the input',
)


async def main():
    user_input = 'My name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.'
    async with agent.run_stream(user_input) as result:
        async for profile in result.stream_output():
            print(profile)
            #> {'name': 'Ben'}
            #> {'name': 'Ben'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the '}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyr'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}

streamed_user_profile.py

from datetime import date

from typing_extensions import NotRequired, TypedDict

from pydantic_ai import Agent


class UserProfile(TypedDict):
    name: str
    dob: NotRequired[date]
    bio: NotRequired[str]


agent = Agent(
    'openai:gpt-5.2',
    output_type=UserProfile,
    instructions='Extract a user profile from the input',
)


async def main():
    user_input = 'My name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.'
    async with agent.run_stream(user_input) as result:
        async for profile in result.stream_output():
            print(profile)
            #> {'name': 'Ben'}
            #> {'name': 'Ben'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the '}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyr'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}

(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)

As setting an output_type uses the Tool Output mode by default, this will only work if the model supports streaming tool arguments. For models that don't, like Gemini, try Native Output or Prompted Output instead.

Streaming Model Responses

If you want fine-grained control of validation, you can use the following pattern to get the entire partial ModelResponse:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway streamed_user_profile.py

from datetime import date

from pydantic import ValidationError
from typing_extensions import TypedDict

from pydantic_ai import Agent


class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str


agent = Agent('gateway/openai:gpt-5.2', output_type=UserProfile)


async def main():
    user_input = 'My name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.'
    async with agent.run_stream(user_input) as result:
        async for message in result.stream_response(debounce_by=0.01):  # (1)!
            try:
                profile = await result.validate_response_output(  # (2)!
                    message,
                    allow_partial=message.state == 'incomplete',
                )
            except ValidationError:
                continue
            print(profile)
            #> {'name': 'Ben'}
            #> {'name': 'Ben'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the '}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyr'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}

stream_response streams the data as ModelResponse objects, so iteration can't fail with a ValidationError.
validate_response_output validates the data; allow_partial=True enables Pydantic's experimental_allow_partial flag on TypeAdapter.

streamed_user_profile.py

from datetime import date

from pydantic import ValidationError
from typing_extensions import TypedDict

from pydantic_ai import Agent


class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str


agent = Agent('openai:gpt-5.2', output_type=UserProfile)


async def main():
    user_input = 'My name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.'
    async with agent.run_stream(user_input) as result:
        async for message in result.stream_response(debounce_by=0.01):  # (1)!
            try:
                profile = await result.validate_response_output(  # (2)!
                    message,
                    allow_partial=message.state == 'incomplete',
                )
            except ValidationError:
                continue
            print(profile)
            #> {'name': 'Ben'}
            #> {'name': 'Ben'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the '}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyr'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}
            #> {'name': 'Ben', 'dob': date(1990, 1, 28), 'bio': 'Likes the chain the dog and the pyramid'}

stream_response streams the data as ModelResponse objects, so iteration can't fail with a ValidationError.
validate_response_output validates the data; allow_partial=True enables Pydantic's experimental_allow_partial flag on TypeAdapter.

(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)

Cancelling Streams

Sometimes you need to stop a streaming response before it completes: a user clicks "stop generating" in a chat UI, you've received enough data to make a decision, or you want to avoid receiving more tokens. run_stream() and iter() support explicit cancellation by closing the underlying model stream. run_stream_events() is an async context manager, so cleanup runs deterministically when you stop consuming events — leaving the async with block cancels the background run task.

Model support

The Google, xAI, and Hugging Face SDKs expose streaming only as async iterators, which limits when cancel() can interrupt an in-flight chunk read. See the Google, xAI, and Hugging Face provider docs for the recommended pattern.

Cleaning up `run_stream_events`

run_stream_events() is an async context manager that yields an async iterator over events:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway stream_cancel_run_stream_events.py

from pydantic_ai import Agent, FinalResultEvent, PartStartEvent

agent = Agent('gateway/openai:gpt-5.2')


async def main():
    async with agent.run_stream_events('Write a long essay about Python') as events:
        async for event in events:
            if isinstance(event, PartStartEvent):
                print(f'Started: {event.part!r}')
                #> Started: TextPart(content='Python is a ')
            elif isinstance(event, FinalResultEvent):
                break  # (1)!

Breaking out of the loop leaves the async with block, which cancels the background run task and closes the HTTP connection.

stream_cancel_run_stream_events.py

from pydantic_ai import Agent, FinalResultEvent, PartStartEvent

agent = Agent('openai:gpt-5.2')


async def main():
    async with agent.run_stream_events('Write a long essay about Python') as events:
        async for event in events:
            if isinstance(event, PartStartEvent):
                print(f'Started: {event.part!r}')
                #> Started: TextPart(content='Python is a ')
            elif isinstance(event, FinalResultEvent):
                break  # (1)!

Breaking out of the loop leaves the async with block, which cancels the background run task and closes the HTTP connection.

(This example is complete, it can be run "as is" -- you'll need to add asyncio.run(main()) to run main)

run_stream_events() does not expose a cancel() method. If you need an explicit model-response cancellation handle, use run_stream() or agent.iter().

Cancelling `run_stream`

Call cancel() on the StreamedRunResult to cancel the stream:

With Pydantic AI GatewayDirectly to Provider API

Learn about Gateway stream_cancel_run_stream.py

from pydantic_ai import Agent

agent = Agent('gateway/openai:gpt-5.2')


async def main():
    async with agent.run_stream('Write a long essay about Python') as result:
        text = ''
        async for chunk in result.stream_text(delta=True):
            text += chunk
            if len(text) > 100:  # (1)!
                await result.cancel()  # (2)!
                break
        print(result.cancelled)  # (3)!
        #> True
        print(result.response.state == 'interrupted')  # (4)!
        #> True

Check a condition during streaming, for example whether enough text has been received.
cancel() tells the model provider to stop generating tokens and closes the HTTP connection when the model integration supports it.
The cancelled property reflects the cancellation state.
The final ModelResponse is marked with state='interrupted' so that downstream code can identify incomplete responses.

stream_cancel_run_stream.py

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2')


async def main():
    async with agent.run_stream('Write a long essay about Python') as result:
        text = ''
        async for chunk in result.stream_text(delta=True):
            text += chunk
            if len(text) > 100:  # (1)!
                await result.cancel()  # (2)!
                break
        print(result.cancelled)  # (3)!
        #> True
        print(result.response.state == 'interrupted')  # (4)!
        #> True

Check a condition during streaming, for example whether enough text has been received.
cancel() tells the model provider to stop generating tokens and closes the HTTP connection when the model integration supports it.
The cancelled property reflects the cancellation state.
The final ModelResponse is marked with state='interrupted' so that downstream code can identify incomplete responses.

(This example is complete, it can be run "as is" -- you'll need to add asyncio.run(main()) to run main)

If you break out of stream_text() and then leave the surrounding async with block, the stream is cleaned up as the context exits. Use cancel() when you want to stop generation immediately instead of only stopping local consumption.
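
The difference is one of timing, which the toy model below tries to make visible. FakeStream is a hypothetical stand-in written for this sketch, not a Pydantic AI class, and its closed flag only imitates the idea of a connection being released. Real cleanup behavior depends on the model integration.

```python
import asyncio
from collections.abc import AsyncIterator


class FakeStream:
    """Toy stand-in for a streamed result; not a Pydantic AI class."""

    def __init__(self, chunks: list[str]) -> None:
        self._chunks = chunks
        self.closed = False
        self.cancelled = False

    async def __aenter__(self) -> 'FakeStream':
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self.closed = True  # cleanup always happens when the context exits

    async def cancel(self) -> None:
        self.cancelled = True
        self.closed = True  # explicit cancellation closes the stream right away

    async def stream_text(self) -> AsyncIterator[str]:
        for chunk in self._chunks:
            if self.closed:
                return
            yield chunk


async def consume(use_cancel: bool) -> tuple[bool, bool]:
    """Return (closed_before_exit, cancelled) for the chosen strategy."""
    async with FakeStream(['a', 'b', 'c']) as stream:
        async for _ in stream.stream_text():
            if use_cancel:
                await stream.cancel()
            break
        closed_before_exit = stream.closed
    assert stream.closed  # closed either way once the block has exited
    return closed_before_exit, stream.cancelled


print(asyncio.run(consume(use_cancel=False)))
#> (False, False)
print(asyncio.run(consume(use_cancel=True)))
#> (True, True)
```

In this model, a bare break leaves the stream open until the async with block exits, whereas cancel() closes it while the block is still running.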

Interrupted tool calls

Cancelling or breaking out of a model response stream can leave the final ModelResponse with incomplete tool-call arguments. Pydantic AI records the response with state='interrupted', but it does not filter incomplete tool calls, synthesize tool returns, or otherwise define run-resumption behavior for those partial responses. If you are controlling the graph with agent.iter(), stop the outer run loop as well, or check response.state == 'interrupted' before allowing the run to continue into tool execution.
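
A guard of the kind described above might look like the following sketch. FakeResponse and FakeToolCall are simplified stand-in dataclasses rather than Pydantic AI's ModelResponse and tool-call types, the helper name tool_calls_to_execute is hypothetical, and 'complete' is an assumed name for the normal state.

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolCall:
    """Stand-in for a tool call; args_json may be truncated mid-stream."""

    tool_name: str
    args_json: str


@dataclass
class FakeResponse:
    """Stand-in for a model response with a state flag."""

    state: str = 'complete'  # assumed name for the normal, finished state
    tool_calls: list[FakeToolCall] = field(default_factory=list)


def tool_calls_to_execute(response: FakeResponse) -> list[FakeToolCall]:
    """Return tool calls only when the response finished normally."""
    if response.state == 'interrupted':
        return []  # arguments may be truncated, so skip tool execution
    return response.tool_calls


interrupted = FakeResponse(
    state='interrupted',
    tool_calls=[FakeToolCall('get_weather', '{"city": "Lon')],
)
print(tool_calls_to_execute(interrupted))
#> []
```

Skipping execution entirely is the conservative choice here, since a partial argument string cannot be told apart from a complete one without parsing it.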

Cancelling with iter

When using agent.iter() for fine-grained control over the agent graph, you can cancel the AgentStream inside a ModelRequestNode.stream() context:

With Pydantic AI Gateway:

stream_cancel_iter.py

from pydantic_ai import Agent, FinalResultEvent

agent = Agent('gateway/openai:gpt-5.2')


async def main():
    async with agent.iter('Write a long essay about Python') as run:
        async for node in run:
            if Agent.is_model_request_node(node):
                async with node.stream(run.ctx) as stream:
                    async for event in stream:
                        if isinstance(event, FinalResultEvent):
                            await stream.cancel()  # (1)!
                            break

1. AgentStream.cancel() cancels the stream at the model request level.

Directly to Provider API:

stream_cancel_iter.py

from pydantic_ai import Agent, FinalResultEvent

agent = Agent('openai:gpt-5.2')


async def main():
    async with agent.iter('Write a long essay about Python') as run:
        async for node in run:
            if Agent.is_model_request_node(node):
                async with node.stream(run.ctx) as stream:
                    async for event in stream:
                        if isinstance(event, FinalResultEvent):
                            await stream.cancel()  # (1)!
                            break

1. AgentStream.cancel() cancels the stream at the model request level.

(This example is complete, it can be run "as is" -- you'll need to add asyncio.run(main()) to run main)

Message History After Cancellation

When a stream is cancelled, the response is recorded with state='interrupted' in the message history. The history includes any partial content that was received before cancellation:

With Pydantic AI Gateway:

stream_cancel_history.py

from pydantic_ai import Agent

agent = Agent('gateway/openai:gpt-5.2')


async def main():
    async with agent.run_stream('Tell me about Python') as result:
        async for text in result.stream_text(delta=True):
            break
        await result.cancel()

    messages = result.all_messages()  # (1)!
    print(messages[-1].state)  # (2)!
    #> interrupted

1. The message history includes the interrupted response with any partial content that was received before cancellation.
2. The interrupted response state lets your application decide whether to keep, inspect, or discard the partial response before reusing the history.

Directly to Provider API:

stream_cancel_history.py

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2')


async def main():
    async with agent.run_stream('Tell me about Python') as result:
        async for text in result.stream_text(delta=True):
            break
        await result.cancel()

    messages = result.all_messages()  # (1)!
    print(messages[-1].state)  # (2)!
    #> interrupted

1. The message history includes the interrupted response with any partial content that was received before cancellation.
2. The interrupted response state lets your application decide whether to keep, inspect, or discard the partial response before reusing the history.

(This example is complete, it can be run "as is" -- you'll need to add asyncio.run(main()) to run main)

Reusing interrupted history

Pydantic AI does not clean up incomplete tool calls in interrupted responses. Passing interrupted history directly into another run can therefore fail or lead to retries if the model was in the middle of emitting a tool call when cancellation happened. For now, applications that reuse interrupted history should inspect state='interrupted' responses and apply their own policy.
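One possible policy is to discard each interrupted response together with the request that produced it. The sketch below is only an illustration of that policy, not library behavior: FakeMessage is a hypothetical stand-in, the helper assumes messages expose kind ('request' or 'response') and state attributes, and 'complete' is a placeholder value.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a history message."""

    kind: str  # 'request' or 'response'
    text: str
    state: str = 'complete'  # placeholder; only 'interrupted' comes from the docs


def drop_interrupted_exchanges(messages: list) -> list:
    """Return a copy of the history without interrupted responses
    and without the requests that led to them."""
    cleaned: list = []
    for message in messages:
        if getattr(message, 'state', None) == 'interrupted':
            # Also drop the request that produced the partial response.
            if cleaned and getattr(cleaned[-1], 'kind', None) == 'request':
                cleaned.pop()
            continue
        cleaned.append(message)
    return cleaned


history = [
    FakeMessage('request', 'What is Python?'),
    FakeMessage('response', 'Python is a programming language.'),
    FakeMessage('request', 'Write a long essay about it'),
    FakeMessage('response', 'Python was first rel', state='interrupted'),
]

print([m.text for m in drop_interrupted_exchanges(history)])
#> ['What is Python?', 'Python is a programming language.']
```

Other policies are equally valid, for example keeping partial text but removing incomplete tool calls; which one fits depends on the application.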

Usage tracking for cancelled streams

Token usage reported by usage() after cancellation is partial and provider-dependent. Pydantic AI stops pulling from the stream immediately, so final usage events may never arrive; some provider SDKs may also continue generation server-side after the local stream is closed. Do not rely on cancelled-stream usage for cost-critical accounting. For OpenAI chat completions, openai_continuous_usage_stats can improve in-stream usage reporting by requesting cumulative usage data with each chunk, but cancelled-stream usage is still best-effort.
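One way to act on this is to keep cancelled-run usage separate from exact usage in your own bookkeeping. The sketch below assumes you copy the token counts out of usage() yourself and set the flag from the run's cancelled property; UsageRecord and summarize are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical bookkeeping entry for one run."""

    input_tokens: int
    output_tokens: int
    is_estimate: bool  # True when the run was cancelled mid-stream


def summarize(records: list) -> dict:
    """Total exact and best-effort token counts separately."""
    totals = {'exact_tokens': 0, 'best_effort_tokens': 0}
    for record in records:
        key = 'best_effort_tokens' if record.is_estimate else 'exact_tokens'
        totals[key] += record.input_tokens + record.output_tokens
    return totals


records = [
    UsageRecord(input_tokens=57, output_tokens=8, is_estimate=False),
    UsageRecord(input_tokens=40, output_tokens=12, is_estimate=True),
]

print(summarize(records))
#> {'exact_tokens': 65, 'best_effort_tokens': 52}
```

Keeping the two totals apart lets cost reports rely on the exact figure while still showing cancelled runs as approximate.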

Examples

The following examples demonstrate how to use streamed responses in Pydantic AI:
