GoogleModel uses the google-genai package under the hood to access Google's Gemini models
via both the Gemini API and Google Cloud (formerly known as Vertex AI).
Two providers wrap those endpoints:
- GoogleProvider: the Gemini API (Google AI Studio), surfaced under the 'google:' prefix.
- GoogleCloudProvider: Google Cloud (formerly known as Vertex AI), surfaced under the 'google-cloud:' prefix.
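To make the prefix convention concrete, here is a minimal sketch of how a model string such as 'google:gemini-3-pro-preview' breaks down into its parts. The helper `split_model_string` and the `ModelRef` type are hypothetical names made up for illustration; this is not how Pydantic AI parses model strings internally, and the handling of the 'gateway/' prefix is an assumption based on the examples on this page.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """Hypothetical container for the parts of a model string."""

    via_gateway: bool
    provider: str
    model_name: str


def split_model_string(model: str) -> ModelRef:
    """Split '<provider>:<model>' (optionally prefixed with 'gateway/') into parts."""
    via_gateway = model.startswith('gateway/')
    if via_gateway:
        model = model[len('gateway/'):]
    provider, sep, model_name = model.partition(':')
    if not sep or not provider or not model_name:
        raise ValueError(f'expected "<provider>:<model>", got {model!r}')
    return ModelRef(via_gateway, provider, model_name)


print(split_model_string('google:gemini-3-pro-preview'))
print(split_model_string('gateway/google-cloud:gemini-3-pro-preview'))
```

The provider part ('google' or 'google-cloud') is what selects between the two providers listed above.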
Install
To use GoogleModel, you need to either install pydantic-ai, or install pydantic-ai-slim with the google optional group:
pip install "pydantic-ai-slim[google]"
uv add "pydantic-ai-slim[google]"
Configuration
GoogleModel lets you use Google's Gemini models through their Gemini API (generativelanguage.googleapis.com) or Google Cloud (*-aiplatform.googleapis.com, formerly known as Vertex AI).
API Key (Gemini API)
To use Gemini via the Gemini API, go to aistudio.google.com and create an API key.
Once you have the API key, set it as an environment variable:
export GOOGLE_API_KEY=your-api-key
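You can then use GoogleModel by name, either through the gateway (the 'gateway/google:' prefix) or directly against the Gemini API (the 'google:' prefix):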
from pydantic_ai import Agent
agent = Agent('gateway/google:gemini-3-pro-preview')
...
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
...
Or you can explicitly create the provider:
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider
provider = GoogleProvider(api_key='your-api-key')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
Google Cloud (Enterprise)
If you are an enterprise user, you can also use GoogleModel to access Gemini via Google Cloud (formerly known as Vertex AI).
This interface has a number of advantages over the Gemini API:
- The Google Cloud API comes with more enterprise readiness guarantees.
- You can purchase provisioned throughput with Google Cloud to guarantee capacity.
- If you're running Pydantic AI inside Google Cloud, you don't need to set up authentication; it should "just work".
- You can decide which region to use, which might be important from a regulatory perspective, and might improve latency.
You can authenticate using application default credentials, a service account, or an API key.
Whichever way you authenticate, you'll need to have the Vertex AI API (now branded as Google Cloud AI) enabled in your Google Cloud account.
Application Default Credentials
If you have the gcloud CLI installed and configured, you can use the GoogleCloudProvider by name:
from pydantic_ai import Agent
agent = Agent('gateway/google-cloud:gemini-3-pro-preview')
...
from pydantic_ai import Agent
agent = Agent('google-cloud:gemini-3-pro-preview')
...
Or you can explicitly create the provider and model:
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
provider = GoogleCloudProvider()
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
Service Account
To use a service account JSON file, explicitly create the provider and model:
from google.oauth2 import service_account
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
credentials = service_account.Credentials.from_service_account_file(
    'path/to/service-account.json',
    scopes=['https://www.googleapis.com/auth/cloud-platform'],
)
provider = GoogleCloudProvider(credentials=credentials, project='your-project-id')
model = GoogleModel('gemini-3-flash-preview', provider=provider)
agent = Agent(model)
...
API Key
To use Google Cloud with an API key, create a key and set it as an environment variable:
export GOOGLE_API_KEY=your-api-key
You can then use GoogleModel via the GoogleCloudProvider by name:
from pydantic_ai import Agent
agent = Agent('gateway/google-cloud:gemini-3-pro-preview')
...
from pydantic_ai import Agent
agent = Agent('google-cloud:gemini-3-pro-preview')
...
Or you can explicitly create the provider and model:
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
provider = GoogleCloudProvider(api_key='your-api-key')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
Customizing Location or Project
You can specify the location and/or project when using Google Cloud:
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
provider = GoogleCloudProvider(location='asia-east1', project='your-google-cloud-project-id')
model = GoogleModel('gemini-3-pro-preview', provider=provider)
agent = Agent(model)
...
Service tier (service_tier, google_cloud_service_tier)
The unified service_tier field works on both Google subsystems, with google_cloud_service_tier available for finer Google Cloud routing control. The provider-specific field wins when both are set.
Gemini API — sent as the request's service_tier field:
| service_tier | Sent to Gemini API |
|---|---|
| 'auto' | (omitted; server default) |
| 'default' | 'standard' |
| 'flex' | 'flex' |
| 'priority' | 'priority' |
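The Gemini API mapping above can be written as a small lookup. This is a sketch only: the function name `gemini_api_service_tier` is hypothetical, and `None` is used here as a stand-in for "omit the field from the request".

```python
from typing import Optional

# Unified service_tier value -> value sent to the Gemini API.
# None means the field is omitted so the server default applies.
GEMINI_API_SERVICE_TIER = {
    'auto': None,
    'default': 'standard',
    'flex': 'flex',
    'priority': 'priority',
}


def gemini_api_service_tier(service_tier: str) -> Optional[str]:
    """Return the value to send, or None when the field should be omitted."""
    try:
        return GEMINI_API_SERVICE_TIER[service_tier]
    except KeyError:
        raise ValueError(f'unknown service_tier: {service_tier!r}') from None


for tier in ('auto', 'default', 'flex', 'priority'):
    print(tier, '->', gemini_api_service_tier(tier))
```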
Google Cloud — sent as HTTP routing headers; 'flex' and 'priority' always pick the PT-with-spillover variant, so customers with Provisioned Throughput (PT) keep using their reserved capacity first:
| service_tier | Google Cloud routing headers | Effective behavior |
|---|---|---|
| 'auto' / 'default' | (none) | PT first, then standard on-demand spillover |
| 'flex' | X-Vertex-AI-LLM-Shared-Request-Type: flex | PT first, then Flex PayGo spillover |
| 'priority' | X-Vertex-AI-LLM-Shared-Request-Type: priority | PT first, then Priority PayGo spillover |
To bypass PT entirely (or use it exclusively, or any of the other Google Cloud-specific routing combinations) set google_cloud_service_tier directly — the unified field is intentionally limited to the safe PT-with-spillover variants.
Google Cloud — full set of routing values
The full google_cloud_service_tier values map to these HTTP headers:
- 'pt_only': PT only (X-Vertex-AI-LLM-Request-Type: dedicated).
- 'pt_then_flex': PT when quota allows, then Flex PayGo spillover (X-Vertex-AI-LLM-Shared-Request-Type: flex).
- 'pt_then_priority': PT when quota allows, then Priority PayGo spillover (X-Vertex-AI-LLM-Shared-Request-Type: priority).
- 'on_demand': Standard on-demand only (X-Vertex-AI-LLM-Request-Type: shared).
- 'flex_only': Flex PayGo only (X-Vertex-AI-LLM-Request-Type: shared and X-Vertex-AI-LLM-Shared-Request-Type: flex).
- 'priority_only': Priority PayGo only (X-Vertex-AI-LLM-Request-Type: shared and X-Vertex-AI-LLM-Shared-Request-Type: priority).
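The routing rules described in this section, including the precedence of the provider-specific field, can be summarized as a lookup table. The sketch below is illustrative only: `routing_headers` and the constant names are hypothetical, and the real implementation may be structured differently. The header names and values are taken from the text above.

```python
from typing import Optional

REQUEST_TYPE = 'X-Vertex-AI-LLM-Request-Type'
SHARED_REQUEST_TYPE = 'X-Vertex-AI-LLM-Shared-Request-Type'

# google_cloud_service_tier value -> HTTP routing headers.
GOOGLE_CLOUD_TIER_HEADERS = {
    'pt_only': {REQUEST_TYPE: 'dedicated'},
    'pt_then_flex': {SHARED_REQUEST_TYPE: 'flex'},
    'pt_then_priority': {SHARED_REQUEST_TYPE: 'priority'},
    'on_demand': {REQUEST_TYPE: 'shared'},
    'flex_only': {REQUEST_TYPE: 'shared', SHARED_REQUEST_TYPE: 'flex'},
    'priority_only': {REQUEST_TYPE: 'shared', SHARED_REQUEST_TYPE: 'priority'},
}

# Unified service_tier -> PT-with-spillover variant (None means no headers).
UNIFIED_TO_GOOGLE_CLOUD = {
    'auto': None,
    'default': None,
    'flex': 'pt_then_flex',
    'priority': 'pt_then_priority',
}


def routing_headers(
    service_tier: Optional[str] = None,
    google_cloud_service_tier: Optional[str] = None,
) -> dict[str, str]:
    """Resolve both settings to headers; the provider-specific field wins."""
    tier = google_cloud_service_tier
    if tier is None and service_tier is not None:
        tier = UNIFIED_TO_GOOGLE_CLOUD[service_tier]
    if tier is None:
        return {}
    return dict(GOOGLE_CLOUD_TIER_HEADERS[tier])


print(routing_headers(service_tier='flex'))
print(routing_headers(service_tier='flex', google_cloud_service_tier='priority_only'))
```

Note that the unified 'flex' and 'priority' values resolve to the same headers as 'pt_then_flex' and 'pt_then_priority', which is why bypassing PT requires the provider-specific field.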
Example
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
provider = GoogleCloudProvider(location='global')
model = GoogleModel('gemini-3-flash-preview', provider=provider)
agent = Agent(model)
result = agent.run_sync(
    'Hello!',
    model_settings=GoogleModelSettings(google_cloud_service_tier='pt_then_flex'),
)
Swap 'pt_then_flex' for any GoogleCloudServiceTier value — e.g. 'pt_then_priority' for Priority PayGo spillover, or 'flex_only' / 'priority_only' to bypass PT entirely.
After the request, inspect ModelResponse.provider_details.get('traffic_type') (e.g. ON_DEMAND_FLEX, ON_DEMAND_PRIORITY) to see which tier served it; the field is present only when the API returns it.
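Because the traffic type is only present when the API returns it, a defensive read can be useful. The helper below is a hypothetical sketch that works on a plain mapping; it assumes provider_details is either a dict-like object or None, and it does not call any Pydantic AI API itself.

```python
from typing import Any, Mapping, Optional


def served_tier(provider_details: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return the reported traffic type, or None when it is not available."""
    if not provider_details:
        return None
    return provider_details.get('traffic_type')


# Stand-in values; in practice these would come from the model response.
print(served_tier({'traffic_type': 'ON_DEMAND_FLEX'}))
print(served_tier(None))
```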
Model Garden
You can access models from the Model Garden that support the generateContent API and are available under your Google Cloud project, including but not limited to Gemini, using one of the following model_name patterns:
- {model_id} for Gemini models
- {publisher}/{model_id}
- publishers/{publisher}/models/{model_id}
- projects/{project}/locations/{location}/publishers/{publisher}/models/{model_id}
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
provider = GoogleCloudProvider(
    project='your-google-cloud-project-id',
    location='us-central1',  # the region where the model is available
)
model = GoogleModel('meta/llama-3.3-70b-instruct-maas', provider=provider)
agent = Agent(model)
...
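To check which of the four model_name patterns a given string follows, a few regular expressions are enough. This is a hypothetical helper for illustration (the name `classify_model_name` and the labels are made up), and it assumes that no path segment contains a slash.

```python
import re

_SEG = r'[^/]+'  # one path segment, assumed to contain no slash

# Ordered from most to least specific; fullmatch keeps them from overlapping.
MODEL_NAME_PATTERNS = [
    ('full resource path',
     re.compile(rf'projects/{_SEG}/locations/{_SEG}/publishers/{_SEG}/models/{_SEG}')),
    ('publisher path', re.compile(rf'publishers/{_SEG}/models/{_SEG}')),
    ('publisher/model', re.compile(rf'{_SEG}/{_SEG}')),
    ('bare model id', re.compile(_SEG)),
]


def classify_model_name(model_name: str) -> str:
    """Return a label for the pattern that model_name follows."""
    for label, pattern in MODEL_NAME_PATTERNS:
        if pattern.fullmatch(model_name):
            return label
    raise ValueError(f'unrecognized model name: {model_name!r}')


print(classify_model_name('gemini-3-pro-preview'))
print(classify_model_name('meta/llama-3.3-70b-instruct-maas'))
```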
Custom HTTP Client
You can customize the GoogleProvider with a custom httpx.AsyncClient:
from httpx import AsyncClient
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider
custom_http_client = AsyncClient(timeout=30)
model = GoogleModel(
    'gemini-3-pro-preview',
    provider=GoogleProvider(api_key='your-api-key', http_client=custom_http_client),
)
agent = Agent(model)
...
Document, Image, Audio, and Video Input
GoogleModel supports multi-modal input, including documents, images, audio, and video.
YouTube video URLs can be passed directly to Google models:
from pydantic_ai import Agent, VideoUrl
from pydantic_ai.models.google import GoogleModel
agent = Agent(GoogleModel('gemini-3-flash-preview'))
result = agent.run_sync(
    [
        'What is this video about?',
        VideoUrl(url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'),
    ]
)
print(result.output)
Files can be uploaded via the Files API and passed as URLs:
from pydantic_ai import Agent, DocumentUrl
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider
provider = GoogleProvider()
file = provider.client.files.upload(file='pydantic-ai-logo.png')
assert file.uri is not None
agent = Agent(GoogleModel('gemini-3-flash-preview', provider=provider))
result = agent.run_sync(
    [
        'What company is this logo from?',
        DocumentUrl(url=file.uri, media_type=file.mime_type),
    ]
)
print(result.output)
See the input documentation for more details and examples.
Model settings
You can customize model behavior using GoogleModelSettings:
from google.genai.types import HarmBlockThreshold, HarmCategory
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
settings = GoogleModelSettings(
    temperature=0.2,
    max_tokens=1024,
    google_safety_settings=[
        {
            'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        }
    ],
)
model = GoogleModel('gemini-3-pro-preview')
agent = Agent(model, model_settings=settings)
...
Configure thinking
Use the provider-agnostic Thinking capability to enable thinking:
from pydantic_ai import Agent
from pydantic_ai.capabilities import Thinking
agent = Agent('gateway/google:gemini-3.5-flash', capabilities=[Thinking(effort='medium')])
...
from pydantic_ai import Agent
from pydantic_ai.capabilities import Thinking
agent = Agent('google:gemini-3.5-flash', capabilities=[Thinking(effort='medium')])
...
For advanced usage, you can pass Google's native thinking config through GoogleModelSettings.google_thinking_config:
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
model = GoogleModel('gemini-3.5-flash')
model_settings = GoogleModelSettings(google_thinking_config={'include_thoughts': True, 'thinking_level': 'MEDIUM'})
agent = Agent(model, model_settings=model_settings)
...
See Thinking for the unified API and Gemini API docs for Google's native thinking configuration.
Safety settings
You can customize the safety settings by setting the google_safety_settings field.
from google.genai.types import HarmBlockThreshold, HarmCategory
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
model_settings = GoogleModelSettings(
    google_safety_settings=[
        {
            'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        }
    ]
)
model = GoogleModel('gemini-3-flash-preview')
agent = Agent(model, model_settings=model_settings)
...
See the Gemini API docs for more on safety settings.
Logprobs
You can return logprobs from the model in your response by setting google_logprobs and google_top_logprobs in the GoogleModelSettings.
This feature is supported only for non-streaming requests on Google Cloud.
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
from pydantic_ai.providers.google_cloud import GoogleCloudProvider
model_settings = GoogleModelSettings(
    google_logprobs=True,
    google_top_logprobs=2,
)
model = GoogleModel(
    model_name='gemini-2.5-flash',
    provider=GoogleCloudProvider(location='europe-west1'),
)
agent = Agent(model, model_settings=model_settings)
result = agent.run_sync('Your prompt here')
# Access logprobs from provider_details
logprobs = result.response.provider_details.get('logprobs')
avg_logprobs = result.response.provider_details.get('avg_logprobs')
See the Google Dev Blog for more information.
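Log probabilities are natural logarithms, so they are easier to read once converted back to probabilities. The sketch below shows only the arithmetic on plain floats. It does not assume any particular structure for the 'logprobs' value, and reading 'avg_logprobs' as a mean per-token log probability is an assumption. Both function names are hypothetical.

```python
import math


def logprob_to_probability(logprob: float) -> float:
    """Convert a natural-log probability back to a probability in (0, 1]."""
    return math.exp(logprob)


def perplexity(avg_logprob: float) -> float:
    """Perplexity implied by a mean per-token log probability (assumed meaning)."""
    return math.exp(-avg_logprob)


# Stand-in numbers, not real model output.
print(logprob_to_probability(math.log(0.5)))
print(perplexity(math.log(0.25)))
```

A log probability of 0 corresponds to certainty, and values further below 0 indicate less likely tokens.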
Streaming cancellation
Cancellation limitations
The google-genai SDK exposes streaming responses only as an async iterator, with no separate handle for closing the underlying HTTP transport. Because of a Python language rule on async generators, cancel() cannot interrupt an in-flight chunk read while another coroutine is iterating the stream. Pydantic AI marks the response with state='interrupted', but upstream generation may continue until the surrounding async with agent.run_stream(...) block exits.
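The async generator restriction can be reproduced with plain asyncio. The sketch below uses a stand-in generator rather than the google-genai SDK, and treating this as the rule the paragraph above refers to is an assumption. While one task is suspended inside the generator, an attempt to close it from elsewhere raises RuntimeError instead of interrupting the read.

```python
import asyncio


async def slow_chunks():
    """Stand-in for a streaming response; the sleep mimics a network read."""
    for i in range(3):
        await asyncio.sleep(0.05)
        yield i


async def demo() -> tuple[str, int]:
    gen = slow_chunks()

    async def read_one() -> int:
        return await gen.__anext__()

    reader = asyncio.create_task(read_one())
    await asyncio.sleep(0)  # let the reader start and suspend inside the generator

    try:
        await gen.aclose()  # close attempt while a read is in flight
        outcome = 'closed'
    except RuntimeError:
        outcome = 'refused'

    first = await reader  # the in-flight read still completes normally
    await gen.aclose()  # no read in flight now, so closing succeeds
    return outcome, first


result = asyncio.run(demo())
print(result)
```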
For reliable cancellation, either pass debounce_by=None to stream_text(), stream_output(), or stream_response() and call cancel() from the same task that's iterating:
from pydantic_ai import Agent
agent = Agent('gateway/google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
    return len(chunk) > 100

async def main():
    async with agent.run_stream('Write a long essay about Python') as result:
        async for chunk in result.stream_text(debounce_by=None):
            if should_stop(chunk):
                await result.cancel()
                break
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
    return len(chunk) > 100

async def main():
    async with agent.run_stream('Write a long essay about Python') as result:
        async for chunk in result.stream_text(debounce_by=None):
            if should_stop(chunk):
                await result.cancel()
                break
Or, if you need to keep debouncing, wrap the stream with contextlib.aclosing so the iterator is closed before cancel() runs:
from contextlib import aclosing
from pydantic_ai import Agent
agent = Agent('gateway/google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
    return len(chunk) > 100

async def main():
    async with agent.run_stream('Write a long essay about Python') as result:
        async with aclosing(result.stream_text()) as stream:
            async for chunk in stream:
                if should_stop(chunk):
                    break
        await result.cancel()
from contextlib import aclosing
from pydantic_ai import Agent
agent = Agent('google:gemini-3-pro-preview')
def should_stop(chunk: str) -> bool:
    return len(chunk) > 100

async def main():
    async with agent.run_stream('Write a long essay about Python') as result:
        async with aclosing(result.stream_text()) as stream:
            async for chunk in stream:
                if should_stop(chunk):
                    break
        await result.cancel()
Calling cancel() from a different task while iteration is in progress is not currently reliable on this provider.