This article lists a selection of Microsoft Foundry Models sold directly by Azure that are available in Azure Government, along with their capabilities, deployment types, and regions of availability.
Models sold directly by Azure include all Azure OpenAI models offered in Azure Government. These models are billed through your Azure subscription, covered by Azure service-level agreements, and supported by Microsoft.
To learn more about attributes of all Foundry Models sold directly by Azure across all clouds, see Models Sold Directly by Azure.
Azure OpenAI in Microsoft Foundry models in Azure Government
Azure OpenAI is powered by a diverse set of models with different capabilities and price points. Model availability in Azure Government varies by region.
| Models | Description |
|---|---|
| GPT-5.1 series | **NEW** gpt-5.1 |
| GPT-4.1 series | gpt-4.1, gpt-4.1-mini |
| o-series models | Reasoning models with advanced problem solving and increased focus and capability. |
| GPT-4o | Capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
| Embeddings | A set of models that can convert text into numerical vector form to facilitate text similarity. |
GPT-5.1
Region availability
| Model | Region |
|---|---|
| gpt-5.1 | See the models table. |
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| gpt-5.1 (2025-11-13) | - Reasoning<br>- Chat Completions API<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions, tools, and parallel tool calling<br>- Full summary of capabilities | 400,000<br>Input: 272,000<br>Output: 128,000 | 128,000 | September 30, 2024 |
Important
For `gpt-5.1`, `reasoning_effort` defaults to `none`. When upgrading from previous reasoning models to `gpt-5.1`, keep in mind that you might need to update your code to explicitly pass a `reasoning_effort` level if you want reasoning to occur.
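As a minimal sketch of passing an explicit `reasoning_effort`, the helper below assembles the request keyword arguments for the `chat.completions.create` call in the openai 1.x Python SDK. The deployment name `gpt-5.1` and the commented-out client setup are placeholders; substitute your own Azure Government endpoint, key, and API version.

```python
# Sketch: explicitly setting reasoning_effort for gpt-5.1 (openai>=1.x SDK).
# Deployment name, endpoint, and API version are placeholders.

def build_chat_request(messages, effort="medium"):
    """Assemble chat.completions.create kwargs with reasoning explicitly on."""
    return {
        "model": "gpt-5.1",          # your *deployment* name, not the model ID
        "messages": messages,
        "reasoning_effort": effort,  # gpt-5.1 defaults to "none"
    }

request = build_chat_request(
    [{"role": "user", "content": "Summarize this contract clause."}]
)

# With an AzureOpenAI client this would be sent as:
#   from openai import AzureOpenAI
#   client = AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)
#   response = client.chat.completions.create(**request)
print(request["reasoning_effort"])
```

Omitting `reasoning_effort` (or passing `none`) keeps the model in its default non-reasoning mode.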
GPT-4.1 series
Region availability
| Model | Region |
|---|---|
| gpt-4.1 (2025-04-14) | See the models table. |
| gpt-4.1-mini (2025-04-14) | See the models table. |
Capabilities
Important
A known issue affects all GPT-4.1 series models: large tool or function call definitions that exceed 300,000 tokens result in failures, even when the models' 1-million-token context limit hasn't been reached.
The errors vary based on the API call and the characteristics of the underlying payload.
Here are the error messages for the Chat Completions API:

```
Error code: 400 - {'error': {'message': "This model's maximum context length is 300000 tokens. However, your messages resulted in 350564 tokens (100 in the messages, 350464 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```

```
Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.description': string too long. Expected a string with maximum length 1048576, but got a string with length 2778531 instead.", 'type': 'invalid_request_error', 'param': 'tools[0].function.description', 'code': 'string_above_max_length'}}
```
Here's the error message for the Responses API:

```
Error code: 500 - {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through an Azure support request at: https://go.microsoft.com/fwlink/?linkid=2213926 if you keep seeing this error. (Please include the request ID d2008353-291d-428f-adc1-defb5d9fb109 in your email.)', 'type': 'server_error', 'param': None, 'code': None}}
```
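One way to avoid hitting this known issue is a pre-flight size check on the tools array before sending the request. The sketch below estimates the token footprint from the serialized JSON length using a rough 4-characters-per-token ratio; that ratio is a heuristic of this example, not an exact tokenizer, so leave headroom below the 300,000-token threshold.

```python
# Rough pre-flight check for the GPT-4.1 known issue: tool/function
# definitions above 300,000 tokens fail. The 4-chars-per-token ratio
# is a heuristic estimate, not an exact token count.
import json

TOOL_TOKEN_LIMIT = 300_000

def estimate_tool_tokens(tools, chars_per_token=4):
    """Estimate the token footprint of a tools array via its JSON length."""
    return len(json.dumps(tools)) // chars_per_token

def check_tools(tools):
    """Raise before sending a request likely to hit the 300K-token failure."""
    estimate = estimate_tool_tokens(tools)
    if estimate > TOOL_TOKEN_LIMIT:
        raise ValueError(
            f"Tool definitions estimated at ~{estimate} tokens; "
            f"exceeds the {TOOL_TOKEN_LIMIT}-token known-issue threshold."
        )
    return estimate

tools = [{"type": "function",
          "function": {"name": "lookup", "description": "Find a record."}}]
print(check_tools(tools))  # small payload passes the check
```

Running the check before each call that includes tools turns an opaque 400/500 response into an actionable client-side error.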
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| gpt-4.1 (2025-04-14) | - Text and image input<br>- Text output<br>- Chat completions API<br>- Responses API<br>- Streaming<br>- Function calling<br>- Structured outputs (chat completions) | - 1,047,576 (not available)<br>- 128,000 (standard & provisioned managed deployments)<br>- 300,000 (batch deployments) | 32,768 | May 31, 2024 |
| gpt-4.1-mini (2025-04-14) | - Text and image input<br>- Text output<br>- Chat completions API<br>- Responses API<br>- Streaming<br>- Function calling<br>- Structured outputs (chat completions) | - 1,047,576 (not available)<br>- 128,000 (standard & provisioned managed deployments)<br>- 300,000 (batch deployments) | 32,768 | May 31, 2024 |
o-series models
The Azure OpenAI o-series models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math, compared to previous iterations.
| Model ID | Description | Max request (tokens) | Training data (up to) |
|---|---|---|---|
| o3-mini (2025-01-31) | - Enhanced reasoning abilities<br>- Structured outputs<br>- Text-only processing<br>- Functions and tools | Input: 200,000<br>Output: 100,000 | October 2023 |
To learn more about advanced o-series models, see Getting started with reasoning models.
Region availability
| Model | Region |
|---|---|
| o3-mini | See the models table. |
GPT-4o
GPT-4o integrates text and images in a single model, which enables it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English language tasks and vision tasks, setting new benchmarks for AI capabilities.
| Model ID | Description | Max request (tokens) | Training data (up to) |
|---|---|---|---|
| gpt-4o (2024-11-20)<br>GPT-4o (Omni) | - Structured outputs<br>- Text and image processing<br>- JSON Mode<br>- Parallel function calling<br>- Enhanced accuracy and responsiveness<br>- Parity with English text and coding tasks compared to GPT-4 Turbo with Vision<br>- Superior performance in non-English languages and in vision tasks<br>- Enhanced creative writing ability | Input: 128,000<br>Output: 16,384 | October 2023 |
Embeddings
text-embedding-3-large is the latest and most capable embedding model. You can't upgrade between embeddings models. To move from using text-embedding-ada-002 to text-embedding-3-large, you need to generate new embeddings.
The available embedding models are `text-embedding-3-large`, `text-embedding-3-small`, and `text-embedding-ada-002`.
OpenAI reports that both the large and small third-generation embedding models offer better average multi-language retrieval performance on the MIRACL benchmark, while still maintaining performance for English tasks on the MTEB benchmark.
| Evaluation benchmark | text-embedding-ada-002 | text-embedding-3-small | text-embedding-3-large |
|---|---|---|---|
| MIRACL average | 31.4 | 44.0 | 54.9 |
| MTEB average | 61.0 | 62.3 | 64.6 |
The third generation embeddings models support reducing the size of the embedding via a new dimensions parameter. Typically, larger embeddings are more expensive from a compute, memory, and storage perspective. When you can adjust the number of dimensions, you gain more control over overall cost and performance. The dimensions parameter isn't supported in all versions of the OpenAI 1.x Python library. To take advantage of this parameter, we recommend that you upgrade to the latest version: pip install openai --upgrade.
OpenAI's MTEB benchmark testing found that even when the third-generation models' dimensions are reduced to fewer than the 1,536 dimensions of text-embedding-ada-002, performance remains slightly better.
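As a minimal sketch of using the `dimensions` parameter, the helper below builds the keyword arguments for an `embeddings.create` call in the openai 1.x Python SDK. The deployment name and the 1,024-dimension choice are placeholders; only third-generation embedding models accept `dimensions`.

```python
# Sketch: requesting reduced-dimension embeddings (openai>=1.x SDK).
# Deployment name and dimension count below are placeholder choices.

def build_embedding_request(texts, dims=1024):
    """Assemble embeddings.create kwargs with a reduced vector size."""
    return {
        "model": "text-embedding-3-large",  # your deployment name
        "input": texts,
        "dimensions": dims,  # supported only by third-generation models
    }

request = build_embedding_request(["contract clause", "policy excerpt"])

# With an AzureOpenAI client this would be sent as:
#   response = client.embeddings.create(**request)
#   vector = response.data[0].embedding   # len(vector) == request["dimensions"]
print(request["dimensions"])
```

Smaller vectors reduce storage and similarity-search cost; benchmark your own retrieval quality at each size before committing to one.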
Model summary table and region availability
Models by deployment type
Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment:
- Standard: Offers a US Gov data zone deployment option, which routes traffic within Azure Government to provide higher throughput.
- Provisioned: Also offers a data zone deployment option, which allows customers to purchase and deploy provisioned throughput units across Azure Government infrastructure.
All deployments can perform the exact same inference operations, but the billing, scale, and performance are substantially different. To learn more about Azure OpenAI deployment types, see our Deployment types guide.
Data Zone Standard model availability
| Region | gpt-5.1, 2025-11-13 | gpt-4.1, 2025-04-14 | gpt-4.1-mini, 2025-04-14 | o3-mini, 2025-01-31 | gpt-4o, 2024-11-20 | text-embedding-ada-002, 2 | text-embedding-3-large, 1 | text-embedding-3-small, 1 |
|---|---|---|---|---|---|---|---|---|
| usgovarizona | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - |
| usgovvirginia | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - |
Embeddings models
These models can be used only with Embedding API requests.
Note
text-embedding-3-large is the latest and most capable embedding model. You can't upgrade between embedding models. To migrate from using text-embedding-ada-002 to text-embedding-3-large, you need to generate new embeddings.
| Model ID | Max request (tokens) | Output dimensions | Training data (up to) |
|---|---|---|---|
| text-embedding-ada-002 (version 2) | 8,192 | 1,536 | Sep 2021 |
| text-embedding-ada-002 (version 1) | 2,046 | 1,536 | Sep 2021 |
| text-embedding-3-large | 8,192 | 3,072 | Sep 2021 |
| text-embedding-3-small | 8,192 | 1,536 | Sep 2021 |
Note
When you send an array of inputs for embedding, the maximum number of input items in the array per call to the embedding endpoint is 2,048.
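Because of the 2,048-item ceiling, embedding a larger corpus means splitting the input into batches before calling the endpoint. A minimal batching sketch (the corpus contents and the commented client call are illustrative):

```python
# The embedding endpoint accepts at most 2,048 input items per call,
# so larger corpora must be split into batches.

MAX_ITEMS_PER_CALL = 2_048

def batched(items, size=MAX_ITEMS_PER_CALL):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

corpus = [f"document {i}" for i in range(5_000)]
batch_sizes = [len(b) for b in batched(corpus)]
print(batch_sizes)  # -> [2048, 2048, 904]

# Each batch then goes to the endpoint as a separate call, e.g.:
#   for batch in batched(corpus):
#       client.embeddings.create(model="text-embedding-3-small", input=batch)
```

Each batch also counts against the model's max request tokens, so very long documents may force smaller batches than the 2,048-item cap alone suggests.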
Model retirement
In some cases, models are retired in Azure Government earlier or later than in the commercial cloud. For the latest information on model retirements, refer to the model retirement guide.
| Model | Version | Public Retirement Plan | Azure Government Plan |
|---|---|---|---|
| gpt-4o | 0513 | 3/31/2026 for Regional Standard. 10/1/2026 for DataZone and PTU. | 3/31/2026 for all Deployment Types. |
| gpt-4o-mini | 0718 | 3/31/2026 for Regional Standard. 10/1/2026 for DataZone and PTU. | 3/31/2026 for all Deployment Types. |