Azure OpenAI Service models
Azure OpenAI Service is powered by a diverse set of models with different capabilities and price points. Model availability varies by region. For GPT-3 and other models retiring in July 2024, see Azure OpenAI Service legacy models.
Models | Description |
---|---|
GPT-4 | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
GPT-3.5 | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
Embeddings | A set of models that can convert text into numerical vector form to facilitate text similarity. |
DALL-E | A series of models that can generate original images from natural language. |
Whisper | A series of models in preview that can transcribe and translate speech to text. |
Text to speech (Preview) | A series of models in preview that can synthesize text to speech. |
GPT-4 and GPT-4 Turbo Preview
GPT-4 is a large multimodal model (accepting text or image inputs and generating text) that can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like GPT-3.5 Turbo, GPT-4 is optimized for chat and works well for traditional completions tasks. Use the Chat Completions API to use GPT-4. To learn more about how to interact with GPT-4 and the Chat Completions API check out our in-depth how-to.
GPT-4 Turbo with Vision is the version of GPT-4 that accepts image inputs. It is available as the vision-preview
model of gpt-4
.
gpt-4
gpt-4-32k
You can see the token context length supported by each model in the model summary table.
GPT-3.5
GPT-3.5 models can understand and generate natural language or code. The most capable and cost effective model in the GPT-3.5 family is GPT-3.5 Turbo, which has been optimized for chat and works well for traditional completions tasks as well. GPT-3.5 Turbo is available for use with the Chat Completions API. GPT-3.5 Turbo Instruct has similar capabilities to text-davinci-003
using the Completions API instead of the Chat Completions API. We recommend using GPT-3.5 Turbo and GPT-3.5 Turbo Instruct over legacy GPT-3.5 and GPT-3 models.
gpt-35-turbo
gpt-35-turbo-16k
gpt-35-turbo-instruct
You can see the token context length supported by each model in the model summary table.
To learn more about how to interact with GPT-3.5 Turbo and the Chat Completions API check out our in-depth how-to.
Embeddings
text-embedding-3-large
is the latest and most capable embedding model. Upgrading between embeddings models is not possible. In order to move from using text-embedding-ada-002
to text-embedding-3-large
you would need to generate new embeddings.
text-embedding-3-large
text-embedding-3-small
text-embedding-ada-002
In testing, OpenAI reports both the large and small third generation embeddings models offer better average multi-language retrieval performance with the MIRACL benchmark while still maintaining performance for English tasks with the MTEB benchmark.
Evaluation Benchmark | text-embedding-ada-002 |
text-embedding-3-small |
text-embedding-3-large |
---|---|---|---|
MIRACL average | 31.4 | 44.0 | 54.9 |
MTEB average | 61.0 | 62.3 | 64.6 |
The third generation embeddings models support reducing the size of the embedding via a new dimensions
parameter. Typically larger embeddings are more expensive from a compute, memory, and storage perspective. Being able to adjust the number of dimensions allows more control over overall cost and performance. The dimensions
parameter is not supported in all versions of the OpenAI 1.x Python library, to take advantage of this parameter we recommend upgrading to the latest version: pip install openai --upgrade
.
OpenAI's MTEB benchmark testing found that even when the third generation model's dimensions are reduced to less than text-embeddings-ada-002
1,536 dimensions performance remains slightly better.
DALL-E
The DALL-E models generate images from text prompts that the user provides. DALL-E 3 is generally available for use with the REST APIs. DALL-E 2 and DALL-E 3 with client SDKs are in preview.
Whisper
The Whisper models can be used for speech to text.
You can also use the Whisper model via Azure AI Speech batch transcription API. Check out What is the Whisper model? to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
Text to speech (Preview)
The OpenAI text to speech models, currently in preview, can be used to synthesize text to speech.
You can also use the OpenAI text to speech voices via Azure AI Speech. To learn more, see OpenAI text to speech voices via Azure OpenAI Service or via Azure AI Speech guide.
Model summary table and region availability
Note
This article only covers model/region availability that applies to all Azure OpenAI customers with deployment types of Standard. Some select customers have access to model/region combinations that are not listed in the unified table below. These tables also do not apply to customers using only Provisioned deployment types which have their own unique model/region availability matrix. For more information on Provisioned deployments refer to our Provisioned guidance.
Standard deployment model availability
Region |
gpt-4 , 0613 |
gpt-4 , 1106-Preview |
gpt-4 , 0125-Preview |
gpt-4 , vision-preview |
gpt-4-32k , 0613 |
gpt-35-turbo , 0301 |
gpt-35-turbo , 0613 |
gpt-35-turbo , 1106 |
gpt-35-turbo , 0125 |
gpt-35-turbo-16k , 0613 |
gpt-35-turbo-instruct , 0914 |
text-embedding-ada-002 , 1 |
text-embedding-ada-002 , 2 |
text-embedding-3-small , 1 |
text-embedding-3-large , 1 |
babbage-002 , 1 |
dall-e-3 , 3.0 |
davinci-002 , 1 |
tts , 001 |
tts-hd , 001 |
whisper , 001 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
australiaeast | ✅ | ✅ | - | ✅ | ✅ | - | ✅ | ✅ | - | ✅ | - | - | ✅ | - | - | - | ✅ | - | - | - | - |
brazilsouth | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | - | - | - | - | - | - | - | - |
canadaeast | ✅ | ✅ | - | - | ✅ | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - |
eastus | - | - | ✅ | - | - | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | - | - | - | - |
eastus2 | - | ✅ | - | - | - | - | ✅ | - | - | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | ✅ |
francecentral | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | ✅ | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - | - |
japaneast | - | - | - | ✅ | - | - | ✅ | - | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - | - |
northcentralus | - | - | ✅ | - | - | - | ✅ | - | ✅ | ✅ | - | - | ✅ | - | - | ✅ | - | ✅ | ✅ | ✅ | ✅ |
norwayeast | - | ✅ | - | - | - | - | - | - | - | - | - | - | ✅ | - | - | - | - | - | - | - | ✅ |
southafricanorth | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | - | - | - | - | - | - | - | - |
southcentralus | - | - | ✅ | - | - | ✅ | - | - | ✅ | - | - | ✅ | ✅ | - | - | - | - | - | - | - | - |
southindia | - | ✅ | - | - | - | - | - | ✅ | - | - | - | - | ✅ | - | - | - | - | - | - | - | ✅ |
swedencentral | ✅ | ✅ | - | ✅ | ✅ | - | ✅ | ✅ | - | ✅ | ✅ | - | ✅ | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
switzerlandnorth | ✅ | - | - | ✅ | ✅ | - | ✅ | - | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - | - |
uksouth | - | ✅ | - | - | - | ✅ | ✅ | ✅ | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - | - |
westeurope | - | - | - | - | - | ✅ | - | - | - | - | - | - | ✅ | - | - | - | - | - | - | - | ✅ |
westus | - | ✅ | - | ✅ | - | - | - | ✅ | - | - | - | - | ✅ | - | - | - | - | - | - | - | - |
westus3 | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | - | - | - | - | - | - | - | - |
This table does not include fine-tuning regional availability, consult the dedicated fine-tuning section for this information.
Standard deployment model quota
The default quota for models varies by model and region. Default quota limits are subject to change.
Quota for standard deployments is described in of terms of Tokens-Per-Minute (TPM).
Region | GPT-4 | GPT-4-32K | GPT-4-Turbo | GPT-4-Turbo-V | GPT-35-Turbo | GPT-35-Turbo-Instruct | Text-Embedding-Ada-002 | text-embedding-3-small | text-embedding-3-large | Babbage-002 | Babbage-002 - finetune | Davinci-002 | Davinci-002 - finetune | GPT-35-Turbo - finetune | GPT-35-Turbo-1106 - finetune | GPT-35-Turbo-0125 - finetune |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
australiaeast | 40 K | 80 K | 80 K | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
brazilsouth | - | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
canadaeast | 40 K | 80 K | 80 K | - | 300 K | - | 350 K | 350 K | 350 K | - | - | - | - | - | - | - |
eastus | - | - | 80 K | - | 240 K | 240 K | 240 K | 350 K | 350 K | - | - | - | - | - | - | - |
eastus2 | - | 80 K | 80 K | - | 300 K | - | 350 K | 350 K | 350 K | - | - | - | - | 250 K | 250 K | 250 K |
francecentral | 20 K | 60 K | 80 K | - | 240 K | - | 240 K | - | - | - | - | - | - | - | - | - |
japaneast | - | - | - | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
northcentralus | - | - | 80 K | - | 300 K | - | 350 K | - | - | 240 K | 250 K | 240 K | 250 K | 250 K | 250 K | 250 K |
norwayeast | - | - | 150 K | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
southafricanorth | - | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
southcentralus | - | - | 80 K | - | 240 K | - | 240 K | - | - | - | - | - | - | - | - | - |
southindia | - | - | 150 K | - | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
swedencentral | 40 K | 80 K | 150 K | 30 K | 300 K | 240 K | 350 K | - | - | 240 K | 250 K | 240 K | 250 K | 250 K | 250 K | 250 K |
switzerlandnorth | 40 K | 80 K | - | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
switzerlandwest | - | - | - | - | - | - | - | - | - | - | 250 K | - | 250 K | 250 K | 250 K | 250 K |
uksouth | - | - | 80 K | - | 240 K | - | 350 K | - | - | - | - | - | - | - | - | - |
westeurope | - | - | - | - | 240 K | - | 240 K | - | - | - | - | - | - | - | - | - |
westus | - | - | 80 K | 30 K | 300 K | - | 350 K | - | - | - | - | - | - | - | - | - |
westus3 | - | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - |
1 K = 1000 Tokens-Per-Minute (TPM). The relationship between TPM and Requests Per Minute (RPM) is currently defined as 6 RPM per 1000 TPM.
GPT-4 and GPT-4 Turbo Preview models
GPT-4, GPT-4-32k, and GPT-4 Turbo with Vision are now available to all Azure OpenAI Service customers. Availability varies by region. If you don't see GPT-4 in your region, please check back later.
These models can only be used with the Chat Completion API.
GPT-4 version 0314 is the first version of the model released. Version 0613 is the second version of the model and adds function calling support.
See model versions to learn about how Azure OpenAI Service handles model version upgrades, and working with models to learn how to view and configure the model version settings of your GPT-4 deployments.
Note
Version 0314
of gpt-4
and gpt-4-32k
will be retired no earlier than July 5, 2024. Version 0613
of gpt-4
and gpt-4-32k
will be retired no earlier than September 30, 2024. See model updates for model upgrade behavior.
GPT-4 version 0125-preview is an updated version of the GPT-4 Turbo preview previously released as version 1106-preview. GPT-4 version 0125-preview completes tasks such as code generation more completely compared to gpt-4-1106-preview. Because of this, depending on the task, customers may find that GPT-4-0125-preview generates more output compared to the gpt-4-1106-preview. We recommend customers compare the outputs of the new model. GPT-4-0125-preview also addresses bugs in gpt-4-1106-preview with UTF-8 handling for non-English languages.
Important
gpt-4
versions 1106-Preview and 0125-Preview will be upgraded with a stable version ofgpt-4
in the future. The deployment upgrade ofgpt-4
1106-Preview togpt-4
0125-Preview scheduled for March 8, 2024 is no longer taking place. Deployments ofgpt-4
versions 1106-Preview and 0125-Preview set to "Auto-update to default" and "Upgrade when expired" will start to be upgraded after the stable version is released. For each deployment, a model version upgrade takes place with no interruption in service for API calls. Upgrades are staged by region and the full upgrade process is expected to take 2 weeks. Deployments ofgpt-4
versions 1106-Preview and 0125-Preview set to "No autoupgrade" will not be upgraded and will stop operating when the preview version is upgraded in the region.
Model ID | Max Request (tokens) | Training Data (up to) |
---|---|---|
gpt-4 (0314) |
8,192 | Sep 2021 |
gpt-4-32k (0314) |
32,768 | Sep 2021 |
gpt-4 (0613) |
8,192 | Sep 2021 |
gpt-4-32k (0613) |
32,768 | Sep 2021 |
gpt-4 (1106-Preview)1GPT-4 Turbo Preview |
Input: 128,000 Output: 4,096 |
Apr 2023 |
gpt-4 (0125-Preview)1GPT-4 Turbo Preview |
Input: 128,000 Output: 4,096 |
Dec 2023 |
gpt-4 (vision-preview)2GPT-4 Turbo with Vision Preview |
Input: 128,000 Output: 4,096 |
Apr 2023 |
1 GPT-4 Turbo Preview = gpt-4
(0125-Preview) or gpt-4
(1106-Preview). To deploy this model, under Deployments select model gpt-4. Under version select (0125-Preview) or (1106-Preview).
2 GPT-4 Turbo with Vision Preview = gpt-4
(vision-preview). To deploy this model, under Deployments select model gpt-4. For Model version select vision-preview.
Caution
We don't recommend using preview models in production. We will upgrade all deployments of preview models to future preview versions and a stable version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
Note
Regions where GPT-4 (0314) & (0613) are listed as available have access to both the 8K and 32K versions of the model
GPT-4 and GPT-4 Turbo Preview model availability
Public cloud regions
Region |
gpt-4 , 0613 |
gpt-4 , 1106-Preview |
gpt-4 , 0125-Preview |
gpt-4 , vision-preview |
gpt-4-32k , 0613 |
---|---|---|---|---|---|
australiaeast | ✅ | ✅ | - | ✅ | ✅ |
canadaeast | ✅ | ✅ | - | - | ✅ |
eastus | - | - | ✅ | - | - |
eastus2 | - | ✅ | - | - | - |
francecentral | ✅ | ✅ | - | - | ✅ |
japaneast | - | - | - | ✅ | - |
northcentralus | - | - | ✅ | - | - |
norwayeast | - | ✅ | - | - | - |
southcentralus | - | - | ✅ | - | - |
southindia | - | ✅ | - | - | - |
swedencentral | ✅ | ✅ | - | ✅ | ✅ |
switzerlandnorth | ✅ | - | - | ✅ | ✅ |
uksouth | - | ✅ | - | - | - |
westus | - | ✅ | - | ✅ | - |
Select customer access
In addition to the regions above which are available to all Azure OpenAI customers, some select pre-existing customers have been granted access to versions of GPT-4 in additional regions:
Model | Region |
---|---|
gpt-4 (0314) |
East US France Central South Central US UK South |
gpt-4 (0613) |
East US East US 2 Japan East UK South |
Azure Government regions
The following GPT-4 models are available with Azure Government:
Model ID | Model Availability |
---|---|
gpt-4 (1106-Preview) |
US Gov Virginia US Gov Arizona |
GPT-3.5 models
Important
The NEW gpt-35-turbo (0125)
model has various improvements, including higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.
GPT-3.5 Turbo is used with the Chat Completion API. GPT-3.5 Turbo version 0301 can also be used with the Completions API. GPT-3.5 Turbo versions 0613 and 1106 only support the Chat Completions API.
GPT-3.5 Turbo version 0301 is the first version of the model released. Version 0613 is the second version of the model and adds function calling support.
See model versions to learn about how Azure OpenAI Service handles model version upgrades, and working with models to learn how to view and configure the model version settings of your GPT-3.5 Turbo deployments.
Note
Version 0613
of gpt-35-turbo
and gpt-35-turbo-16k
will be retired no earlier than July 13, 2024. Version 0301
of gpt-35-turbo
will be retired no earlier than June 13, 2024. See model updates for model upgrade behavior.
Model ID | Max Request (tokens) | Training Data (up to) |
---|---|---|
gpt-35-turbo 1 (0301) |
4,096 | Sep 2021 |
gpt-35-turbo (0613) |
4,096 | Sep 2021 |
gpt-35-turbo-16k (0613) |
16,384 | Sep 2021 |
gpt-35-turbo-instruct (0914) |
4,097 | Sep 2021 |
gpt-35-turbo (1106) |
Input: 16,385 Output: 4,096 |
Sep 2021 |
gpt-35-turbo (0125) NEW |
16,385 | Sep 2021 |
GPT-3.5-Turbo model availability
Public cloud regions
Region |
gpt-35-turbo , 0301 |
gpt-35-turbo , 0613 |
gpt-35-turbo , 1106 |
gpt-35-turbo , 0125 |
gpt-35-turbo-16k , 0613 |
gpt-35-turbo-instruct , 0914 |
---|---|---|---|---|---|---|
australiaeast | - | ✅ | ✅ | - | ✅ | - |
canadaeast | - | ✅ | ✅ | ✅ | ✅ | - |
eastus | ✅ | ✅ | - | - | ✅ | ✅ |
eastus2 | - | ✅ | - | - | ✅ | - |
francecentral | ✅ | ✅ | ✅ | - | ✅ | - |
japaneast | - | ✅ | - | - | ✅ | - |
northcentralus | - | ✅ | - | ✅ | ✅ | - |
southcentralus | ✅ | - | - | ✅ | - | - |
southindia | - | - | ✅ | - | - | - |
swedencentral | - | ✅ | ✅ | - | ✅ | ✅ |
switzerlandnorth | - | ✅ | - | - | ✅ | - |
uksouth | ✅ | ✅ | ✅ | - | ✅ | - |
westeurope | ✅ | - | - | - | - | - |
westus | - | - | ✅ | - | - | - |
1 This model will accept requests > 4,096 tokens. It is not recommended to exceed the 4,096 input token limit as the newer version of the model are capped at 4,096 tokens. If you encounter issues when exceeding 4,096 input tokens with this model this configuration is not officially supported.
Azure Government regions
The following GPT-3.5 turbo models are available with Azure Government:
Model ID | Model Availability |
---|---|
gpt-35-turbo (1106-Preview) |
US Gov Virginia |
Embeddings models
These models can only be used with Embedding API requests.
Note
text-embedding-3-large
is the latest and most capable embedding model. Upgrading between embedding models is not possible. In order to migrate from using text-embedding-ada-002
to text-embedding-3-large
you would need to generate new embeddings.
Model ID | Max Request (tokens) | Output Dimensions | Training Data (up-to) |
---|---|---|---|
text-embedding-ada-002 (version 2) |
8,191 | 1,536 | Sep 2021 |
text-embedding-ada-002 (version 1) |
2,046 | 1,536 | Sep 2021 |
text-embedding-3-large |
8,191 | 3,072 | Sep 2021 |
text-embedding-3-small |
8,191 | 1,536 | Sep 2021 |
Note
When sending an array of inputs for embedding, the max number of input items in the array per call to the embedding endpoint is 2048.
Public cloud regions
Region |
text-embedding-ada-002 , 1 |
text-embedding-ada-002 , 2 |
text-embedding-3-small , 1 |
text-embedding-3-large , 1 |
---|---|---|---|---|
australiaeast | - | ✅ | - | - |
brazilsouth | - | ✅ | - | - |
canadaeast | - | ✅ | ✅ | ✅ |
eastus | ✅ | ✅ | ✅ | ✅ |
eastus2 | - | ✅ | ✅ | ✅ |
francecentral | - | ✅ | - | - |
japaneast | - | ✅ | - | - |
northcentralus | - | ✅ | - | - |
norwayeast | - | ✅ | - | - |
southafricanorth | - | ✅ | - | - |
southcentralus | ✅ | ✅ | - | - |
southindia | - | ✅ | - | - |
swedencentral | - | ✅ | - | - |
switzerlandnorth | - | ✅ | - | - |
uksouth | - | ✅ | - | - |
westeurope | - | ✅ | - | - |
westus | - | ✅ | - | - |
westus3 | - | ✅ | - | - |
Azure Government regions
The following Embeddings models are available with Azure Government:
Model ID | Model Availability |
---|---|
text-embedding-ada-002 (version 2) |
US Gov Virginia US Gov Arizona |
DALL-E models
Model ID | Feature Availability | Max Request (characters) |
---|---|---|
dalle2 (preview) | East US | 1,000 |
dall-e-3 | East US, Australia East, Sweden Central | 4,000 |
Fine-tuning models
babbage-002
and davinci-002
are not trained to follow instructions. Querying these base models should only be done as a point of reference to a fine-tuned version to evaluate the progress of your training.
gpt-35-turbo
- fine-tuning of this model is limited to a subset of regions, and is not available in every region the base model is available.
Model ID | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
---|---|---|---|
babbage-002 |
North Central US Sweden Central Switzerland West |
16,384 | Sep 2021 |
davinci-002 |
North Central US Sweden Central Switzerland West |
16,384 | Sep 2021 |
gpt-35-turbo (0613) |
East US2 North Central US Sweden Central Switzerland West |
4,096 | Sep 2021 |
gpt-35-turbo (1106) |
East US2 North Central US Sweden Central Switzerland West |
Input: 16,385 Output: 4,096 |
Sep 2021 |
gpt-35-turbo (0125) |
East US2 North Central US Sweden Central Switzerland West |
16,385 | Sep 2021 |
Whisper models
Model ID | Model Availability | Max Request (audio file size) |
---|---|---|
whisper |
East US 2 North Central US Norway East South India Sweden Central West Europe |
25 MB |
Text to speech models (Preview)
Model ID | Model Availability |
---|---|
tts-1 |
North Central US Sweden Central |
tts-1-hd |
North Central US Sweden Central |
Assistants (Preview)
For Assistants you need a combination of a supported model, and a supported region. Certain tools and capabilities require the latest models. The following models are available in the Assistants API, SDK, Azure AI Studio and Azure OpenAI Studio. The following table is for pay-as-you-go. For information on Provisioned Throughput Unit (PTU) availability, see provisioned throughput.
Region | gpt-35-turbo (0613) |
gpt-35-turbo (1106) |
gpt-4 (0613) |
gpt-4 (1106) |
gpt-4 (0125) |
---|---|---|---|---|---|
Australia East | ✅ | ✅ | ✅ | ✅ | |
East US | ✅ | ✅ | |||
East US 2 | ✅ | ✅ | ✅ | ||
France Central | ✅ | ✅ | ✅ | ✅ | |
Norway East | ✅ | ||||
Sweden Central | ✅ | ✅ | ✅ | ✅ | |
UK South | ✅ | ✅ | ✅ | ✅ |
Next steps
Feedback
https://aka.ms/ContentUserFeedback.
Coming soon: Throughout 2024 we will be phasing out GitHub Issues as the feedback mechanism for content and replacing it with a new feedback system. For more information see:Submit and view feedback for