Hello hamed,
There is currently no official Microsoft document stating that response.done token usage from the Azure OpenAI Realtime API must exactly match Azure Cost Management billing meters.
Microsoft does not provide any official 1:1 mapping between:
- The response.done.response.usage object returned by the Realtime API, and
- The Azure Cost Management meters (for example: gpt rt aud mn in, gpt rt txt mn out, cached meters, etc.).
Therefore, differences between response.done usage and Azure Cost Management are expected, and Azure Cost Management is the billing source of truth.
- Azure billing is based on Azure Cost Management meters, not SDK usage events.
Microsoft’s official Azure guidance states that Azure Cost Management is the authoritative source for consumption and billing analysis. Azure OpenAI charges must be analyzed by meter using Cost Management tools, exports, or the FOCUS schema.
Microsoft documentation explains how to analyze Azure OpenAI costs by meter, not by API response payloads. There is no claim that SDK‑returned usage objects represent billable quantities.
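As a rough illustration of analyzing costs by meter, the sketch below sums quantity and cost per meter from a Cost Management export supplied as CSV text. The column names (MeterName, Quantity, CostInBillingCurrency) are assumptions that depend on the export type and schema you use, the helper name summarize_by_meter is hypothetical, and the sample rows are invented for illustration and are not real prices.

```python
import csv
import io
from collections import defaultdict


def summarize_by_meter(csv_text, meter_col="MeterName",
                       qty_col="Quantity", cost_col="CostInBillingCurrency"):
    """Sum quantity and cost per meter from a cost export given as CSV text.

    The default column names are assumptions; pass the names your export uses.
    """
    totals = defaultdict(lambda: {"quantity": 0.0, "cost": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        meter = row[meter_col]
        # Empty cells are treated as zero.
        totals[meter]["quantity"] += float(row[qty_col] or 0)
        totals[meter]["cost"] += float(row[cost_col] or 0)
    return dict(totals)


# Invented sample rows for illustration only; not real quantities or prices.
SAMPLE_EXPORT = """MeterName,Quantity,CostInBillingCurrency
gpt rt aud mn in,0.5,1.25
gpt rt txt mn out,0.25,0.5
gpt rt aud mn in,1.5,3.75
"""

if __name__ == "__main__":
    for meter, t in sorted(summarize_by_meter(SAMPLE_EXPORT).items()):
        print(f"{meter}: quantity={t['quantity']}, cost={t['cost']}")
```

Grouping by meter keeps the analysis on the same axis that billing uses, rather than on per-response payloads.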
- response.done usage is a protocol‑level metric, not a billing contract:
The OpenAI Realtime API documentation (used by Azure OpenAI Realtime) states that:
- Token usage is reported per Response inside the response.done event
- Tokens are categorized by modality (text, audio, cached)
- The usage object is intended for usage inspection and estimation
However, the documentation never states that these values are guaranteed to equal Azure billing meters, nor does it describe Azure meter attribution.
Reference: https://docs.azure.cn/en-us/ai-services/speech-service/speech-services-quotas-and-limits -- OpenAI Realtime API – Managing costs and token usage (applies to Azure-hosted Realtime models)
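For usage inspection and estimation, the per-response usage described above can be tallied on the client side. This is a minimal sketch, not an official reconciliation method: the field names inside usage (input_token_details, output_token_details, text_tokens, audio_tokens, cached_tokens) are assumptions about the payload shape and should be checked against the events your session actually receives. The helper names and sample events are hypothetical.

```python
from collections import Counter


def usage_from_response_done(event):
    """Flatten the usage object of one response.done event into a Counter.

    Field names are assumptions about the payload; missing fields count as 0.
    """
    if event.get("type") != "response.done":
        return Counter()
    usage = (event.get("response") or {}).get("usage") or {}
    inp = usage.get("input_token_details") or {}
    out = usage.get("output_token_details") or {}
    return Counter({
        "input_text": inp.get("text_tokens", 0),
        "input_audio": inp.get("audio_tokens", 0),
        # Kept as its own tally; cached tokens may overlap with the
        # text/audio input counts, so do not add them on top.
        "input_cached": inp.get("cached_tokens", 0),
        "output_text": out.get("text_tokens", 0),
        "output_audio": out.get("audio_tokens", 0),
    })


def accumulate(events):
    """Sum per-modality token counts over all response.done events."""
    total = Counter()
    for event in events:
        total.update(usage_from_response_done(event))
    return total


# Invented events for illustration only.
SAMPLE_EVENTS = [
    {"type": "response.done", "response": {"usage": {
        "input_token_details": {"text_tokens": 20, "audio_tokens": 100, "cached_tokens": 0},
        "output_token_details": {"text_tokens": 10, "audio_tokens": 50}}}},
    {"type": "response.done", "response": {"usage": {
        "input_token_details": {"text_tokens": 30, "audio_tokens": 200, "cached_tokens": 64},
        "output_token_details": {"text_tokens": 15, "audio_tokens": 80}}}},
    {"type": "session.updated"},
]

if __name__ == "__main__":
    print(dict(accumulate(SAMPLE_EVENTS)))
```

The resulting totals are an estimate of session usage by modality; they are not expected to equal the quantities on Azure Cost Management meters.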
- Audio transcription is explicitly billed separately from Realtime responses:
Microsoft‑supported Realtime documentation clearly states:
- Realtime responses are billed when a Response is created
- Input transcription (for example, using whisper-1) is billed separately
- Transcription uses a different model and rate card
- Transcription usage is reported in conversation.item.input_audio_transcription.completed, not in response.done
This means transcription costs will not be represented inside response.done.response.usage, but will still appear in Azure Cost Management.
Official reference: OpenAI Realtime cost documentation – Input transcription costs section https://docs.azure.cn/en-us/ai-services/speech-service/speech-services-quotas-and-limits
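Because transcription usage arrives in a different event than response usage, a client that only reads response.done will undercount. The sketch below separates the two streams so each can be estimated against its own rate card. It assumes the transcription event carries a top-level usage field, which may be absent or shaped differently depending on the transcription model; the function name and sample events are hypothetical.

```python
RESPONSE_DONE = "response.done"
TRANSCRIPTION_DONE = "conversation.item.input_audio_transcription.completed"


def split_usage(events):
    """Separate response usage from input-transcription usage.

    Returns (response_usage, transcription_usage) as two lists of the raw
    usage objects, in the order the events were seen.
    """
    response_usage = []
    transcription_usage = []
    for event in events:
        etype = event.get("type")
        if etype == RESPONSE_DONE:
            usage = (event.get("response") or {}).get("usage")
            if usage is not None:
                response_usage.append(usage)
        elif etype == TRANSCRIPTION_DONE:
            # Assumption: usage is a top-level field on this event.
            usage = event.get("usage")
            if usage is not None:
                transcription_usage.append(usage)
    return response_usage, transcription_usage


# Invented events for illustration only.
SAMPLE_EVENTS = [
    {"type": RESPONSE_DONE, "response": {"usage": {"total_tokens": 180}}},
    {"type": TRANSCRIPTION_DONE, "usage": {"type": "duration", "seconds": 4}},
    {"type": TRANSCRIPTION_DONE, "transcript": "hello"},  # no usage reported
]

if __name__ == "__main__":
    responses, transcriptions = split_usage(SAMPLE_EVENTS)
    print(len(responses), "response usage objects")
    print(len(transcriptions), "transcription usage objects")
```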
Microsoft does not publish:
- Any mapping table from response.done usage fields → Azure billing meters
- Any guarantee that cached token meters will always appear separately in Cost Management
- Any statement that default system instructions, internal prompts, or audio processing overhead will appear in response.done
- Any claim that Realtime SDK usage values should reconcile exactly with Cost Management exports
Because none of this is documented behavior, a mismatch between the two does not by itself indicate a bug.
In summary, response.done usage values are not a billing contract and are not guaranteed to match Azure Cost Management meters. At this time, Microsoft has not published an official document that provides a 1:1 mapping or reconciliation formula between Azure OpenAI Realtime API response.done token usage and Azure Cost Management billing meters.
For billing accuracy, Azure Cost Management should always be treated as the source of truth. The Realtime API usage object is intended for usage visibility and estimation only, not for billing reconciliation.
Thanks,
Suchitra.