Azure OpenAI Realtime API: Token usage from `response.done` event does not match Azure Cost Management meter data

hamed 50 Reputation points
2026-04-01T00:09:18.48+00:00

Problem

I'm using the Azure OpenAI Realtime API (gpt-realtime-mini-2025-12-15) via the .NET OpenAI.Realtime SDK to measure token consumption. The response.done server event includes a usage object with a detailed token breakdown, but when I compare these values against the Azure Cost Management meter data for the same isolated session, every single meter is significantly different.

This is a single voice turn with no custom system instruction, no tools, and no function calls — the simplest possible scenario — yet the numbers diverge substantially.

Code

The scenario is minimal — one audio file sent, one response received:

#pragma warning disable OPENAI002

using System.ClientModel; // for ApiKeyCredential

using OpenAI.Realtime;

public class RT_Test02_SingleVoice
{
    public async Task RunAsync()
    {
        // 1. Connect to Azure OpenAI Realtime
        var client = new RealtimeClient(
            credential: new ApiKeyCredential("..."),
            options: new RealtimeClientOptions
            {
                Endpoint = new Uri("https://<my-resource>.services.ai.azure.com/openai/realtime")
            });

        var session = await client.StartConversationSessionAsync(model: "gpt-realtime-mini-2025-12-15");

        // 2. Configure session — PCM 24kHz, server VAD, Whisper transcription, no tools
        await session.ConfigureConversationSessionAsync(new RealtimeConversationSessionOptions
        {
            AudioOptions = new RealtimeConversationSessionAudioOptions
            {
                InputAudioOptions = new RealtimeConversationSessionInputAudioOptions
                {
                    AudioFormat = new RealtimePcmAudioFormat(),
                    AudioTranscriptionOptions = new RealtimeAudioTranscriptionOptions
                    {
                        Model = "whisper-1",
                    },
                    TurnDetection = new RealtimeServerVadTurnDetection
                    {
                        DetectionThreshold = 0.9f,
                        SilenceDuration = TimeSpan.FromMilliseconds(1000),
                        PrefixPadding = TimeSpan.FromMilliseconds(300),
                    },
                },
                OutputAudioOptions = new RealtimeConversationSessionOutputAudioOptions
                {
                    AudioFormat = new RealtimePcmAudioFormat(),
                    Voice = RealtimeVoice.Alloy,
                },
            }
        });

        // 3. Send pre-recorded audio (PCM 24kHz 16-bit mono, ~11 seconds) in 100ms chunks
        var wavBytes = await File.ReadAllBytesAsync("input.wav"); // resampled to 24kHz
        var pcmData = wavBytes[44..]; // skip the canonical 44-byte RIFF/WAV header; send raw PCM only
        const int chunkSize = 4800; // 100ms at 24kHz, 16-bit mono: 24000 samples/s * 2 bytes * 0.1s
        for (int i = 0; i < pcmData.Length; i += chunkSize)
        {
            int len = Math.Min(chunkSize, pcmData.Length - i);
            await session.SendInputAudioAsync(
                BinaryData.FromBytes(new ReadOnlyMemory<byte>(pcmData, i, len)));
        }

        await Task.Delay(200); // brief pause so the final chunk arrives before the commit
        await session.SendCommandAsync(new RealtimeClientCommandInputAudioBufferCommit());

        // 4. Listen for events and extract token usage from response.done
        await foreach (var update in session.ReceiveUpdatesAsync())
        {
            if (update is RealtimeServerUpdateResponseDone responseDone)
            {
                var usage = responseDone.Response.Usage;
                Console.WriteLine($"Total={usage.TotalTokenCount}");
                Console.WriteLine($"  Input={usage.InputTokenCount}");
                Console.WriteLine($"    Text={usage.InputTokenDetails.TextTokenCount}");
                Console.WriteLine($"    Audio={usage.InputTokenDetails.AudioTokenCount}");
                Console.WriteLine($"    CachedText={usage.InputTokenDetails.CachedTokenDetails.TextTokenCount}");
                Console.WriteLine($"    CachedAudio={usage.InputTokenDetails.CachedTokenDetails.AudioTokenCount}");
                Console.WriteLine($"  Output={usage.OutputTokenCount}");
                Console.WriteLine($"    Text={usage.OutputTokenDetails.TextTokenCount}");
                Console.WriteLine($"    Audio={usage.OutputTokenDetails.AudioTokenCount}");
                break;
            }
        }
    }
}
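As a sanity check on the chunking above, the 4800-byte figure follows directly from the audio format. A small standalone sketch (the constants mirror the session configuration in the question):

```csharp
// Chunk-size arithmetic for 24kHz, 16-bit (2 bytes/sample), mono PCM.
int sampleRate = 24_000;      // samples per second
int bytesPerSample = 2;       // 16-bit mono
double chunkSeconds = 0.1;    // 100ms per chunk

int chunkSize = (int)(sampleRate * bytesPerSample * chunkSeconds);
Console.WriteLine(chunkSize); // prints 4800

// An 11-second clip therefore spans 528,000 bytes, i.e. 110 full chunks.
int totalBytes = sampleRate * bytesPerSample * 11;
Console.WriteLine(totalBytes / chunkSize); // prints 110
```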

Console output

Total=766
  Input=229
    Text=119
    Audio=110
    CachedText=64
    CachedAudio=64
  Output=537
    Text=117
    Audio=420
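Whatever the billing story, the reported breakdown is at least internally consistent: the modality details sum to the parent counts, and input plus output equals the total. A quick check of the printed numbers:

```csharp
// Values copied from the response.done usage above.
int inputText = 119, inputAudio = 110;
int outputText = 117, outputAudio = 420;
int input = 229, output = 537, total = 766;

Console.WriteLine(inputText + inputAudio == input);    // prints True (119 + 110 = 229)
Console.WriteLine(outputText + outputAudio == output); // prints True (117 + 420 = 537)
Console.WriteLine(input + output == total);            // prints True (229 + 537 = 766)

// Since the details already sum to the parents, the cached tokens
// (64 text + 64 audio = 128) must be a subset of the input counts,
// not additional tokens on top of them.
```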

Conversation

User (transcribed from audio):

Hello! My name is Hamed. What is your name and what can you do for me? For example, what is the weather in Tehran?

Assistant (audio response transcript):

Hi Ahmed! Great to meet you. I'm an AI assistant and I'm here to help with all sorts of things—whether it's the weather, information, advice, or anything else. Now, let me get the latest weather for Tehran—give me just a moment. Alright, the current weather in Tehran is around 15 degrees Celsius, partly cloudy, with a slight breeze. If you need more details or anything else, let me know!

Raw usage from response.done event

{
  "TotalTokenCount": 766,
  "InputTokenCount": 229,
  "OutputTokenCount": 537,
  "InputTokenDetails": {
    "CachedTokenCount": 128,
    "TextTokenCount": 119,
    "AudioTokenCount": 110,
    "CachedTokenDetails": {
      "TextTokenCount": 64,
      "AudioTokenCount": 64
    }
  },
  "OutputTokenDetails": {
    "TextTokenCount": 117,
    "AudioTokenCount": 420
  }
}

Azure Cost Management meter values (same isolated session)

I filtered the Azure Cost Management report to only this deployment on the exact date this test ran, with no other traffic on the deployment.

Azure Meter Name               Meaning                          response.done   Cost Report   Ratio
gpt rt aud mn in gl 1215       1M Tokens (audio input)                    110           640    5.8×
gpt rt aud mn out gl 1215      1M Tokens (audio output)                   420           631    1.5×
gpt rt txt mn in gl 1215       1M Tokens (text input)                     119           357    3.0×
gpt rt txt mn out gl 1215      1M Tokens (text output)                    117           181    1.5×
gpt rt txt mn cd in gl 1215    1M Tokens (cached text input)               64             0     n/a
gpt rt aud mn cd in gl 1215    1M Tokens (cached audio input)              64             0     n/a

Key observations

  1. Every single meter is different — not one value matches between the API response and Azure billing.
  2. Cached tokens (128 total): The API reports 64 cached text + 64 cached audio input tokens, but Azure reports 0 for both cached meters. Are cached tokens rolled into the non-cached meters for billing?
  3. Audio input is ~6× higher on Azure (640 vs 110): Even if cached audio is added (110 + 64 = 174), there are still 466 unexplained audio input tokens. Could Whisper transcription (which processes the same audio) be billed under this same meter? The transcription.completed event returned "usage": {"type": "duration", "seconds": 0}, suggesting Azure bills transcription by duration rather than tokens.
  4. Audio output is ~1.5× higher on Azure (631 vs 420): Where do the extra 211 audio output tokens come from?
  5. Text input is ~3× higher on Azure (357 vs 119): Even with cached text added (119 + 64 = 183), 174 tokens are unaccounted for. Could the default system instructions (which the model ships with — I did not set any custom instructions) be tokenized and billed but excluded from the response.done usage?
  6. Text output is ~1.5× higher on Azure (181 vs 117): Could Whisper's transcription output be billed under this text-output meter?
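One data point supporting observation 3: models in the realtime family are commonly observed to tokenize audio input at roughly 10 tokens per second of audio (an assumption on my part, not documented for this specific model), which would put an ~11-second clip at ~110 tokens, exactly the API-reported audio input count. The per-meter gaps under the "cached rolled into non-cached" hypothesis work out as follows (all figures from the table and usage object above):

```csharp
// Unexplained token deltas per meter, assuming Azure folds cached tokens
// into the non-cached meters (a hypothesis, not documented behavior).
int audioInDelta  = 640 - (110 + 64); // 466 unexplained audio input tokens
int textInDelta   = 357 - (119 + 64); // 174 unexplained text input tokens
int audioOutDelta = 631 - 420;        // 211 unexplained audio output tokens
int textOutDelta  = 181 - 117;        //  64 unexplained text output tokens

Console.WriteLine($"{audioInDelta} {textInDelta} {audioOutDelta} {textOutDelta}");
// prints 466 174 211 64

// Rough audio-tokenization sanity check (~10 tokens/sec is an assumption):
Console.WriteLine(11 * 10); // prints 110, matching the API's audio input count
```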

What I've verified

  • This was the only session on this deployment that day — no background usage.
  • I logged all 162 server events to JSON. There is exactly one response.done with non-zero tokens.
  • Only one input_audio_buffer.committed event (one audio turn).
  • The conversation.item.input_audio_transcription.completed event reported "usage": {"type": "duration", "seconds": 0}.
  • A benign VAD error (buffer too small) occurred after the response — this does not trigger billing.

Questions

  1. How does Azure map response.done usage fields to billing meters? Is there documented mapping, especially for cached vs non-cached tokens?
  2. Does Whisper transcription get billed under the same gpt rt aud mn in and gpt rt txt mn out meters as the main model — even when transcription.completed reports duration-based usage?
  3. Are there hidden "internal" tokens (default system instructions, audio framing overhead, internal chain-of-thought) that Azure bills but response.done does not report?
  4. Is response.done usage intended to reflect actual billing, or is it only an approximation?
  5. Has anyone successfully reconciled response.done token counts with Azure Cost Management for the Realtime API?

Environment

  • SDK: OpenAI .NET NuGet package (OpenAI.Realtime namespace)
  • Runtime: .NET 10
  • Azure region: Sweden Central
  • Model: gpt-realtime-mini-2025-12-15
  • Date: March 31, 2026
Azure Cost Management

A Microsoft offering that enables tracking of cloud usage and expenditures for Azure and other cloud providers.


Answer accepted by question author
Suchitra Suregaunkar 11,470 Reputation points Microsoft External Staff Moderator
    2026-04-01T18:09:41.9933333+00:00

Hello hamed, there is currently no official Microsoft document stating that the response.done token usage from the Azure OpenAI Realtime API must exactly match Azure Cost Management billing meters.

    Microsoft does not provide any official 1:1 mapping between:

    • The response.done.response.usage object returned by the Realtime API and
    • The Azure Cost Management meters (for example: gpt rt aud mn in, gpt rt txt mn out, cached meters, etc.).

    Therefore, differences between response.done usage and Azure Cost Management are expected, and Azure Cost Management is the billing source of truth.

    1. Azure billing is based on Azure Cost Management meters, not SDK usage events.

    Microsoft’s official Azure guidance states that Azure Cost Management is the authoritative source for consumption and billing analysis. Azure OpenAI charges must be analyzed by meter using Cost Management tools, exports, or the FOCUS schema.

    Microsoft documentation explains how to analyze Azure OpenAI costs by meter, not by API response payloads. There is no claim that SDK‑returned usage objects represent billable quantities.

    2. response.done usage is a protocol-level metric, not a billing contract:

    The OpenAI Realtime API documentation (used by Azure OpenAI Realtime) states that:

    • Token usage is reported per Response inside the response.done event
    • Tokens are categorized by modality (text, audio, cached)
    • The usage object is intended for usage inspection and estimation

    However, the documentation never states that these values are guaranteed to equal Azure billing meters, nor does it describe Azure meter attribution.

    Reference: https://docs.azure.cn/en-us/ai-services/speech-service/speech-services-quotas-and-limits -- OpenAI Realtime API – Managing costs and token usage (applies to Azure-hosted Realtime models)

    3. Audio transcription is explicitly billed separately from Realtime responses:

    Microsoft‑supported Realtime documentation clearly states:

    • Realtime responses are billed when a Response is created
    • Input transcription (for example, using whisper-1) is billed separately
    • Transcription uses a different model and rate card
    • Transcription usage is reported in conversation.item.input_audio_transcription.completed, not in response.done

    This means transcription costs will not be represented inside response.done.response.usage, but will still appear in Azure Cost Management.

    Official reference: OpenAI Realtime cost documentation – Input transcription costs section https://docs.azure.cn/en-us/ai-services/speech-service/speech-services-quotas-and-limits

    Microsoft does not publish:

    • Any mapping table from response.done usage fields → Azure billing meters
    • Any guarantee that cached token meters will always appear separately in Cost Management
    • Any statement that default system instructions, internal prompts, or audio processing overhead will appear in response.done
    • Any claim that Realtime SDK usage values should reconcile exactly with Cost Management exports

    Because none of this behavior is documented, the discrepancy is not, by definition, a bug.

    response.done usage values are not a billing contract and are not guaranteed to match Azure Cost Management meters. Azure Cost Management is the authoritative billing source. Microsoft has not documented a 1:1 reconciliation mechanism between Realtime API usage events and billing meters.

    At this time, Microsoft has not published an official document that provides a reconciliation formula between Azure OpenAI Realtime API response.done token usage and Azure Cost Management billing meters.

    For billing accuracy, Azure Cost Management should always be treated as the source of truth. The Realtime API usage object is intended for usage visibility and estimation only, not for billing reconciliation.

    Thanks,

    Suchitra.


0 additional answers
