Note
This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
When a voice agent calls external tools or takes time to generate a response, users experience silence. Interim responses bridge these wait times with short spoken messages—keeping the conversation flowing naturally while work happens in the background.
Voice Live provides the interim_response session configuration to generate these bridging messages automatically. The feature supports both agent mode (Foundry Agent Service) and model mode.
Note
In model mode, interim responses are only supported with text LLMs in cascaded mode together with azure-speech voice output. Realtime audio models don't support interim responses.
Voice Live offers two interim response modes:
- LLM-generated interim response (`llm_interim_response`): Uses a lightweight LLM to generate context-aware filler text dynamically. Best for adaptive, natural-sounding responses.
- Static interim response (`static_interim_response`): Randomly selects from a predefined list of texts that you provide. Best for deterministic or branded messaging.
Both modes can be triggered by:
| Trigger | Description |
|---|---|
| `latency` | Fires when response latency exceeds a configurable threshold (default: 2000 ms). |
| `tool` | Fires when a tool call is being executed. |
Triggers use OR logic—any matching trigger activates an interim response.
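As a toy illustration of the OR semantics (not SDK code; `should_fire_interim` is a hypothetical helper), either condition alone is enough to activate an interim response:

```python
# Hypothetical sketch of the trigger OR logic described above:
# an interim response fires if ANY configured trigger matches.

def should_fire_interim(triggers, elapsed_ms, latency_threshold_ms=2000,
                        tool_call_in_progress=False):
    """Return True if any configured trigger condition is met."""
    if "latency" in triggers and elapsed_ms > latency_threshold_ms:
        return True
    if "tool" in triggers and tool_call_in_progress:
        return True
    return False

# With both triggers configured, either condition alone fires:
should_fire_interim(["latency", "tool"], elapsed_ms=2500)   # latency exceeded
should_fire_interim(["latency", "tool"], elapsed_ms=100,
                    tool_call_in_progress=True)              # tool call running
```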
Prerequisites
Before you start, make sure you have:
- Completed either the Quickstart: Create a Voice Live real-time voice agent or the Quickstart: Get started with Voice Live.
- A working Voice Live setup.
- A working event loop that handles Voice Live events.
Important
Interim responses require azure-ai-voicelive >= 1.0.0b5 and API version 2026-01-01-preview. Install the preview SDK with:
```bash
pip install azure-ai-voicelive --pre
```
This SDK is currently in preview. Features and APIs might change before general availability.
Important
Interim responses require Azure.AI.VoiceLive >= 1.1.0-beta.3 and API version 2026-01-01-preview. Install the preview SDK with:
```bash
dotnet add package Azure.AI.VoiceLive --prerelease
```
This SDK is currently in preview. Features and APIs might change before general availability.
Important
Interim responses require azure-ai-voicelive >= 1.0.0-beta.5 and API version 2026-01-01-preview. Add the dependency with:
```xml
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-ai-voicelive</artifactId>
  <version>1.0.0-beta.5</version>
</dependency>
```
This SDK is currently in preview. Features and APIs might change before general availability.
Important
Interim responses require @azure/ai-voicelive >= 1.0.0-beta.3 and API version 2026-01-01-preview. Install the preview SDK with:
```bash
npm install @azure/ai-voicelive@1.0.0-beta.3
```
This SDK is currently in preview. Features and APIs might change before general availability.
Configure LLM-generated interim responses
LLM-generated interim responses use a lightweight model (default: gpt-4.1-mini) to create context-aware bridging text. You can customize the instructions and token limits.
Configuration parameters
| Parameter | Type | Description |
|---|---|---|
| `type` | string | Must be `llm_interim_response` (or the equivalent SDK enum). |
| `triggers` | array | List of triggers: `latency`, `tool`, or both. Default: `["latency"]`. |
| `latency_threshold_ms` | integer | Milliseconds before the `latency` trigger fires. Default: 2000. Minimum: 0. |
| `model` | string | Model for generating interim text. Default: `gpt-4.1-mini`. |
| `instructions` | string | Custom prompt for the interim response LLM. |
| `max_completion_tokens` | integer | Maximum tokens for the generated response. Default: 50. Minimum: 1. |
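The parameters in the table map directly onto the wire format. As a sketch (field names come from the table; the values are illustrative choices, not recommendations), the same configuration can be expressed as a raw `session.update` payload:

```python
import json

# Illustrative raw session.update payload built from the parameters above.
llm_config = {
    "type": "session.update",
    "session": {
        "interim_response": {
            "type": "llm_interim_response",
            "triggers": ["tool", "latency"],
            "latency_threshold_ms": 2000,
            "model": "gpt-4.1-mini",
            "instructions": "Briefly tell the user you're still working.",
            "max_completion_tokens": 50,
        }
    }
}

payload = json.dumps(llm_config)
# The payload can then be sent over the session connection,
# for example `await connection.send(payload)`.
```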
SDK configuration
```python
from azure.ai.voicelive.models import (
    LlmInterimResponseConfig,
    InterimResponseTrigger,
    RequestSession,
)

interim_response_config = LlmInterimResponseConfig(
    triggers=[InterimResponseTrigger.TOOL, InterimResponseTrigger.LATENCY],
    latency_threshold_ms=200,
    instructions="Create friendly interim responses indicating wait time "
                 "due to ongoing processing, if any. Do not include in "
                 "all responses!",
)

session_config = RequestSession(
    interim_response=interim_response_config,
    # ... other session options
)

await connection.session.update(session=session_config)
```
SDK configuration
```csharp
var interimConfig = new LlmInterimResponseConfig
{
    Instructions = "Create friendly interim responses indicating "
        + "wait time due to ongoing processing, if any. "
        + "Do not include in all responses!",
};
interimConfig.Triggers.Add(InterimResponseTrigger.Tool);
interimConfig.Triggers.Add(InterimResponseTrigger.Latency);
interimConfig.LatencyThresholdMs = 200;

var options = new VoiceLiveSessionOptions
{
    InterimResponse = BinaryData.FromObjectAsJson(interimConfig),
    // ... other session options
};

await session.ConfigureSessionAsync(options, cancellationToken);
```
SDK configuration
```java
LlmInterimResponseConfig interimResponseConfig = new LlmInterimResponseConfig()
    .setTriggers(Arrays.asList(
        InterimResponseTrigger.TOOL,
        InterimResponseTrigger.LATENCY))
    .setLatencyThresholdMs(200)
    .setInstructions("Create friendly interim responses indicating "
        + "wait time due to ongoing processing, if any. "
        + "Do not include in all responses!");

VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
    .setInterimResponse(BinaryData.fromObject(interimResponseConfig));
    // ... other session options

session.sendEvent(new ClientEventSessionUpdate(sessionOptions)).block();
```
SDK configuration
```javascript
await session.updateSession({
    interimResponse: {
        type: "llm_interim_response",
        triggers: ["tool", "latency"],
        latencyThresholdInMs: 200,
        instructions:
            "Create friendly interim responses indicating wait time " +
            "due to ongoing processing, if any. " +
            "Do not include in all responses!",
    },
    // ... other session options
});
```
Configure static interim responses
Static interim responses select randomly from a predefined list of texts whenever a trigger fires. This approach gives you full control over what the agent says during wait times.
Configuration parameters
| Parameter | Type | Description |
|---|---|---|
| `type` | string | Must be `static_interim_response` (or the equivalent SDK enum). |
| `triggers` | array | List of triggers: `latency`, `tool`, or both. Default: `["latency"]`. |
| `latency_threshold_ms` | integer | Milliseconds before the `latency` trigger fires. Default: 2000. Minimum: 0. |
| `texts` | array | List of interim response text options to randomly select from. |
Raw JSON configuration
Static interim responses can be sent as a raw session.update command:
```python
import json

static_config = {
    "type": "session.update",
    "session": {
        "interim_response": {
            "type": "static_interim_response",
            "triggers": ["tool", "latency"],
            "latency_threshold_ms": 1500,
            "texts": [
                "Let me look that up for you.",
                "One moment while I check on that.",
                "Just a second, I'm working on it."
            ]
        }
    }
}

await connection.send(json.dumps(static_config))
```
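To make the selection behavior concrete, here's a toy simulation (not service code; `pick_interim_text` is a hypothetical helper) of how one of the configured texts is chosen at random each time a trigger fires:

```python
import random

texts = [
    "Let me look that up for you.",
    "One moment while I check on that.",
    "Just a second, I'm working on it.",
]

def pick_interim_text(options):
    """Simulate the service's random selection from the configured texts."""
    return random.choice(options)

# Each trigger firing speaks one of the configured texts, chosen at random.
spoken = pick_interim_text(texts)
```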
SDK configuration
```csharp
var staticConfig = new StaticInterimResponseConfig();
staticConfig.Texts.Add("Let me look that up for you.");
staticConfig.Texts.Add("One moment while I check on that.");
staticConfig.Texts.Add("Just a second, I'm working on it.");
staticConfig.Triggers.Add(InterimResponseTrigger.Tool);
staticConfig.Triggers.Add(InterimResponseTrigger.Latency);
staticConfig.LatencyThresholdMs = 1500;

var options = new VoiceLiveSessionOptions
{
    InterimResponse = BinaryData.FromObjectAsJson(staticConfig),
    // ... other session options
};

await session.ConfigureSessionAsync(options, cancellationToken);
```
Raw JSON configuration
```java
String staticConfig = """
    {
      "type": "session.update",
      "session": {
        "interim_response": {
          "type": "static_interim_response",
          "triggers": ["tool", "latency"],
          "latency_threshold_ms": 1500,
          "texts": [
            "Let me look that up for you.",
            "One moment while I check on that.",
            "Just a second, I'm working on it."
          ]
        }
      }
    }
    """;

session.sendEvent(BinaryData.fromString(staticConfig)).block();
```
Raw JSON configuration
```javascript
await session.updateSession({
    interimResponse: {
        type: "static_interim_response",
        triggers: ["tool", "latency"],
        latencyThresholdInMs: 1500,
        texts: [
            "Let me look that up for you.",
            "One moment while I check on that.",
            "Just a second, I'm working on it.",
        ],
    },
});
```
Choose the right approach
| Requirement | LLM-generated | Static |
|---|---|---|
| Context-aware, adaptive responses | ✔ | ✖ |
| Deterministic, predictable text | ✖ | ✔ |
| Brand-controlled language | Depends on instructions | ✔ |
| Conversational variety | ✔ | Limited to configured texts |
| No extra model inference cost | ✖ | ✔ |
| Minimal configuration | ✔ | Requires a text list |
Next steps
- Learn more about How to use the Voice Live API
- See the Voice Live API reference
- Explore How to add proactive messages
- Explore How to handle voice interruptions