Improve tool calling and latency wait times (preview)

Note

This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

When a voice agent calls external tools or takes time to generate a response, users experience silence. Interim responses bridge these wait times with short spoken messages—keeping the conversation flowing naturally while work happens in the background.

Voice Live provides the interim_response session configuration to generate these bridging messages automatically. The feature supports both agent mode (Foundry Agent Service) and model mode.

Note

In model mode, interim responses are only supported with text LLMs in cascaded mode together with azure-speech voice output. Realtime audio models don't support interim responses.

Voice Live offers two interim response modes:

  • LLM-generated interim response (llm_interim_response): Uses a lightweight LLM to generate context-aware filler text dynamically. Best for adaptive, natural-sounding responses.
  • Static interim response (static_interim_response): Randomly selects from a predefined list of texts you provide. Best for deterministic or branded messaging.

Both modes can be triggered by:

Trigger | Description
latency | Fires when response latency exceeds a configurable threshold (default: 2000 ms).
tool | Fires when a tool call is being executed.

Triggers use OR logic—any matching trigger activates an interim response.
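
The OR semantics can be pictured with a small local sketch. This is not SDK code: `TurnState` and `should_fire` are hypothetical names used only for illustration, and reading "exceeds" as a strict greater-than comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TurnState:
    """Hypothetical snapshot of one response turn (not an SDK type)."""
    elapsed_ms: int         # time since the response was requested
    tool_call_active: bool  # True while a tool call is executing


def should_fire(triggers, state, latency_threshold_ms=2000):
    """Return True if any configured trigger matches (OR logic)."""
    # Assumption: "exceeds" means strictly greater than the threshold.
    latency_hit = "latency" in triggers and state.elapsed_ms > latency_threshold_ms
    tool_hit = "tool" in triggers and state.tool_call_active
    return latency_hit or tool_hit


# A fast tool call still matches when "tool" is configured.
print(should_fire(["latency", "tool"], TurnState(100, True)))
# The same turn does not match when only "latency" is configured.
print(should_fire(["latency"], TurnState(100, True)))
```

With both triggers configured, either a slow response or an active tool call is enough on its own to produce an interim response.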

Prerequisites

Before you start, install the preview SDK for your language and use the required API version:

Important

Interim responses require azure-ai-voicelive >= 1.0.0b5 and API version 2026-01-01-preview. Install the preview SDK with:

pip install azure-ai-voicelive --pre

This SDK is currently in preview. Features and APIs might change before general availability.

Important

Interim responses require Azure.AI.VoiceLive >= 1.1.0-beta.3 and API version 2026-01-01-preview. Install the preview SDK with:

dotnet add package Azure.AI.VoiceLive --prerelease

This SDK is currently in preview. Features and APIs might change before general availability.

Important

Interim responses require azure-ai-voicelive >= 1.0.0-beta.5 and API version 2026-01-01-preview. Add the dependency with:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.5</version>
</dependency>

This SDK is currently in preview. Features and APIs might change before general availability.

Important

Interim responses require @azure/ai-voicelive >= 1.0.0-beta.3 and API version 2026-01-01-preview. Install the preview SDK with:

npm install @azure/ai-voicelive@1.0.0-beta.3

This SDK is currently in preview. Features and APIs might change before general availability.

Configure LLM-generated interim responses

LLM-generated interim responses use a lightweight model (default: gpt-4.1-mini) to create context-aware bridging text. You can customize the instructions and token limits.

Configuration parameters

Parameter | Type | Description
type | string | Must be llm_interim_response (or equivalent SDK enum).
triggers | array | List of triggers: latency, tool, or both. Default: ["latency"].
latency_threshold_ms | integer | Milliseconds before the latency trigger fires. Default: 2000. Minimum: 0.
model | string | Model for generating interim text. Default: gpt-4.1-mini.
instructions | string | Custom prompt for the interim response LLM.
max_completion_tokens | integer | Maximum tokens for the generated response. Default: 50. Minimum: 1.
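
As a rough illustration of how these parameters fit together, the sketch below builds a session.update payload as a plain dictionary and applies the documented defaults and minimums locally. `build_llm_interim_response` is a hypothetical helper, not part of the SDK, and the sketch assumes the LLM-generated mode uses the same session.update envelope as the raw JSON form of the static mode.

```python
import json

ALLOWED_TRIGGERS = {"latency", "tool"}


def build_llm_interim_response(
    triggers=("latency",),
    latency_threshold_ms=2000,
    model="gpt-4.1-mini",
    instructions=None,
    max_completion_tokens=50,
):
    """Build a session.update payload for LLM-generated interim responses.

    Hypothetical helper: defaults and minimums mirror the parameter table.
    """
    unknown = set(triggers) - ALLOWED_TRIGGERS
    if unknown:
        raise ValueError(f"unknown triggers: {sorted(unknown)}")
    if latency_threshold_ms < 0:
        raise ValueError("latency_threshold_ms must be >= 0")
    if max_completion_tokens < 1:
        raise ValueError("max_completion_tokens must be >= 1")

    config = {
        "type": "llm_interim_response",
        "triggers": list(triggers),
        "latency_threshold_ms": latency_threshold_ms,
        "model": model,
        "max_completion_tokens": max_completion_tokens,
    }
    # Only include instructions when the caller supplies them.
    if instructions is not None:
        config["instructions"] = instructions

    # Assumption: same envelope as the static raw JSON configuration.
    return {"type": "session.update", "session": {"interim_response": config}}


payload = build_llm_interim_response(triggers=["tool", "latency"])
print(json.dumps(payload, indent=2))
```

Validating locally surfaces an out-of-range value before the payload is sent. The service remains the authority on what it accepts.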

SDK configuration (Python)

from azure.ai.voicelive.models import (
    LlmInterimResponseConfig,
    InterimResponseTrigger,
    RequestSession,
)

interim_response_config = LlmInterimResponseConfig(
    triggers=[InterimResponseTrigger.TOOL, InterimResponseTrigger.LATENCY],
    latency_threshold_ms=200,
    instructions="Create friendly interim responses indicating wait time "
                 "due to ongoing processing, if any. Do not include in "
                 "all responses!"
)

session_config = RequestSession(
    interim_response=interim_response_config,
    # ... other session options
)

# `connection` is an already-open Voice Live connection created elsewhere.
await connection.session.update(session=session_config)

SDK configuration (C#)

var interimConfig = new LlmInterimResponseConfig
{
    Instructions = "Create friendly interim responses indicating "
        + "wait time due to ongoing processing, if any. "
        + "Do not include in all responses!",
};
interimConfig.Triggers.Add(InterimResponseTrigger.Tool);
interimConfig.Triggers.Add(InterimResponseTrigger.Latency);
interimConfig.LatencyThresholdMs = 200;

var options = new VoiceLiveSessionOptions
{
    InterimResponse = BinaryData.FromObjectAsJson(interimConfig),
    // ... other session options
};

await session.ConfigureSessionAsync(options, cancellationToken);

SDK configuration (Java)

LlmInterimResponseConfig interimResponseConfig = new LlmInterimResponseConfig()
        .setTriggers(Arrays.asList(
                InterimResponseTrigger.TOOL,
                InterimResponseTrigger.LATENCY))
        .setLatencyThresholdMs(200)
        .setInstructions("Create friendly interim responses indicating "
                + "wait time due to ongoing processing, if any. "
                + "Do not include in all responses!");

VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
        .setInterimResponse(BinaryData.fromObject(interimResponseConfig));
        // ... other session options

session.sendEvent(new ClientEventSessionUpdate(sessionOptions)).block();

SDK configuration (JavaScript)

await session.updateSession({
    interimResponse: {
        type: "llm_interim_response",
        triggers: ["tool", "latency"],
        latencyThresholdInMs: 200,
        instructions:
            "Create friendly interim responses indicating wait time " +
            "due to ongoing processing, if any. " +
            "Do not include in all responses!",
    },
    // ... other session options
});

Configure static interim responses

Static interim responses select randomly from a predefined list of texts whenever a trigger fires. This approach gives you full control over what the agent says during wait times.

Configuration parameters

Parameter | Type | Description
type | string | Must be static_interim_response (or equivalent SDK enum).
triggers | array | List of triggers: latency, tool, or both. Default: ["latency"].
latency_threshold_ms | integer | Milliseconds before the latency trigger fires. Default: 2000. Minimum: 0.
texts | array | List of interim response text options to randomly select from.
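
The following sketch shows one way to assemble and sanity-check a static configuration before sending it. Both functions are hypothetical helpers rather than SDK calls. Requiring at least one non-blank text is an assumption, and `preview_selection` only imitates the random pick locally, because the actual selection happens in the service.

```python
import random


def build_static_interim_response(texts, triggers=("latency",), latency_threshold_ms=2000):
    """Build the interim_response object for the static mode (hypothetical helper)."""
    # Assumption: blank entries are dropped and at least one text must remain.
    cleaned = [t.strip() for t in texts if t and t.strip()]
    if not cleaned:
        raise ValueError("texts must contain at least one non-empty string")
    if latency_threshold_ms < 0:
        raise ValueError("latency_threshold_ms must be >= 0")
    return {
        "type": "static_interim_response",
        "triggers": list(triggers),
        "latency_threshold_ms": latency_threshold_ms,
        "texts": cleaned,
    }


def preview_selection(config, rng=None):
    """Imitate the random choice locally to preview what a user might hear."""
    rng = rng or random.Random()
    return rng.choice(config["texts"])


config = build_static_interim_response(
    ["Let me look that up for you.", "One moment while I check on that."],
    triggers=["tool", "latency"],
    latency_threshold_ms=1500,
)
print(preview_selection(config))
```

Passing a seeded `random.Random` to `preview_selection` makes the preview repeatable, which can help when reviewing wording with stakeholders.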

Raw JSON configuration (Python)

Static interim responses can be sent as a raw session.update command:

import json

static_config = {
    "type": "session.update",
    "session": {
        "interim_response": {
            "type": "static_interim_response",
            "triggers": ["tool", "latency"],
            "latency_threshold_ms": 1500,
            "texts": [
                "Let me look that up for you.",
                "One moment while I check on that.",
                "Just a second, I'm working on it."
            ]
        }
    }
}
await connection.send(json.dumps(static_config))

SDK configuration (C#)

var staticConfig = new StaticInterimResponseConfig();
staticConfig.Texts.Add("Let me look that up for you.");
staticConfig.Texts.Add("One moment while I check on that.");
staticConfig.Texts.Add("Just a second, I'm working on it.");
staticConfig.Triggers.Add(InterimResponseTrigger.Tool);
staticConfig.Triggers.Add(InterimResponseTrigger.Latency);
staticConfig.LatencyThresholdMs = 1500;

var options = new VoiceLiveSessionOptions
{
    InterimResponse = BinaryData.FromObjectAsJson(staticConfig),
    // ... other session options
};

await session.ConfigureSessionAsync(options, cancellationToken);

Raw JSON configuration (Java)

String staticConfig = """
    {
        "type": "session.update",
        "session": {
            "interim_response": {
                "type": "static_interim_response",
                "triggers": ["tool", "latency"],
                "latency_threshold_ms": 1500,
                "texts": [
                    "Let me look that up for you.",
                    "One moment while I check on that.",
                    "Just a second, I'm working on it."
                ]
            }
        }
    }
    """;
session.sendEvent(BinaryData.fromString(staticConfig)).block();

SDK configuration (JavaScript)

await session.updateSession({
    interimResponse: {
        type: "static_interim_response",
        triggers: ["tool", "latency"],
        latencyThresholdInMs: 1500,
        texts: [
            "Let me look that up for you.",
            "One moment while I check on that.",
            "Just a second, I'm working on it.",
        ],
    },
});

Choose the right approach

Requirement | LLM-generated | Static
Context-aware, adaptive responses | Yes | No
Deterministic, predictable text | No | Yes
Brand-controlled language | Depending on instructions | Yes
Conversational variety | Yes | Limited to configured texts
No extra model inference cost | No | Yes
Minimal configuration | Yes | Requires text list
