Hello DARSHIL SHAH7,
Thank you for reaching out to Microsoft Q&A.
What you’re trying to do makes perfect sense from a cost perspective—treat the avatar deployment like a resource you can spin up when needed and shut down when idle. The challenge is that Custom Avatar in Azure Speech doesn’t behave like other Azure resources yet.
Right now, after you train and deploy your avatar in Azure Speech Studio, that deployment lifecycle (create / delete / start / stop) is only controllable through the portal UI. There isn’t a public REST API, SDK method, CLI command, or ARM/Bicep support available to manage it programmatically.
This is where the confusion usually comes in. If you look at documentation under Foundry or OpenAI-style APIs, you’ll see endpoints like:
- `/openai/deployments/.../audio/speech`
Those are only inference APIs. In simple terms, they let you use the avatar once it’s already deployed; they don’t let you deploy it, scale it, or delete it. So even though it looks like there’s API coverage, it stops at runtime usage, not lifecycle management.
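To make the distinction concrete, here is a minimal sketch of what calling such an inference endpoint looks like. The environment-variable names, deployment name, API version, and voice are placeholders (not confirmed values from your setup), and the deployment itself must already exist, because this call can only consume it:

```python
import json
import os
import urllib.request

def build_speech_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the inference URL for an existing deployment (placeholder shape)."""
    return f"{endpoint}/openai/deployments/{deployment}/audio/speech?api-version={api_version}"

def synthesize(text: str) -> bytes:
    """Call the inference endpoint. This *uses* a deployment; it cannot
    create, start, stop, or delete one."""
    url = build_speech_url(
        os.environ["AZURE_OPENAI_ENDPOINT"],   # e.g. https://<resource>.openai.azure.com
        os.environ["DEPLOYMENT_NAME"],         # must already exist -- created in the portal
        "2024-02-15-preview",                  # placeholder API version
    )
    body = json.dumps({"model": "tts", "input": text, "voice": "alloy"}).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "api-key": os.environ["AZURE_OPENAI_API_KEY"],
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # audio bytes
```

Notice there is no `PUT /deployments` or `DELETE /deployments` counterpart you could script alongside this; that is exactly the gap described above.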
Internally, Custom Avatar is still not exposed as a fully “resource-managed” service like Custom Voice or Azure AI Search. That’s why you see a mismatch: other Speech features have APIs for automation, but Avatar still relies on the portal for provisioning actions.
So if you try to find a way to:
- deploy avatar via API → not available
- delete deployment via API → not available
- automate via ARM/Terraform → not available
you’ll keep hitting a dead end, because those capabilities simply haven’t been released publicly yet.
If automation is absolutely required, there are only two realistic directions today.
One option is to go through Microsoft and ask to be onboarded into the limited-access / managed-customer program. There are internal or private APIs for avatar lifecycle, but they’re not generally available. Customers with specific needs (like cost optimization or large-scale deployments) can sometimes get early access through a support request or account team.
The other option some teams experiment with is UI automation: scripting the portal with tools like Playwright or Selenium. It works in a basic sense, but it’s fragile and not something you’d want to rely on in production. Any UI change can break your flow, and it isn’t an officially supported approach.
Because of these limitations, most teams take a slightly different approach: instead of trying to bring the deployment up and down, they optimize how and when the avatar is used. That means keeping the deployment active but tightly controlling when inference calls are made: avoiding long-running sessions, triggering avatar generation only when needed, and shutting down usage at the application level rather than the infrastructure level. It’s not as ideal as true start/stop control, but it’s the only stable option right now.
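As a rough sketch of that application-level gating, here is one hypothetical pattern: the deployment stays up permanently, but your app only holds an avatar session while requests are actually flowing and tears it down after an idle timeout. The `open_session`/`close_session` hooks stand in for whatever your app does around avatar inference; they are not an Azure API:

```python
import time

class AvatarUsageGate:
    """Keep the always-on deployment idle unless a request actually needs it.

    `open_session` / `close_session` are hypothetical hooks for starting and
    stopping avatar inference in your app, not Azure lifecycle operations.
    """

    def __init__(self, open_session, close_session, idle_timeout_s: float = 60.0):
        self._open = open_session
        self._close = close_session
        self._idle_timeout = idle_timeout_s
        self._active = False
        self._last_used = 0.0

    def request(self, do_inference):
        """Run one inference call, opening a session lazily on first use."""
        if not self._active:
            self._open()
            self._active = True
        self._last_used = time.monotonic()
        return do_inference()

    def reap_if_idle(self):
        """Call periodically (e.g. from a timer) to close idle sessions."""
        if self._active and time.monotonic() - self._last_used > self._idle_timeout:
            self._close()
            self._active = False
```

This doesn’t reduce the deployment’s standing cost the way true start/stop would, but it does cap per-session usage, which is where avatar spend typically accumulates.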
Please refer to these resources:
• Troubleshoot & Guidance for Accessing Custom Neural Voice & Custom Avatar https://dori-uw-1.kuma-moon.com/azure/ai-services/cognitive-services-limited-access#what-is-limited-access
• Create & Deploy Custom Video Avatar (Foundry) https://dori-uw-1.kuma-moon.com/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create?pivots=ai-foundry-portal
• Create & Deploy Custom Avatar (Speech Studio) https://dori-uw-1.kuma-moon.com/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create?wt.mc_id=knowledgesearch_inproduct_azure-cxp-community-insider#step-5-deploy-and-use-your-avatar-model
• Azure OpenAI in Microsoft Foundry Models REST API https://dori-uw-1.kuma-moon.com/azure/foundry/openai/reference-preview#speech---create
I hope this helps. Do let me know if you have any further queries.
Thank you!