How to create a custom photo avatar

Custom photo avatar lets you create a talking head avatar from a single photo. With custom photo avatar, you can efficiently create a personalized, more engaging Voice live agent.

Custom photo avatar creation is a manual process. Follow the steps below. After your custom photo avatars are set up, you can access them in Microsoft Foundry or through the API.

Important

Photo avatar (preview) and custom photo avatar (preview) are licensed to you as part of your Azure subscription and are subject to terms applicable to "Previews" in the Microsoft Product Terms and the Microsoft Products and Services Data Protection Addendum ("DPA"), as well as the Microsoft Generative AI Services Previews terms in the supplemental Terms of Use for Microsoft Azure Previews. Access to custom photo avatar (preview), which is part of custom text to speech avatar, is limited based on eligibility and usage criteria. Learn more here.

Step 1: Request access

Custom photo avatar is available only to Microsoft-managed customers and partners. You can request access through the intake form. After the request is approved, contact your Microsoft account manager to proceed.

Step 2: Prepare training data

Custom photo avatar creation supports photos of real humans and images of virtual humans. Here are some tips for preparing the images.

  • The photo avatar only includes the head, so it’s best to provide an image showing the character from the shoulders up.
  • The face must look like a real or virtual human. Cartoon-like characteristics, such as eyes that are larger than normal human proportions, are not supported.
  • Avoid showing elaborate accessories or jewelry.
  • The head should be fully visible and facing forward.
  • Make sure the face is fully visible, without shadows or any hidden parts.
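
The tips above can be tracked as a simple review checklist before you submit an image. The following is a minimal sketch, not part of any Azure SDK: the `PhotoReview` class, its field names, and the `unmet_tips` helper are hypothetical names introduced here for illustration, and the answers come from a person looking at the image, not from automated analysis.

```python
from dataclasses import dataclass


@dataclass
class PhotoReview:
    """Answers from a manual review of one candidate image (hypothetical structure)."""
    shows_shoulders_up: bool        # character is shown from the shoulders up
    looks_human: bool               # real or virtual human, no cartoon-like proportions
    minimal_accessories: bool       # no elaborate accessories or jewelry
    head_visible_and_forward: bool  # head is fully visible and facing forward
    face_unobstructed: bool         # no shadows or hidden parts on the face


# Maps each review field to the tip it corresponds to.
TIPS = {
    "shows_shoulders_up": "Provide an image showing the character from the shoulders up.",
    "looks_human": "The face must look like a real or virtual human.",
    "minimal_accessories": "Avoid showing elaborate accessories or jewelry.",
    "head_visible_and_forward": "The head should be fully visible and facing forward.",
    "face_unobstructed": "The face must be fully visible, without shadows or hidden parts.",
}


def unmet_tips(review: PhotoReview) -> list[str]:
    """Return the tips that the reviewed image does not satisfy."""
    return [tip for field, tip in TIPS.items() if not getattr(review, field)]


if __name__ == "__main__":
    # Example: the reviewer noticed large earrings in the photo.
    review = PhotoReview(True, True, False, True, True)
    for tip in unmet_tips(review):
        print("Needs attention:", tip)
```

An empty result from `unmet_tips` only means the reviewer answered yes to every item; it doesn't guarantee that the image is accepted.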

If you're creating a custom photo avatar from a real person's photo, you must obtain consent from that person. Provide a video of the person reading a consent statement that acknowledges the use of their image. Microsoft verifies that the recorded statement matches the predefined script and compares the face in the video with the photo to confirm that they belong to the same person. For an example of the consent statement, see the verbal-statement-all-locales.txt file in the Azure-Samples/cognitive-services-speech-sdk GitHub repository.
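
Because the recorded statement has to match the predefined script, it can help to compare a transcript of your recording with the expected text before you submit the video. The following is a rough self-check sketch using Python's standard `difflib` module. It isn't Microsoft's verification method, and the `statement_similarity` helper and the placeholder strings are assumptions made for illustration. Take the real statement text from verbal-statement-all-locales.txt.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def statement_similarity(expected: str, spoken: str) -> float:
    """Return a word-level similarity ratio between 0.0 and 1.0."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(spoken))
    return matcher.ratio()


if __name__ == "__main__":
    # Placeholder strings only; replace them with the statement for your locale
    # and a transcript of what the person said in the video.
    expected = "placeholder consent statement text"
    spoken = "Placeholder consent statement, text."
    print(f"Similarity: {statement_similarity(expected, spoken):.2f}")
```

A ratio well below 1.0 suggests that words were skipped or changed and that the recording might need to be redone. Any threshold you choose is your own judgment call, not a documented acceptance criterion.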

Step 4: Create and deploy custom photo avatar

This step is a manual process. Microsoft sets up the custom photo avatar offline in the Azure resources that you provide.

Prepare resources:

Step 5: Use custom photo avatar

You can use the custom photo avatar in a voice agent or to create video content, either in Microsoft Foundry or through the API.

Use in Microsoft Foundry

To use a custom photo avatar in Voice live to create a personalized voice agent:

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on.
  2. Select an existing project or create a new project in the resource where your custom photo avatars are deployed.
  3. Find the Voice live model playground:
    1. Select Discover in the upper-right navigation.
    2. Select Models.
    3. Search for "speech".
    4. Select Azure-Speech-Voice-Live in the search results.
    5. Select Open in Playground.

To use a custom photo avatar in Text to speech avatar to create a talking head video:

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on.
  2. Select an existing project or create a new project in the resource where your custom photo avatars are deployed.
  3. Find the Text to speech avatar model playground:
    1. Select Discover in the upper-right navigation.
    2. Select Models.
    3. Search for "speech".
    4. Select Azure-Speech-Text-to-speech-Avatar in the search results.
    5. Select Open in Playground.

Use through API

Sample code for text to speech avatar is available on GitHub. Search for "photo" to jump to the photo avatar part of the sample code.