Summary
In this module, you learned about vision-enabled generative AI models and how to implement chat solutions that include image-based input.
Vision-enabled models let you build AI solutions that can interpret images and respond to questions or instructions about them. Beyond simply identifying objects in a picture, some models can also reason about what they see: for instance, interpreting a chart or assessing whether an object is damaged.
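To illustrate how image-based input is combined with text in a chat request, here's a minimal sketch assuming an OpenAI-compatible chat completions API. The function name, prompt, and image URL below are placeholders for illustration, not values from this module.

```python
def build_image_message(prompt: str, image_url: str) -> dict:
    """Build a user message combining text and an image reference.

    This is the multimodal message shape used by OpenAI-compatible
    chat completions APIs: the `content` field is a list of parts,
    each tagged with a type ("text" or "image_url").
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = build_image_message(
    "Is the product in this photo damaged?",
    "https://example.com/product.jpg",  # placeholder image URL
)

# The message would then be sent to a vision-enabled model, e.g.:
# client.chat.completions.create(model="<deployment-name>", messages=[message])
print(message["content"][1]["type"])
```

The key idea is that the user turn carries a list of typed content parts rather than a single string, so text and one or more images can travel together in the same prompt.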
Tip
For more information about working with multimodal models in Microsoft Foundry, see How to use image and audio in chat completions with Azure AI model inference and Quickstart: Use images in your AI chats.