- There is no documented control in Azure Document Translation to adjust OCR mask padding or expand bounding boxes around recognized text. How tightly background masks follow text boxes, and how residual pixels appear, is determined by the underlying OCR and rendering pipeline and cannot be configured through the Document Translation API.
- There is no documented region-level exclusion mechanism in Azure Document Translation to skip logos, figures, or specific layout regions. Document Translation processes each supported document as a whole and, for PDFs and images, uses OCR to extract text; it does not expose controls for defining protected or non-translated regions at the layout level.
- Glossaries in Document Translation control term-level translation behavior (for example, enforcing specific translations, or enforcing “no translation” by using the same source and target text), but they do not preserve original typography or fonts. As documented, glossaries are for “context-specific terminology,” “no translation,” and “specify translations for ambiguous words,” not for font or layout preservation. Font style retention is handled by the document processing pipeline, which may substitute fonts where needed; glossary replacement does not provide a way to keep original fonts or logo typography.
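
To illustrate the "no translation" glossary behavior described above, here is a minimal sketch that builds glossary text with one source/target pair per row, where protected terms repeat the source text as the target. It assumes a tab-separated glossary file; the function name `build_glossary_tsv` and the sample terms are hypothetical, and uploading the file and referencing it in a translation request are not shown. As noted above, this affects only the translated text, not fonts or logo typography.

```python
import csv
import io


def build_glossary_tsv(translations, keep_as_is):
    """Return glossary text with one 'source<TAB>target' row per term.

    translations: dict mapping a source term to its enforced translation.
    keep_as_is:   terms that should not be translated; each is written with
                  identical source and target text.
    (Hypothetical helper for illustration, not part of any Azure SDK.)
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for source, target in translations.items():
        writer.writerow([source, target])
    for term in keep_as_is:
        # Same text on both sides enforces "no translation" for the term.
        writer.writerow([term, term])
    return buffer.getvalue()


# Sample terms are illustrative only.
glossary_text = build_glossary_tsv({"bank": "Ufer"}, ["Contoso"])
print(glossary_text)
```

The resulting text can be saved as a `.tsv` file and supplied as the glossary for the target language.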
References: