Azure Document Translation limitations for garment-industry documents containing sketches, logos, and drawing images

Question

YM Set Sophy 0

We are using Azure Document Translation for Chinese→Khmer garment tech-pack PDFs. For digital PDFs, Khmer font sizes are inconsistent due to font substitution, and short text runs are sometimes detected as English unless punctuation is added. For scanned PDFs, OCR-based rendering leaves residual source glyph pixels (“black dots”) because the white background masks are tightly bounded to the OCR boxes. Logos and company names lose their typography even when glossary entries map the source text to identical target text. We understand these are OCR/rendering limitations. Please confirm:

There is no current control for mask padding / bounding-box expansion.
There is no region-level exclusion for logos/figures in Document Translation.
Glossary replacement always re-renders text and cannot preserve original fonts.

Answer 1

There is no documented control in Azure Document Translation to adjust OCR mask padding or expand bounding boxes around recognized text. How tightly the background masks follow the OCR text boxes, and whether residual source pixels remain visible, is determined by the underlying OCR and rendering pipeline and is not configurable through the Document Translation API.
There is no documented region-level exclusion mechanism in Azure Document Translation to skip logos, figures, or specific layout regions. Document Translation operates over the supported document formats as a whole and, for PDFs and images, uses OCR to extract text; it does not expose controls to define protected or non-translated regions at layout level.
Glossaries in Document Translation control term-level translation behavior (for example, enforcing specific translations, or enforcing “no translation” by using the same source and target text), but they do not preserve original typography or fonts. As documented, glossaries are for “context-specific terminology,” “no translation,” and “specify translations for ambiguous words,” not for font or layout preservation. Font style retention is handled by the document processing pipeline, which may substitute fonts where needed; glossary replacement does not provide a way to keep original fonts or logo typography.
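
To illustrate the glossary point above, here is a minimal Python sketch that builds a "no translation" glossary (identical source and target text) as TSV and attaches it to a batch translation request body. The brand names and blob URLs are hypothetical placeholders, and the language codes `zh-Hans` and `km` are assumptions about your setup. The field names follow the batch request shape as I understand it, so please verify them against the current REST reference before relying on them. This only controls which text is emitted; it does not preserve fonts or logo typography.

```python
import csv
import io
import json


def build_glossary_tsv(terms):
    """Return TSV text with one 'source<TAB>target' pair per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for source, target in terms:
        writer.writerow([source, target])
    return buf.getvalue()


def build_batch_request(source_url, target_url, glossary_url):
    """Build a batch request body with an explicit source language and a glossary.

    Assumption: 'zh-Hans' and 'km' are the language codes you need.
    Stating the source language explicitly avoids relying on auto-detection.
    """
    return {
        "inputs": [
            {
                "source": {"sourceUrl": source_url, "language": "zh-Hans"},
                "targets": [
                    {
                        "targetUrl": target_url,
                        "language": "km",
                        "glossaries": [
                            {"glossaryUrl": glossary_url, "format": "tsv"}
                        ],
                    }
                ],
            }
        ]
    }


# Hypothetical brand names mapped to themselves so they are left untranslated.
NO_TRANSLATE_TERMS = [
    ("ACME Garment Co.", "ACME Garment Co."),
    ("SunThread", "SunThread"),
]

if __name__ == "__main__":
    print(build_glossary_tsv(NO_TRANSLATE_TERMS))
    # Placeholder URLs; replace with your own storage container and glossary file.
    body = build_batch_request(
        "https://example.blob.core.windows.net/source",
        "https://example.blob.core.windows.net/target",
        "https://example.blob.core.windows.net/glossaries/brands.tsv",
    )
    print(json.dumps(body, indent=2))
```

You would upload the generated TSV to storage and reference its URL in the request. Setting the source language explicitly may also help with short text runs being misdetected as English, though that is not guaranteed.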
