
Azure Document Translation limitations for garment-industry documents containing sketches, logos, and drawings

YM Set Sophy 0 Reputation points
2026-03-28T06:53:24.45+00:00

We are using Azure Document Translation for Chinese→Khmer garment tech-pack PDFs. For digital PDFs, Khmer font sizes are inconsistent due to font substitution, and short text runs are sometimes detected as English unless punctuation is added. For scanned PDFs, OCR-based rendering leaves residual source glyph pixels (“black dots”) because white background masks are tightly bounded to OCR boxes. Logos and company names lose typography even when glossary entries map source=text → target=text. We understand these are OCR/rendering limitations. Please confirm:

  1. There is no current control for mask padding / bounding-box expansion.
  2. There is no region-level exclusion for logos/figures in Document Translation.
  3. Glossary replacement always re-renders text and cannot preserve original fonts.
Azure Document Intelligence in Foundry Tools

1 answer

  1. Q&A Assist
    2026-03-28T06:53:34.2033333+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.
    1. There is no documented control in Azure Document Translation to adjust OCR mask padding or expand bounding boxes around recognized text. How tightly background masks follow text boxes, and whether residual source pixels remain, is determined by the underlying OCR and rendering pipeline and is not configurable through the Document Translation API.
    2. There is no documented region-level exclusion mechanism in Azure Document Translation to skip logos, figures, or specific layout regions. Document Translation processes each supported document as a whole and, for PDFs and images, uses OCR to extract text; it does not expose controls to define protected or non-translated regions at the layout level.
    3. Glossaries in Document Translation control term-level translation behavior (for example, enforcing specific translations, or enforcing “no translation” by using the same source and target text), but they do not preserve original typography or fonts. As documented, glossaries are for “context-specific terminology,” “no translation,” and “specify translations for ambiguous words,” not for font or layout preservation. Font style retention is handled by the document processing pipeline, which may substitute fonts where needed; glossary replacement does not provide a way to keep original fonts or logo typography.
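
The "no translation" glossary behavior described in point 3 can be sketched as follows. This is a minimal illustration, not a confirmed setup: the helper names `build_no_translate_glossary` and `build_batch_request`, the placeholder URLs, the sample term, and the choice of `zh-Hans` for the Chinese source are all assumptions. The request body follows the shape used by the Document Translation batch REST API as far as I know it (`inputs`, `source`, `targets`, `glossaries`); check it against the current API reference before relying on it. It only pins the wording of a term; it does not preserve fonts or logo artwork.

```python
import csv
import io
import json


def build_no_translate_glossary(terms):
    """Return TSV text in which every term maps to itself (source == target)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for term in terms:
        writer.writerow([term, term])
    return buf.getvalue()


def build_batch_request(source_url, target_url, glossary_url):
    """Build a batch request body for Chinese -> Khmer with one TSV glossary.

    The language codes are assumptions: "zh-Hans" (Simplified Chinese), "km" (Khmer).
    """
    return {
        "inputs": [
            {
                "source": {"sourceUrl": source_url, "language": "zh-Hans"},
                "targets": [
                    {
                        "targetUrl": target_url,
                        "language": "km",
                        "glossaries": [
                            {"glossaryUrl": glossary_url, "format": "tsv"}
                        ],
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical company name that should pass through untranslated.
    print(build_no_translate_glossary(["ACME Garment Co."]), end="")

    # Placeholder URLs; real ones would point at blob containers with SAS tokens.
    body = build_batch_request(
        "https://example.invalid/source-container",
        "https://example.invalid/target-container",
        "https://example.invalid/glossaries/no-translate.tsv",
    )
    print(json.dumps(body, indent=2))
```

The glossary file would be uploaded to storage and referenced by `glossaryUrl`; the translated output is still re-rendered by the service, so typography of logos and company names is not retained.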
