How to train a custom model in Azure Document Intelligence Studio to correctly read comma and period in currency values

Milan Raval 20 Reputation points
2026-03-03T12:19:52.92+00:00

How can I train a custom model in Azure Document Intelligence Studio to correctly read commas and periods in currency values?

I have tried providing many samples like this, manually correcting the values in the .ocr.json and labels.json files that the Studio creates in the Azure storage folder, and retraining the model. It still reads a comma as a period.

For example, in the document below it reads 29,000 as 29.000, although the separator is clearly a comma.
User's image

Azure Document Intelligence in Foundry Tools

Answer accepted by question author
  1. Q&A Assist
    2026-03-03T12:20:06.78+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    Document Intelligence Studio custom models do not expose training controls to change how the OCR engine interprets punctuation such as commas and periods inside numbers. Editing the generated .ocr.json or labels.json files in storage and retraining does not change the underlying OCR behavior; those files are only used to map fields and values for training, not to retrain the OCR engine itself.

    For numeric and currency fields, the supported way to influence interpretation is through value types on labels (in the v2.1 Sample Labeling tool) or the equivalent field typing in Studio. Setting a tag or field to number or number.currency standardizes the returned format, but it does not override whether a specific glyph is read as a comma or a period once the OCR engine has recognized it incorrectly.

    If the OCR consistently misreads the punctuation for this handwriting style or scan quality, this limitation cannot be corrected by additional custom-model training in Studio. In that case, the options are:

    1. Use the numeric value type so that values are normalized as numbers where possible, then post-process the extracted text/values in application code (for example, replace . with , in specific positions according to business rules).
    2. Improve input quality (higher-resolution scans, clearer handwriting, better contrast) so the OCR engine can more reliably distinguish commas from periods.
    3. If using the older v2.1 stack and Sample Labeling tool, ensure tags for these fields are set to number with currency subtype so that numeric parsing is as robust as possible.
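
The post-processing mentioned in option 1 could look something like the following. This is only a minimal sketch, not part of Document Intelligence: the function names are hypothetical, and it assumes a business rule that amounts in these documents use a comma as the thousands separator and never have exactly three decimal places. Under that assumption, a period followed by exactly three digits is treated as a misread comma.

```python
import re
from decimal import Decimal

# Assumed business rule (not from the service): a "." followed by exactly
# three digits is a thousands separator that OCR misread, e.g. "29.000".
_MISREAD_THOUSANDS = re.compile(r"(?<=\d)\.(?=\d{3}(?!\d))")


def fix_thousands_separator(raw: str) -> str:
    """Replace a period that sits before exactly three digits with a comma."""
    return _MISREAD_THOUSANDS.sub(",", raw)


def to_decimal(raw: str) -> Decimal:
    """Apply the fix, drop currency symbols and spaces, and parse the amount."""
    fixed = fix_thousands_separator(raw)
    digits = re.sub(r"[^\d.,-]", "", fixed)
    # After the fix, commas are only thousands separators and can be removed.
    return Decimal(digits.replace(",", ""))


print(fix_thousands_separator("29.000"))
print(to_decimal("$29.000"))
```

Values with one or two digits after the period, such as 29,000.50, are left unchanged by this rule. It would be wrong for documents that legitimately use three decimal places, so it should only be applied to fields where the rule is known to hold.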

    There is no documented mechanism in Document Intelligence Studio to explicitly train or override comma/period recognition in currency values beyond these field-typing and input-quality measures.

