Document Intelligence Studio custom models do not expose training controls to change how the OCR engine interprets punctuation such as commas and periods inside numbers. Editing the generated .ocr.json or labels.json files in storage and retraining does not change the underlying OCR behavior; those files are only used to map fields and values for training, not to retrain the OCR engine itself.
For numeric and currency fields, the supported way to influence interpretation is through value types on labels (for v2.1 Sample Labeling tool) or equivalent field typing in Studio. Setting a tag or field to number or number.currency standardizes the returned format, but it does not override whether a specific glyph is read as a comma or a period when the OCR engine has already recognized it incorrectly.
If the OCR consistently misreads the punctuation for this handwriting style or scan quality, this limitation cannot be corrected by additional custom-model training in Studio. In that case, the options are:
- Use the numeric value type so that values are normalized as numbers where possible, then post-process the extracted text/values in application code (for example, replace `.` with `,` in specific positions according to business rules).
- Improve input quality (higher-resolution scans, clearer handwriting, better contrast) so the OCR engine can more reliably distinguish commas from periods.
- If using the older v2.1 stack and Sample Labeling tool, ensure tags for these fields are set to `number` with the `currency` subtype so that numeric parsing is as robust as possible.
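As an illustration of the post-processing option, here is a minimal Python sketch. It is not part of Document Intelligence; the function name `normalize_amount` and the business rule it encodes are assumptions made for this example. The assumed rule is that amounts always carry exactly two decimal places, so the last comma or period counts as the decimal separator only when exactly two digits follow it, and every other comma or period is treated as a grouping mark.

```python
import re
from decimal import Decimal


def normalize_amount(raw: str, decimals: int = 2) -> Decimal:
    """Parse an OCR'd amount whose commas and periods may have been swapped.

    Assumed business rule (hypothetical, adjust to your data): the last
    separator is the decimal point only if exactly `decimals` digits follow it.
    """
    # Keep only digits, commas, and periods. This drops currency symbols and
    # spaces, and also any minus sign, so handle negatives separately if needed.
    cleaned = re.sub(r"[^\d.,]", "", raw)
    if not re.search(r"\d", cleaned):
        raise ValueError(f"no digits found in {raw!r}")

    # Look for a separator followed by digits at the very end of the string.
    match = re.search(r"[.,](\d+)$", cleaned)
    if match and len(match.group(1)) == decimals:
        integer_part = re.sub(r"[.,]", "", cleaned[: match.start()])
        return Decimal(f"{integer_part or '0'}.{match.group(1)}")

    # Otherwise treat every separator as a grouping mark.
    return Decimal(re.sub(r"[.,]", "", cleaned))


print(normalize_amount("1.234,56"))  # 1234.56
print(normalize_amount("1,234.56"))  # 1234.56
print(normalize_amount("$ 12,50"))   # 12.50
print(normalize_amount("1.234"))     # 1234
```

Under this rule a value such as `12.5` would be read as 125, so the sketch only fits fields that always carry the expected number of decimal places. Apply it to the raw extracted text of the field rather than to an already-normalized number.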
There is no documented mechanism in Document Intelligence Studio to explicitly train or override comma/period recognition in currency values beyond these field-typing and input-quality measures.