Document Models - Analyze Batch Documents
Analyzes a batch of documents with a document model.
POST {endpoint}/documentintelligence/documentModels/{modelId}:analyzeBatch?api-version=2024-11-30
POST {endpoint}/documentintelligence/documentModels/{modelId}:analyzeBatch?api-version=2024-11-30&pages={pages}&locale={locale}&stringIndexType={stringIndexType}&features={features}&queryFields={queryFields}&outputContentFormat={outputContentFormat}&output={output}
URI Parameters
| Name | In | Required | Type | Description |
|---|---|---|---|---|
| endpoint | path | True | string (uri) | The Document Intelligence service endpoint. |
| modelId | path | True | string; maxLength: 64; pattern: `^[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}$` | Unique document model name. |
| api-version | query | True | string; minLength: 1 | The API version to use for this operation. |
| features | query | | DocumentAnalysisFeature[] | List of optional analysis features. |
| locale | query | | string | Locale hint for text recognition and document analysis. The value may be a language code only (e.g. "en", "fr") or a BCP 47 language tag (e.g. "en-US"). |
| output | query | | AnalyzeOutputOption[] | Additional outputs to generate during analysis. |
| outputContentFormat | query | | DocumentContentFormat | Format of the analyze result top-level content. |
| pages | query | | string; pattern: `^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$` | 1-based page numbers to analyze, e.g. "1-3,5,7-9". |
| queryFields | query | | string[] | List of additional fields to extract, e.g. "NumberOfGuests,StoreNumber". |
| stringIndexType | query | | StringIndexType | Method used to compute string offset and length. |
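The following is a minimal sketch of assembling the request URL from the parameters above, validating `modelId` and `pages` against the listed patterns before anything is sent. The function name `build_analyze_batch_url` is hypothetical, and sending the array-valued parameters (`features`, `queryFields`, `output`) as one comma-separated value is an assumption based on the `queryFields` example in the table.

```python
import re
from urllib.parse import quote, urlencode

# Patterns copied from the URI Parameters table.
MODEL_ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}$")
PAGES_PATTERN = re.compile(r"^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$")

API_VERSION = "2024-11-30"


def build_analyze_batch_url(endpoint, model_id, *, pages=None, locale=None,
                            string_index_type=None, features=None,
                            query_fields=None, output_content_format=None,
                            output=None):
    """Return the analyzeBatch URL with only the query parameters supplied (hypothetical helper)."""
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError(f"invalid modelId: {model_id!r}")
    if pages is not None and not PAGES_PATTERN.fullmatch(pages):
        raise ValueError(f"invalid pages: {pages!r}")

    params = [("api-version", API_VERSION)]
    optional = [
        ("pages", pages),
        ("locale", locale),
        ("stringIndexType", string_index_type),
        # Assumption: array parameters are sent as one comma-separated value.
        ("features", ",".join(features) if features else None),
        ("queryFields", ",".join(query_fields) if query_fields else None),
        ("outputContentFormat", output_content_format),
        ("output", ",".join(output) if output else None),
    ]
    params += [(name, value) for name, value in optional if value is not None]

    base = endpoint.rstrip("/")
    path = f"/documentintelligence/documentModels/{quote(model_id, safe='')}:analyzeBatch"
    # safe="," keeps the comma separators readable instead of encoding them as %2C.
    return f"{base}{path}?{urlencode(params, safe=',')}"
```

Called with the endpoint, `customModel`, `pages="1-5"`, `locale="en-US"`, and `string_index_type="textElements"`, it produces the same URL as the sample request under Examples.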
Request Body
| Name | Required | Type | Description |
|---|---|---|---|
| resultContainerUrl | True | string (uri) | Azure Blob Storage container URL where analyze result files will be stored. |
| azureBlobFileListSource | | AzureBlobFileListContentSource | Azure Blob Storage file list specifying the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified. |
| azureBlobSource | | AzureBlobContentSource | Azure Blob Storage location containing the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified. |
| overwriteExisting | | boolean | Whether to overwrite existing analyze result files. |
| resultPrefix | | string | Blob name prefix of result files. |
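A minimal sketch of building the request body as a JSON-ready dictionary. The helper name `build_batch_request_body` is hypothetical. It enforces only that at least one document source is present; this page does not say whether sending both sources together is rejected, so the sketch does not reject it.

```python
def build_batch_request_body(result_container_url, *, azure_blob_source=None,
                             azure_blob_file_list_source=None,
                             result_prefix=None, overwrite_existing=None):
    """Assemble an AnalyzeBatchDocumentsRequest body (hypothetical helper)."""
    # One of the two document sources must be specified.
    if azure_blob_source is None and azure_blob_file_list_source is None:
        raise ValueError("specify azureBlobSource or azureBlobFileListSource")

    body = {"resultContainerUrl": result_container_url}  # the only required field
    if azure_blob_source is not None:
        body["azureBlobSource"] = azure_blob_source
    if azure_blob_file_list_source is not None:
        body["azureBlobFileListSource"] = azure_blob_file_list_source
    # Optional fields are omitted so the service defaults apply.
    if result_prefix is not None:
        body["resultPrefix"] = result_prefix
    if overwrite_existing is not None:
        body["overwriteExisting"] = overwrite_existing
    return body
```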
Responses
| Name | Type | Description |
|---|---|---|
| 202 Accepted | | The request has been accepted for processing, but processing has not yet completed. Headers: Operation-Location (string) |
| Other Status Codes | DocumentIntelligenceErrorResponse | An unexpected error response. |
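Because the operation only returns 202 Accepted, a client has to poll the URL in the Operation-Location header. The sketch below shows one way to structure that loop. The terminal status values and the top-level `status` field are assumptions (they are not defined on this page), and `fetch` is a hypothetical caller-supplied function that performs an authenticated GET and returns the decoded JSON body.

```python
import time
from typing import Callable, Mapping

# Assumed terminal status values; check the get-result operation's reference.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_operation(operation_location: str,
                   fetch: Callable[[str], Mapping],
                   interval_s: float = 5.0,
                   max_attempts: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> Mapping:
    """GET the Operation-Location URL until a terminal status is reported."""
    for _ in range(max_attempts):
        result = fetch(operation_location)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        sleep(interval_s)  # fixed interval; a real client may honor Retry-After instead
    raise TimeoutError(f"operation did not finish after {max_attempts} polls")
```

Injecting `fetch` and `sleep` keeps the loop independent of any HTTP library and lets it be exercised without network access.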
Security
Ocp-Apim-Subscription-Key
Type: apiKey
In: header

OAuth2Auth
Type: oauth2
Flow: accessCode
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Token URL: https://login.microsoftonline.com/common/oauth2/token

Scopes
| Name | Description |
|---|---|
| https://cognitiveservices.azure.com/.default | |
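A minimal sketch of producing request headers for either security scheme. The helper name `auth_headers` is hypothetical, and sending the OAuth2 token as a standard `Authorization: Bearer` header is an assumption. A token for the listed scope could be obtained, for example, with the azure-identity package's `DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default")`.

```python
from typing import Optional


def auth_headers(api_key: Optional[str] = None,
                 access_token: Optional[str] = None) -> dict:
    """Return headers for exactly one of the two security schemes (hypothetical helper)."""
    if (api_key is None) == (access_token is None):
        raise ValueError("pass exactly one of api_key or access_token")
    if api_key is not None:
        # apiKey scheme: the key travels in the Ocp-Apim-Subscription-Key header.
        return {"Ocp-Apim-Subscription-Key": api_key}
    # OAuth2Auth scheme: assumed to be sent as a bearer token.
    return {"Authorization": f"Bearer {access_token}"}
```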
Examples
Analyze Batch Documents
Sample request
POST https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/customModel:analyzeBatch?api-version=2024-11-30&pages=1-5&locale=en-US&stringIndexType=textElements

```json
{
  "azureBlobSource": {
    "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer?mySasToken",
    "prefix": "trainingDocs/"
  },
  "resultContainerUrl": "https://myStorageAccount.blob.core.windows.net/myOutputContainer?mySasToken",
  "resultPrefix": "trainingDocsResult/",
  "overwriteExisting": true
}
```
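The sample request can be reproduced with the Python standard library. This sketch only builds the `urllib.request.Request` object; the actual send is left commented out because the endpoint, SAS tokens, and the `<your-key>` value are placeholders. Authenticating with the subscription key header rather than an OAuth2 token is a choice made for the example.

```python
import json
import urllib.request

url = ("https://myendpoint.cognitiveservices.azure.com/documentintelligence/"
       "documentModels/customModel:analyzeBatch"
       "?api-version=2024-11-30&pages=1-5&locale=en-US&stringIndexType=textElements")
body = {
    "azureBlobSource": {
        "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer?mySasToken",
        "prefix": "trainingDocs/",
    },
    "resultContainerUrl": "https://myStorageAccount.blob.core.windows.net/myOutputContainer?mySasToken",
    "resultPrefix": "trainingDocsResult/",
    "overwriteExisting": True,
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Placeholder: substitute a real key, or use an OAuth2 bearer token instead.
        "Ocp-Apim-Subscription-Key": "<your-key>",
    },
    method="POST",
)

# Uncomment to send against a real resource:
# with urllib.request.urlopen(request) as response:
#     assert response.status == 202
#     print(response.headers["Operation-Location"])
```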
Sample response
Operation-Location: https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/customModel/analyzeBatchResults/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2024-11-30
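The Operation-Location value identifies the batch result. This sketch splits it into its parts. The path shape `.../documentModels/{modelId}/analyzeBatchResults/{resultId}` is inferred from the sample response only, and the helper name `parse_operation_location` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def parse_operation_location(operation_location: str) -> dict:
    """Split an Operation-Location URL into model ID, result ID, and api-version."""
    parsed = urlparse(operation_location)
    segments = parsed.path.strip("/").split("/")
    # Expected: documentintelligence/documentModels/{modelId}/analyzeBatchResults/{resultId}
    if len(segments) < 5 or segments[-2] != "analyzeBatchResults":
        raise ValueError(f"unexpected Operation-Location path: {parsed.path}")
    return {
        "modelId": segments[-3],
        "resultId": segments[-1],
        "apiVersion": parse_qs(parsed.query).get("api-version", [None])[0],
    }
```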
Definitions
| Name | Description |
|---|---|
| AnalyzeBatchDocumentsRequest | Batch document analysis parameters. |
| AnalyzeOutputOption | Additional outputs to generate during analysis. |
| AzureBlobContentSource | Azure Blob Storage content. |
| AzureBlobFileListContentSource | File list in Azure Blob Storage. |
| DocumentAnalysisFeature | Document analysis features to enable. |
| DocumentContentFormat | Format of the content in analyzed result. |
| DocumentIntelligenceError | The error object. |
| DocumentIntelligenceErrorResponse | Error response object. |
| DocumentIntelligenceInnerError | An object containing more specific information about the error. |
| StringIndexType | Method used to compute string offset and length. |
AnalyzeBatchDocumentsRequest
Batch document analysis parameters.
| Name | Type | Default value | Description |
|---|---|---|---|
| azureBlobFileListSource | AzureBlobFileListContentSource | | Azure Blob Storage file list specifying the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified. |
| azureBlobSource | AzureBlobContentSource | | Azure Blob Storage location containing the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified. |
| overwriteExisting | boolean | False | Whether to overwrite existing analyze result files. |
| resultContainerUrl | string (uri) | | Azure Blob Storage container URL where analyze result files will be stored. |
| resultPrefix | string | | Blob name prefix of result files. |
AnalyzeOutputOption
Additional outputs to generate during analysis.
| Value | Description |
|---|---|
| pdf | Generate searchable PDF output. |
| figures | Generate cropped images of detected figures. |
AzureBlobContentSource
Azure Blob Storage content.
| Name | Type | Description |
|---|---|---|
| containerUrl | string (uri) | Azure Blob Storage container URL. |
| prefix | string | Blob name prefix. |
AzureBlobFileListContentSource
File list in Azure Blob Storage.
| Name | Type | Description |
|---|---|---|
| containerUrl | string (uri) | Azure Blob Storage container URL. |
| fileList | string | Path to a JSONL file within the container specifying a subset of documents. |
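A sketch of generating the JSONL content for `fileList`. The per-line shape `{"file": "<blob path>"}` is an assumption not stated on this page; verify it against the service documentation before relying on it. The helper name `build_file_list_jsonl` is hypothetical.

```python
import json
from typing import Iterable


def build_file_list_jsonl(blob_names: Iterable[str]) -> str:
    """Return JSONL text with one {"file": <blob path>} object per line (assumed format)."""
    lines = [json.dumps({"file": name}) for name in blob_names]
    # JSONL: one JSON object per line, newline-terminated.
    return ("\n".join(lines) + "\n") if lines else ""
```

After uploading the result to the container, `fileList` would hold that blob's path, for example a hypothetical `{"containerUrl": "...", "fileList": "batch/fileList.jsonl"}`.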
DocumentAnalysisFeature
Document analysis features to enable.
| Value | Description |
|---|---|
| ocrHighResolution | Perform OCR at a higher resolution to handle documents with fine print. |
| languages | Enable the detection of the text content language. |
| barcodes | Enable the detection of barcodes in the document. |
| formulas | Enable the detection of mathematical expressions in the document. |
| keyValuePairs | Enable the detection of general key value pairs (form fields) in the document. |
| styleFont | Enable the recognition of various font styles. |
| queryFields | Enable the extraction of additional fields via the queryFields query parameter. |
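Because the `queryFields` feature is what enables the `queryFields` query parameter, a client may want to add the feature automatically whenever query fields are requested. This sketch does that and rejects names outside the table above; the coupling rule is a reading of the table's description rather than a documented validation, and `normalize_features` is a hypothetical name.

```python
from typing import Iterable, List, Optional

# Values from the DocumentAnalysisFeature table.
KNOWN_FEATURES = {
    "ocrHighResolution", "languages", "barcodes", "formulas",
    "keyValuePairs", "styleFont", "queryFields",
}


def normalize_features(features: Iterable[str],
                       query_fields: Optional[List[str]] = None) -> List[str]:
    """Validate feature names and add "queryFields" when query fields are requested."""
    result = list(dict.fromkeys(features))  # drop duplicates, keep first-seen order
    unknown = [f for f in result if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    if query_fields and "queryFields" not in result:
        result.append("queryFields")
    return result
```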
DocumentContentFormat
Format of the content in analyzed result.
| Value | Description |
|---|---|
| text | Plain text representation of the document content without any formatting. |
| markdown | Markdown representation of the document content with section headings, tables, etc. |
DocumentIntelligenceError
The error object.
| Name | Type | Description |
|---|---|---|
| code | string | One of a server-defined set of error codes. |
| details | DocumentIntelligenceError[] | An array of details about specific errors that led to this reported error. |
| innererror | DocumentIntelligenceInnerError | An object containing more specific information than the current object about the error. |
| message | string | A human-readable representation of the error. |
| target | string | The target of the error. |
DocumentIntelligenceErrorResponse
Error response object.
| Name | Type | Description |
|---|---|---|
| error | DocumentIntelligenceError | Error info. |
DocumentIntelligenceInnerError
An object containing more specific information about the error.
| Name | Type | Description |
|---|---|---|
| code | string | One of a server-defined set of error codes. |
| innererror | DocumentIntelligenceInnerError | Inner error. |
| message | string | A human-readable representation of the error. |
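Since `innererror` can nest to any depth, the most specific code sits at the end of the chain. A minimal sketch of walking it, given an error response already decoded into a dictionary; the function name is hypothetical.

```python
from typing import Mapping, Optional


def most_specific_error_code(response: Mapping) -> Optional[str]:
    """Follow error -> innererror -> innererror ... and return the deepest code found."""
    node = response.get("error")
    code = None
    while node:
        # Keep the outer code if an inner level omits its own.
        code = node.get("code", code)
        node = node.get("innererror")
    return code
```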
StringIndexType
Method used to compute string offset and length.
| Value | Description |
|---|---|
| textElements | User-perceived display character, or grapheme cluster, as defined by Unicode 8.0.0. |
| unicodeCodePoint | Character unit represented by a single Unicode code point. Used by Python 3. |
| utf16CodeUnit | Character unit represented by a 16-bit Unicode code unit. Used by JavaScript, Java, and .NET. |
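The choice matters whenever text contains characters outside the Basic Multilingual Plane, such as emoji. This sketch computes the offset and length of a substring under two of the index types; `offsets` is a hypothetical name. `textElements` is omitted because the Python standard library has no grapheme-cluster segmentation (the third-party `regex` module's `\X` pattern is one option).

```python
def offsets(text: str, substring: str) -> dict:
    """Offset and length of `substring` under unicodeCodePoint and utf16CodeUnit."""
    start = text.index(substring)
    prefix = text[:start]
    return {
        # Python str indices already count Unicode code points.
        "unicodeCodePoint": (start, len(substring)),
        # UTF-16-LE (no BOM) uses 2 bytes per code unit; astral characters take 2 units.
        "utf16CodeUnit": (len(prefix.encode("utf-16-le")) // 2,
                          len(substring.encode("utf-16-le")) // 2),
    }
```

For the text `"\U0001F600 Total"` (an emoji, a space, then "Total"), "Total" starts at code point 2 but at UTF-16 code unit 3, because the emoji occupies two UTF-16 code units.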