Normalization
For many supported fields, Document AI also returns an
entity.normalizedValue
in addition to the raw extracted text obtained through the textAnchor of each
entity. The normalized value is a standardized form of the literal text, and
normalization often breaks the text value up into sub-fields.
The normalized value contains the data in a standardized format, which reduces
post-processing and enables conversion to whatever format you choose. The
mentionText, which represents what is literally on the document, is never
changed by normalization.
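As a minimal sketch of how the two values relate, the following Python snippet reads entities from the JSON form of a Document AI response and prefers the normalized text when it is present, falling back to mentionText otherwise. The helper name display_value and the sample entities are hypothetical; only the mentionText, normalizedValue, text, and dateValue field names come from the Document AI response format.

```python
from typing import Any


def display_value(entity: dict[str, Any]) -> str:
    """Return the normalized text when present, else the literal mentionText."""
    normalized = entity.get("normalizedValue") or {}
    return normalized.get("text") or entity.get("mentionText", "")


# Hypothetical entities, shaped like the JSON form of a Document AI response.
entities = [
    {
        "type": "invoice_date",
        "mentionText": "Jan 5, 2024",
        "normalizedValue": {
            "text": "2024-01-05",
            "dateValue": {"year": 2024, "month": 1, "day": 5},
        },
    },
    # An entity without a normalizedValue: only the literal text is available.
    {"type": "supplier_name", "mentionText": "Acme Corp"},
]

for entity in entities:
    # mentionText is read but never modified; normalization lives alongside it.
    print(entity["type"], "|", entity["mentionText"], "->", display_value(entity))
```

Keeping the fallback in one helper means downstream code does not need to know which fields a given processor normalizes.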
Normalized fields belong to one of the following categories.
Normalized values in the console
In the Google Cloud console, the normalized fields are annotated with G.
Supported processors
Here are the processors and fields that support entity enrichment and normalization:
| Processor | Category | Solution type | Functions | Release stage | Access status | Normalized fields |
|---|---|---|---|---|---|---|
| Bank Statement Parser | Pretrained | Lending | OCR, Entity Extraction | General availability | Public | |
| US Passport Parser | Pretrained | Identity | OCR, Entity Extraction | General availability | Public | |
| Utility Parser | Pretrained | Procurement | OCR, Entity Extraction | General availability | Limited | |
| Identity Document Proofing Parser | Pretrained | Identity | OCR, Quality Analysis | General availability | Public | |
| Pay Slip Parser | Pretrained | Lending | OCR, Entity Extraction | General availability | Public | |
| US Driver License Parser | Pretrained | Identity | OCR, Entity Extraction | General availability | Public | |
| Expense Parser | Pretrained | Procurement | OCR, Entity Extraction | General availability | Public | |
| Invoice Parser | Pretrained | Procurement | OCR, Entity Extraction | General availability | Public | |
Extraction processors
The Custom Extractor supports normalization of all entities that have one of the
following Google Cloud common data types: dateTime, currency, money, and number.
| Processor | Category | Solution type | Functions | Release stage | Access status | Normalized data types |
|---|---|---|---|---|---|---|
| Custom Extractor | Extract | Custom | OCR, Entity Extraction | General availability | Public | dateTime, currency, money, number |
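To illustrate how normalized sub-fields can reduce post-processing, the sketch below converts the JSON form of a normalizedValue into native Python types. The field names dateValue, moneyValue, integerValue, floatValue, and text are part of the Document AI normalized value structure, but which of them a given data type populates is an assumption here, and the to_python helper and sample inputs are hypothetical.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Union


def to_python(normalized: dict[str, Any]) -> Union[date, Decimal, int, float, str, None]:
    """Convert a normalizedValue dict (JSON form) into a native Python value."""
    d = normalized.get("dateValue")
    # Only build a date when year, month, and day are all present and non-zero.
    if d and d.get("year") and d.get("month") and d.get("day"):
        return date(d["year"], d["month"], d["day"])

    m = normalized.get("moneyValue")
    if m is not None:
        # units is an int64, which the JSON encoding may carry as a string;
        # nanos holds the fractional part in billionths of a unit.
        units = Decimal(int(m.get("units", 0)))
        nanos = Decimal(int(m.get("nanos", 0))).scaleb(-9)
        return units + nanos

    if "integerValue" in normalized:
        return int(normalized["integerValue"])
    if "floatValue" in normalized:
        return float(normalized["floatValue"])

    # Assumption: anything else (for example a currency code) is kept as text.
    return normalized.get("text")


# Hypothetical normalized values for illustration.
print(to_python({"text": "2024-01-05", "dateValue": {"year": 2024, "month": 1, "day": 5}}))
print(to_python({"moneyValue": {"currencyCode": "USD", "units": "12", "nanos": 500000000}}))
print(to_python({"text": "USD"}))
```

Using Decimal for money avoids the rounding drift that binary floats would introduce when amounts are later summed.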