Module document (0.1.1a0)

Wrappers for Document AI Document type.

Classes

Document

Document(
 shards: List[google.cloud.documentai_v1.types.document.Document],
 gcs_bucket_name: Optional[str] = None,
 gcs_prefix: Optional[str] = None,
)

Represents a wrapped Document.

This class hides the complexity of working with the Document protobuf response returned by the BatchProcessDocuments or ProcessDocument methods, and implements convenience methods for searching and extracting information within the Document.

Properties

gcs_bucket_name

Optional. The name of the gcs bucket.

Format: gs://bucket/optional_folder/target_folder/ where gcs_bucket_name=bucket.

Type: Optional[str]

entities

(List[Entity]): A list of Entities in the Document.
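The Format convention maps a single gs:// URI onto two arguments, gcs_bucket_name and gcs_prefix. A minimal sketch of that split follows; `split_gcs_uri` is a hypothetical helper, not part of the toolbox, and it assumes the trailing slash should be dropped so the prefix matches the documented form optional_folder/target_folder.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split gs://bucket/folder/sub/ into ("bucket", "folder/sub")."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    # Everything before the first "/" is the bucket; the rest is the object prefix.
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    # Drop the trailing slash to match the documented gcs_prefix form.
    return bucket, prefix.rstrip("/")
```

Cloud Storage prefixes match by plain string comparison, so a prefix without its trailing slash, such as optional_folder/target_folder, also matches objects under optional_folder/target_folder_2/. Keep the slash if that distinction matters.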

Module Functions

_entities_from_shards

_entities_from_shards(
 shards: List[google.cloud.documentai_v1.types.document.Document],
)

Returns a list of Entities from a list of documentai.Document shards.

Parameter
Name Description
shards List[google.cloud.documentai.Document]

Required. List of document shards.

Returns
Type Description
List[Entity] A list of Entities.
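A minimal sketch of the flattening that _entities_from_shards describes, assuming each shard exposes a repeated `entities` field whose items carry `type_` and `mention_text` (the proto-plus field names on documentai.Document.Entity). `EntityView` is a hypothetical stand-in for the toolbox's own Entity wrapper, and the `SimpleNamespace` objects are hypothetical stand-ins for real shards.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List


@dataclass
class EntityView:
    """Hypothetical wrapper; not the toolbox's Entity class."""
    type_: str
    mention_text: str


def entities_from_shards(shards: List[Any]) -> List[EntityView]:
    # Visit shards in the order given, then each shard's top-level entities in order.
    # Nested entity properties are not expanded here.
    return [
        EntityView(type_=entity.type_, mention_text=entity.mention_text)
        for shard in shards
        for entity in shard.entities
    ]


# Stand-in shards shaped like documentai.Document (only the fields used above).
example_shards = [
    SimpleNamespace(entities=[SimpleNamespace(type_="invoice_id", mention_text="INV-1")]),
    SimpleNamespace(entities=[SimpleNamespace(type_="total_amount", mention_text="42.00")]),
]
```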

_get_bytes

_get_bytes(gcs_bucket_name: str, gcs_prefix: str)

Returns the contents of the json files in a Cloud Storage folder as a list of bytes objects, one per file.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the gcs bucket. Format: gs://bucket/optional_folder/target_folder/ where gcs_bucket_name=bucket.

gcs_prefix str

Required. The prefix of the json files in the target_folder. Format: gs://bucket/optional_folder/target_folder/ where gcs_prefix=optional_folder/target_folder.

Returns
Type Description
List[bytes] A list of bytes.
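A sketch of the download step, assuming the client is a google.cloud.storage.Client, whose `list_blobs(bucket_or_name, prefix=...)` yields blobs with a `name` attribute and a `download_as_bytes()` method. Treating only names ending in `.json` as shards is an assumption. The client is passed in rather than created inside, so the hypothetical in-memory `FakeClient` below can exercise the function without credentials.

```python
from typing import Any, Dict, List, Optional, Tuple


def get_json_bytes(client: Any, bucket_name: str, prefix: str) -> List[bytes]:
    """Download every *.json object under prefix (a sketch, not the toolbox's code)."""
    result: List[bytes] = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # Skip folder placeholders and non-JSON objects.
        if blob.name.endswith(".json"):
            result.append(blob.download_as_bytes())
    return result


class FakeBlob:
    """Hypothetical in-memory blob with the two members used above."""

    def __init__(self, name: str, data: bytes) -> None:
        self.name = name
        self._data = data

    def download_as_bytes(self) -> bytes:
        return self._data


class FakeClient:
    """Hypothetical stand-in for storage.Client, for illustration only."""

    def __init__(self, objects: Dict[Tuple[str, str], bytes]) -> None:
        self._objects = objects  # maps (bucket, object name) -> contents

    def list_blobs(self, bucket_name: str, prefix: Optional[str] = None) -> List[FakeBlob]:
        return [
            FakeBlob(name, data)
            for (bucket, name), data in sorted(self._objects.items())
            if bucket == bucket_name and name.startswith(prefix or "")
        ]
```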

_get_shards

_get_shards(gcs_bucket_name: str, gcs_prefix: str)

Returns a list of documentai.Document shards from a Cloud Storage folder.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the gcs bucket. Format: gs://bucket/optional_folder/target_folder/ where gcs_bucket_name=bucket.

gcs_prefix str

Required. The prefix of the json files in the target_folder. Format: gs://bucket/optional_folder/target_folder/ where gcs_prefix=optional_folder/target_folder.

Returns
Type Description
List[google.cloud.documentai.Document] A list of documentai.Documents.
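_get_shards can be read as _get_bytes followed by parsing. The stdlib sketch below parses each payload into a plain dict rather than a documentai.Document; with the library installed, `documentai.Document.from_json(payload)` would be the natural call. Ordering by `shardInfo.shardIndex` is our own assumption, motivated by object listings coming back in name order, where a name like `x-10.json` sorts before `x-2.json`.

```python
import json
from typing import Any, Dict, List


def shards_from_bytes(payloads: List[bytes]) -> List[Dict[str, Any]]:
    """Parse JSON shard payloads and order them by shard index (a sketch)."""
    # json.loads accepts UTF-8 encoded bytes directly.
    shards = [json.loads(payload) for payload in payloads]

    def shard_index(doc: Dict[str, Any]) -> int:
        # proto3 JSON encodes int64 as a string and omits zero values,
        # so a missing shardIndex means shard 0.
        return int(doc.get("shardInfo", {}).get("shardIndex", 0))

    return sorted(shards, key=shard_index)
```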

_get_storage_client

_get_storage_client()

Returns a Storage client with a custom user-agent header.
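A sketch of how such a client could be built, assuming `google-cloud-storage` and `google-api-core` are installed: `ClientInfo(user_agent=...)` carries the header and `storage.Client(client_info=...)` accepts it. The product token `documentai-toolbox` and the `product/version` format are assumptions, not values taken from the library.

```python
def build_user_agent(product: str, version: str) -> str:
    # Hypothetical "product/version" token appended to outgoing requests.
    return f"{product}/{version}"


def get_storage_client(product: str = "documentai-toolbox", version: str = "0.1.1a0"):
    """Create a storage.Client that reports a custom user agent (a sketch)."""
    # Imported lazily so this module loads without the Cloud libraries installed.
    from google.api_core.client_info import ClientInfo
    from google.cloud import storage

    info = ClientInfo(user_agent=build_user_agent(product, version))
    # Needs Application Default Credentials (or an explicit project) at runtime.
    return storage.Client(client_info=info)
```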

_pages_from_shards

_pages_from_shards(
 shards: List[google.cloud.documentai_v1.types.document.Document],
)

Returns a list of Pages from a list of documentai.Document shards.

Parameter
Name Description
shards List[google.cloud.documentai.Document]

Required. List of document shards.

Returns
Type Description
List[Page] A list of Pages.
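Pages can be collected the same way. This sketch uses `itertools.chain.from_iterable` and returns the raw page objects instead of the toolbox's Page wrapper; it assumes each shard has a repeated `pages` field, and the stand-in shards with `page_number` values are hypothetical.

```python
from itertools import chain
from types import SimpleNamespace
from typing import Any, List


def pages_from_shards(shards: List[Any]) -> List[Any]:
    # chain.from_iterable concatenates each shard's pages, preserving shard order.
    return list(chain.from_iterable(shard.pages for shard in shards))


# Stand-in shards shaped like documentai.Document (only the field used above).
example_shards = [
    SimpleNamespace(pages=[SimpleNamespace(page_number=1), SimpleNamespace(page_number=2)]),
    SimpleNamespace(pages=[SimpleNamespace(page_number=3)]),
]
```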

print_gcs_document_tree

print_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str)

Prints a tree of filenames in a Cloud Storage folder.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the gcs bucket. Format: gs://bucket/optional_folder/target_folder/ where gcs_bucket_name=bucket.

gcs_prefix str

Required. The prefix of the json files in the target_folder. Format: gs://bucket/optional_folder/target_folder/ where gcs_prefix=optional_folder/target_folder.
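A stdlib sketch of the tree-building step, assuming the object names have already been listed (for example with `[b.name for b in client.list_blobs(bucket_name, prefix=prefix)]`). The indentation style is our own choice; the toolbox's printed format may differ.

```python
from typing import Dict, Iterable, List


def build_tree(blob_names: Iterable[str], prefix: str) -> Dict[str, dict]:
    """Nest object names under prefix into a dict of dicts (a sketch)."""
    tree: Dict[str, dict] = {}
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        node = tree
        # Each "/"-separated segment below the prefix becomes one tree level.
        for part in name[len(prefix):].split("/"):
            if part:  # skip empty segments from leading or trailing slashes
                node = node.setdefault(part, {})
    return tree


def render_tree(tree: Dict[str, dict], depth: int = 0) -> List[str]:
    lines: List[str] = []
    for name in sorted(tree):
        children = tree[name]
        # Folders (nodes with children) get a trailing slash; files do not.
        lines.append("  " * depth + (name + "/" if children else name))
        lines.extend(render_tree(children, depth + 1))
    return lines


def print_tree(blob_names: Iterable[str], prefix: str) -> None:
    print("\n".join(render_tree(build_tree(blob_names, prefix))))
```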

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.

Last updated October 30, 2025 UTC.