Module gcs_utilities (0.6.0a0)

Document AI utilities.

Functions

_get_client_info

_get_client_info(module: Optional[str] = None)

Returns a custom user agent header.

_get_storage_client

_get_storage_client(module: Optional[str] = None)

Returns a Storage client with custom user agent header.

create_batches

create_batches(gcs_bucket_name: str, gcs_prefix: str, batch_size: int = 1000)

Creates batches of documents in Cloud Storage to process with batch_process_documents().

Parameters
Name Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the JSON files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

batch_size int

Optional. Size of each batch of documents. Default is 1000.

Returns
Type Description
List[documentai.BatchDocumentsInputConfig] A list of BatchDocumentsInputConfig, each corresponding to one batch.
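
The batching described above can be pictured as chunking a list of object URIs into groups of at most batch_size. The following is a minimal sketch of that idea in plain Python, not the library's implementation: the helper name chunk_uris and the sample URIs are hypothetical, and the real function returns documentai.BatchDocumentsInputConfig objects rather than lists of strings.

```python
from typing import List


def chunk_uris(uris: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of URIs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Hypothetical object URIs under gs://my-bucket/invoices/ (illustration only).
uris = [f"gs://my-bucket/invoices/doc-{n}.pdf" for n in range(5)]
batches = chunk_uris(uris, batch_size=2)
print([len(batch) for batch in batches])
```

With five documents and a batch size of two, the last batch is smaller than the others; a caller would submit each batch as a separate processing request.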

create_gcs_uri

create_gcs_uri(gcs_bucket_name: str, gcs_prefix: str)

Creates a Cloud Storage URI from the bucket name and prefix.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.
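
As an illustration of how the two parameters relate to the documented format, the sketch below composes a URI from a bucket name and a prefix. It is an assumption-based stand-in, not the library's code: join_gcs_uri and the sample values are hypothetical.

```python
def join_gcs_uri(bucket_name: str, prefix: str) -> str:
    """Build gs://{bucket_name}/{prefix} from its two parts."""
    return f"gs://{bucket_name}/{prefix}"


# Hypothetical bucket and folder names (illustration only).
uri = join_gcs_uri("my-bucket", "optional_folder/target_folder")
print(uri)
```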

get_bytes

get_bytes(gcs_bucket_name: str, gcs_prefix: str)

Returns the contents of the JSON files in a Cloud Storage folder as a list of bytes objects.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the JSON files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

Returns
Type Description
List[bytes] A list of bytes.
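
To make the shape of the result concrete, the sketch below mimics the selection against an in-memory stand-in for a bucket. It assumes that objects are chosen by prefix and by a .json suffix, which is an assumption rather than something this page states. The helper json_bytes_under_prefix and the sample object names are hypothetical, and no Cloud Storage call is made.

```python
from typing import Dict, List


def json_bytes_under_prefix(objects: Dict[str, bytes], prefix: str) -> List[bytes]:
    """Return the contents of objects whose name starts with prefix and ends in .json."""
    return [
        data
        for name, data in sorted(objects.items())
        if name.startswith(prefix) and name.endswith(".json")
    ]


# Hypothetical bucket contents, keyed by object name (illustration only).
fake_bucket = {
    "out/0/result-0.json": b'{"text": "a"}',
    "out/0/result-1.json": b'{"text": "b"}',
    "out/0/notes.txt": b"ignored",
    "other/result.json": b"{}",
}
contents = json_bytes_under_prefix(fake_bucket, "out/0")
print(len(contents))
```

Each element of the returned list is the raw content of one file, so a caller would typically decode or parse every element separately.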

list_gcs_document_tree

list_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str)

Returns the paths to the files in a Cloud Storage folder.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the JSON files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

Returns
Type Description
Dict[str, List[str]] The paths to documents in gs://{gcs_bucket_name}/{gcs_prefix}.
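
One plausible reading of the Dict[str, List[str]] return type is a mapping from each directory to the file names it contains. The sketch below builds such a mapping from a flat list of object names. The key and value layout is an assumption, and group_by_directory and the sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_directory(object_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group object names by their parent directory."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for name in object_names:
        directory, _, filename = name.rpartition("/")
        if filename:  # skip folder placeholder names that end in "/"
            tree[directory].append(filename)
    return dict(tree)


# Hypothetical object names under one prefix (illustration only).
names = ["out/0/a.json", "out/0/b.json", "out/1/c.json", "out/1/"]
tree = group_by_directory(names)
print(tree)
```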

print_gcs_document_tree

print_gcs_document_tree(
 gcs_bucket_name: str, gcs_prefix: str, files_to_display: int = 4
)

Prints a tree of filenames in a Cloud Storage folder.

Parameters
Name Description
gcs_bucket_name str

Required. The name of the Cloud Storage bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name={bucket_name}.

gcs_prefix str

Required. The prefix of the JSON files in the target_folder. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_prefix={optional_folder}/{target_folder}.

files_to_display int

Optional. The number of files to display. Default is 4.
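
The effect of files_to_display can be sketched as truncating each directory listing after a fixed number of entries. The code below is a hypothetical rendering, not the library's output format: format_tree, the branch characters, and the "and N more" line are all assumptions made for illustration.

```python
from typing import Dict, List


def format_tree(tree: Dict[str, List[str]], files_to_display: int = 4) -> List[str]:
    """Render each directory with at most files_to_display file names."""
    lines: List[str] = []
    for directory, filenames in tree.items():
        lines.append(directory)
        shown = filenames[:files_to_display]
        hidden = len(filenames) - len(shown)
        for index, filename in enumerate(shown):
            # The last visible entry closes the branch unless a summary line follows.
            is_last = index == len(shown) - 1 and hidden == 0
            lines.append(("└── " if is_last else "├── ") + filename)
        if hidden > 0:
            lines.append(f"└── ... and {hidden} more")
    return lines


# Hypothetical directory with six files (illustration only).
tree = {"out/0": [f"doc-{n}.json" for n in range(6)]}
lines = format_tree(tree, files_to_display=4)
print("\n".join(lines))
```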

split_gcs_uri

split_gcs_uri(gcs_uri: str)

Splits a Cloud Storage URI into the bucket name and prefix.

Parameter
Name Description
gcs_uri str

Required. The full Cloud Storage URI. Format: gs://{bucket_name}/{gcs_prefix}.

Returns
Type Description
Tuple[str, str] The Cloud Storage bucket name and prefix.
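
The split can be illustrated with a small regular expression over the documented gs://{bucket_name}/{gcs_prefix} format. This is a sketch under assumptions, not the library's implementation: split_uri is a hypothetical helper, and raising ValueError on malformed input is a choice made here, not documented behavior.

```python
import re
from typing import Tuple


def split_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split gs://{bucket_name}/{gcs_prefix} into (bucket_name, gcs_prefix)."""
    match = re.fullmatch(r"gs://([^/]+)/?(.*)", gcs_uri)
    if match is None:
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    return match.group(1), match.group(2)


# Hypothetical URI (illustration only).
bucket, prefix = split_uri("gs://my-bucket/optional_folder/target_folder/")
print(bucket, prefix)
```

The two returned values line up with the gcs_bucket_name and gcs_prefix parameters that the other functions in this module accept.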

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.

Last updated 2025-10-30 UTC.