Module vision_models (1.93.1)

Classes for working with vision models.

Classes

ControlImageConfig

ControlImageConfig(
 control_type: typing.Literal[
 "CONTROL_TYPE_DEFAULT",
 "CONTROL_TYPE_SCRIBBLE",
 "CONTROL_TYPE_FACE_MESH",
 "CONTROL_TYPE_CANNY",
 ],
 enable_control_image_computation: typing.Optional[bool] = False,
)

Control image config.

ControlReferenceImage

ControlReferenceImage(
 reference_id,
 image: typing.Optional[
 typing.Union[bytes, vertexai.vision_models.Image, str]
 ] = None,
 control_type: typing.Optional[
 typing.Literal["default", "scribble", "face_mesh", "canny"]
 ] = None,
 enable_control_image_computation: typing.Optional[bool] = False,
)

Control reference image.

This encapsulates the control reference image type.
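
Example (a construction-only sketch; the vertexai.preview.vision_models import path and the downstream use through a capability model's reference_images parameter are assumptions and may differ in your SDK version)::

from vertexai.preview.vision_models import ControlReferenceImage, Image

# Build a control reference from a pre-drawn scribble; the control_type value
# selects how the control signal is interpreted (see ControlImageConfig).
control_ref = ControlReferenceImage(
    reference_id=1,
    image=Image.load_from_file("scribble.png"),
    control_type="scribble",
    enable_control_image_computation=False,
)
# The reference is then passed in the reference_images list of an Imagen 3
# capability model call (see the editing example under MaskReferenceImage).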

EntityLabel

EntityLabel(
 label: typing.Optional[str] = None, score: typing.Optional[float] = None
)

Entity label holding a text label and any associated confidence score.

GeneratedImage

GeneratedImage(
 image_bytes: typing.Optional[bytes],
 generation_parameters: typing.Dict[str, typing.Any],
 gcs_uri: typing.Optional[str] = None,
)

Generated image.

GeneratedMask

GeneratedMask(
 image_bytes: typing.Optional[bytes],
 gcs_uri: typing.Optional[str] = None,
 labels: typing.Optional[
 typing.List[vertexai.preview.vision_models.EntityLabel]
 ] = None,
)

Generated image mask.

Image

Image(
 image_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None
)

Image.
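
Example (a minimal sketch; the file path and bucket name are placeholders)::

from vertexai.preview.vision_models import Image

# From a local file.
image = Image.load_from_file("image.png")

# From raw bytes.
with open("image.png", "rb") as f:
    image = Image(image_bytes=f.read())

# From a Cloud Storage URI.
image = Image(gcs_uri="gs://my-bucket/image.png")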

ImageCaptioningModel

ImageCaptioningModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates captions from an image.

Examples::

model = ImageCaptioningModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
 image=image,
 # Optional:
 number_of_results=1,
 language="en",
)

ImageGenerationModel

ImageGenerationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates images from a text prompt.

Examples::

model = ImageGenerationModel.from_pretrained("imagegeneration@002")
response = model.generate_images(
 prompt="Astronaut riding a horse",
 # Optional:
 number_of_images=1,
 seed=0,
)
response[0].show()
response[0].save("image1.png")

ImageGenerationResponse

ImageGenerationResponse(images: typing.List[GeneratedImage])

Image generation response.
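
Example (a small sketch building on ImageGenerationModel.generate_images; the response supports indexing, and its images attribute is a plain list)::

from vertexai.preview.vision_models import ImageGenerationModel

model = ImageGenerationModel.from_pretrained("imagegeneration@002")
response = model.generate_images(prompt="Astronaut riding a horse")

first_image = response[0]          # access a single result by index
for idx, image in enumerate(response.images):
    image.save(f"image_{idx}.png")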

ImageQnAModel

ImageQnAModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Answers questions about an image.

Examples::

model = ImageQnAModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
answers = model.ask_question(
 image=image,
 question="What color is the car in this image?",
 # Optional:
 number_of_results=1,
)

ImageSegmentationModel

ImageSegmentationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Segments an image.
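
Example (an illustrative sketch; the "image-segmentation-001" model name, the segment_image method, and the mode value are assumptions based on the preview SDK and may change)::

from vertexai.preview.vision_models import Image, ImageSegmentationModel

model = ImageSegmentationModel.from_pretrained("image-segmentation-001")
image = Image.load_from_file("image.png")
response = model.segment_image(
    base_image=image,
    mode="foreground",
)
for mask in response.masks:          # GeneratedMask objects
    for label in mask.labels or []:  # EntityLabel objects, when the mode produces them
        print(label.label, label.score)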

ImageSegmentationResponse

ImageSegmentationResponse(
 _prediction_response: typing.Any,
 masks: typing.List[vertexai.preview.vision_models.GeneratedMask],
)

Image segmentation response.

ImageTextModel

ImageTextModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates text from images.

Examples::

model = ImageTextModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
 image=image,
 # Optional:
 number_of_results=1,
 language="en",
)
answers = model.ask_question(
 image=image,
 question="What color is the car in this image?",
 # Optional:
 number_of_results=1,
)

MaskImageConfig

MaskImageConfig(
 mask_mode: typing.Literal[
 "MASK_MODE_DEFAULT",
 "MASK_MODE_USER_PROVIDED",
 "MASK_MODE_BACKGROUND",
 "MASK_MODE_FOREGROUND",
 "MASK_MODE_SEMANTIC",
 ],
 segmentation_classes: typing.Optional[typing.List[int]] = None,
 dilation: typing.Optional[float] = None,
)

Mask image config.

MaskReferenceImage

MaskReferenceImage(
 reference_id,
 image: typing.Optional[
 typing.Union[bytes, vertexai.vision_models.Image, str]
 ] = None,
 mask_mode: typing.Optional[
 typing.Literal[
 "default", "user_provided", "background", "foreground", "semantic"
 ]
 ] = None,
 dilation: typing.Optional[float] = None,
 segmentation_classes: typing.Optional[typing.List[int]] = None,
)

Mask reference image. This encapsulates the mask reference image type.
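
Example of mask-based editing with an Imagen 3 capability model (an illustrative sketch; the "imagen-3.0-capability-001" model name, the edit_image parameters, and the edit_mode value are assumptions based on current preview samples and may change)::

from vertexai.preview.vision_models import (
    Image,
    ImageGenerationModel,
    MaskReferenceImage,
    RawReferenceImage,
)

model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")
base_image = Image.load_from_file("photo.png")

# Reference 0: the image to edit; reference 1: an automatically computed
# background mask, slightly dilated.
raw_ref = RawReferenceImage(image=base_image, reference_id=0)
mask_ref = MaskReferenceImage(
    reference_id=1,
    image=None,
    mask_mode="background",
    dilation=0.06,
)

images = model.edit_image(
    prompt="a sunny beach at golden hour",
    edit_mode="inpainting-insert",
    reference_images=[raw_ref, mask_ref],
    number_of_images=1,
)
images[0].save("edited.png")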

MultiModalEmbeddingModel

MultiModalEmbeddingModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates embedding vectors from images and videos.

Examples::

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("image.png")
video = Video.load_from_file("video.mp4")
embeddings = model.get_embeddings(
 image=image,
 video=video,
 contextual_text="Hello world",
)
image_embedding = embeddings.image_embedding
video_embeddings = embeddings.video_embeddings
text_embedding = embeddings.text_embedding

MultiModalEmbeddingResponse

MultiModalEmbeddingResponse(
 _prediction_response: typing.Any,
 image_embedding: typing.Optional[typing.List[float]] = None,
 video_embeddings: typing.Optional[
 typing.List[vertexai.vision_models.VideoEmbedding]
 ] = None,
 text_embedding: typing.Optional[typing.List[float]] = None,
)

The multimodal embedding response.

RawReferenceImage

RawReferenceImage(
 reference_id,
 image: typing.Optional[
 typing.Union[bytes, vertexai.vision_models.Image, str]
 ] = None,
)

Raw reference image.

This encapsulates the raw reference image type.

ReferenceImage

ReferenceImage(
 reference_id,
 image: typing.Optional[
 typing.Union[bytes, vertexai.vision_models.Image, str]
 ] = None,
)

Reference image.

This is the base API object for Imagen 3.0 capabilities.

Scribble

Scribble(image_bytes: typing.Optional[bytes], gcs_uri: typing.Optional[str] = None)

Input scribble for image segmentation.

StyleImageConfig

StyleImageConfig(style_description: str)

Style image config.

StyleReferenceImage

StyleReferenceImage(
 reference_id,
 image: typing.Optional[
 typing.Union[bytes, vertexai.vision_models.Image, str]
 ] = None,
 style_description: typing.Optional[str] = None,
)

Style reference image. This encapsulates the style reference image type.
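
Example (a construction-only sketch; the reference is then supplied through the reference_images parameter of an Imagen 3 capability call, whose exact method depends on your SDK version)::

from vertexai.preview.vision_models import Image, StyleReferenceImage

style_ref = StyleReferenceImage(
    reference_id=1,
    image=Image.load_from_file("style_sample.png"),
    style_description="watercolor illustration",
)
# Supplied via reference_images; the prompt can refer to the style by its
# reference_id, e.g. "a city skyline in the style of [1]".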

SubjectImageConfig

SubjectImageConfig(
 subject_description: str,
 subject_type: typing.Literal[
 "SUBJECT_TYPE_DEFAULT",
 "SUBJECT_TYPE_PERSON",
 "SUBJECT_TYPE_ANIMAL",
 "SUBJECT_TYPE_PRODUCT",
 ],
)

Subject image config.

SubjectReferenceImage

SubjectReferenceImage(
 reference_id,
 image: typing.Optional[
 typing.Union[bytes, vertexai.vision_models.Image, str]
 ] = None,
 subject_description: typing.Optional[str] = None,
 subject_type: typing.Optional[
 typing.Literal["default", "person", "animal", "product"]
 ] = None,
)

Subject reference image.

This encapsulates the subject reference image type.
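
Example (a construction-only sketch; the reference is then supplied through the reference_images parameter of an Imagen 3 capability call, whose exact method depends on your SDK version)::

from vertexai.preview.vision_models import Image, SubjectReferenceImage

subject_ref = SubjectReferenceImage(
    reference_id=1,
    image=Image.load_from_file("product.png"),
    subject_description="the ceramic mug",
    subject_type="product",
)
# Supplied via reference_images; the prompt can refer to the subject by its
# reference_id, e.g. "the ceramic mug [1] on a wooden table".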

Video

Video(
 video_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None
)

Video.

VideoEmbedding

VideoEmbedding(
 start_offset_sec: int, end_offset_sec: int, embedding: typing.List[float]
)

Embeddings generated from video with offset times.

VideoSegmentConfig

VideoSegmentConfig(
 start_offset_sec: int = 0, end_offset_sec: int = 120, interval_sec: int = 16
)

The specific video segments (in seconds) the embeddings are generated for.
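
Example (a sketch building on MultiModalEmbeddingModel.get_embeddings; the video_segment_config parameter name follows the SDK at the time of writing)::

from vertexai.preview.vision_models import (
    MultiModalEmbeddingModel,
    Video,
    VideoSegmentConfig,
)

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
video = Video.load_from_file("video.mp4")
embeddings = model.get_embeddings(
    video=video,
    video_segment_config=VideoSegmentConfig(
        start_offset_sec=0,
        end_offset_sec=60,
        interval_sec=10,
    ),
)
for segment in embeddings.video_embeddings:  # VideoEmbedding objects
    print(segment.start_offset_sec, segment.end_offset_sec, len(segment.embedding))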

WatermarkVerificationModel

WatermarkVerificationModel(
 model_id: str, endpoint_name: typing.Optional[str] = None
)

Verifies whether an image has a watermark.
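
Example (an illustrative sketch; the "imageverification@001" model name and the verify_image method are assumptions based on the preview SDK)::

from vertexai.preview.vision_models import Image, WatermarkVerificationModel

model = WatermarkVerificationModel.from_pretrained("imageverification@001")
image = Image.load_from_file("image.png")
response = model.verify_image(image)
print(response.watermark_verification_result)  # e.g. "ACCEPT" or "REJECT"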

WatermarkVerificationResponse

WatermarkVerificationResponse(
 _prediction_response: Any, watermark_verification_result: Optional[str] = None
)

Watermark verification response.
