Evaluate model performance
This sample code demonstrates how to evaluate the performance of a GenAI model. It showcases how to define the evaluation specification, evaluate the model, and retrieve the evaluation metrics.
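The evaluation specification in the sample reads its ground-truth data from a JSONL file in Cloud Storage and names `ground_truth` as the target column. As a rough sketch of what such a file might look like, the snippet below serializes a few labeled examples to JSONL using only the standard library. The `prompt` column name, the `to_jsonl` helper, and the example rows are assumptions made for illustration; only `ground_truth` and the class names come from the sample.

```python
import json

# Class names taken from the sample's evaluation specification.
CLASS_NAMES = ["nature", "news", "sports", "health", "startups"]


def to_jsonl(examples, target_column="ground_truth", prompt_column="prompt"):
    """Serialize (text, label) pairs to a JSONL string, one object per line.

    `prompt_column` is an assumed name for the input column; the sample only
    confirms the target column name.
    """
    lines = []
    for text, label in examples:
        if label not in CLASS_NAMES:
            raise ValueError(f"unknown class {label!r}")
        lines.append(json.dumps({prompt_column: text, target_column: label}))
    return "\n".join(lines) + "\n"


# Hypothetical rows, not taken from the real dataset.
rows = [
    ("Classify this text: The home team won in overtime.", "sports"),
    ("Classify this text: A new seed round was announced today.", "startups"),
]
print(to_jsonl(rows), end="")
```

Validating labels against the class list before upload catches typos that would otherwise surface only after the evaluation pipeline has started.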
Code sample
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
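Before running the sample, it can help to confirm that a local credentials file is actually present. The sketch below checks two common places Application Default Credentials are read from on a development machine: the file named by `GOOGLE_APPLICATION_CREDENTIALS`, and the file written by `gcloud auth application-default login`. The `find_adc_sources` helper is hypothetical and uses only the standard library; it does not validate the credentials and does not cover credentials supplied by an attached service account.

```python
import os
from pathlib import Path


def find_adc_sources(env=None, home=None):
    """Return (source, path) pairs for local ADC files that exist.

    `env` and `home` default to the real environment and home directory;
    they are parameters only so the check is easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = []

    # 1. Explicit credentials file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        found.append(("GOOGLE_APPLICATION_CREDENTIALS", explicit))

    # 2. User credentials written by `gcloud auth application-default login`.
    if os.name == "nt":
        base = Path(env.get("APPDATA", str(home))) / "gcloud"
    else:
        base = home / ".config" / "gcloud"
    well_known = base / "application_default_credentials.json"
    if well_known.is_file():
        found.append(("gcloud user credentials", str(well_known)))

    return found


for source, path in find_adc_sources():
    print(f"{source}: {path}")
```

An empty result on a local machine usually means the authentication setup step has not been completed yet.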
import os

from google.auth import default

import vertexai
from vertexai.preview.language_models import (
    EvaluationTextClassificationSpec,
    TextGenerationModel,
)

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def evaluate_model() -> object:
    """Evaluate the performance of a generative AI model."""

    # Set credentials for the pipeline components used in the evaluation task
    credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])

    vertexai.init(project=PROJECT_ID, location="us-central1", credentials=credentials)

    # Create a reference to a generative AI model
    model = TextGenerationModel.from_pretrained("text-bison@002")

    # Define the evaluation specification for a text classification task
    task_spec = EvaluationTextClassificationSpec(
        ground_truth_data=[
            "gs://cloud-samples-data/ai-platform/generative_ai/llm_classification_bp_input_prompts_with_ground_truth.jsonl"
        ],
        class_names=["nature", "news", "sports", "health", "startups"],
        target_column_name="ground_truth",
    )

    # Evaluate the model
    eval_metrics = model.evaluate(task_spec=task_spec)
    print(eval_metrics)
    # Example response:
    # ...
    # PipelineJob run completed.
    # Resource name: projects/123456789/locations/us-central1/pipelineJobs/evaluation-llm-classification-...
    # EvaluationClassificationMetric(label_name=None, auPrc=0.53833705, auRoc=0.8...

    return eval_metrics
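The example response reports `auRoc`, the area under the ROC curve. To give a feel for what that number measures, the sketch below computes it for a single class from its standard pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The `au_roc` helper and the toy scores are illustrative assumptions; this is not a claim about how the evaluation pipeline computes or aggregates its metrics.

```python
def au_roc(scores, labels):
    """Area under the ROC curve for one class via pairwise ranking.

    scores: predicted scores for the class, higher means more likely positive.
    labels: truthy for examples that belong to the class, falsy otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Toy one-vs-rest scores for a single class (hypothetical values).
print(au_roc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
```

A value of 1.0 means every positive example outranks every negative one, while 0.5 corresponds to a ranking no better than chance.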
What's next
To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.