Optimize Pod autoscaling based on metrics

This tutorial demonstrates how to automatically scale your Google Kubernetes Engine (GKE) workloads based on metrics available in Cloud Monitoring.

In this tutorial, you can set up autoscaling based on one of the following metrics:

Pub/Sub

Pub/Sub backlog

Scale based on an external metric reporting the number of unacknowledged messages remaining in a Pub/Sub subscription. This can effectively reduce latency before it becomes a problem, but might use relatively more resources than autoscaling based on CPU utilization.

Custom Metric

Custom Prometheus Metric

Scale based on a custom user-defined metric, exported in the Prometheus format through Google Cloud Managed Service for Prometheus. Your Prometheus metric must be of type Gauge.

Autoscaling is fundamentally about finding an acceptable balance between cost and latency. You might want to experiment with a combination of these metrics and others to find a policy that works for you.

Objectives

This tutorial covers the following tasks:

How to deploy the Custom Metrics Adapter.
How to export metrics from within your application code.
How to view your metrics on the Cloud Monitoring interface.
How to deploy a HorizontalPodAutoscaler (HPA) resource to scale your application based on Cloud Monitoring metrics.

Costs

In this document, you use the following billable components of Google Cloud:

GKE
Pub/Sub

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Take the following steps to enable the Kubernetes Engine API:

Visit the Kubernetes Engine page in the Google Cloud console.
Create or select a project.
Wait for the API and related services to be enabled. This can take several minutes.
Verify that billing is enabled for your Google Cloud project.

You can follow this tutorial using Cloud Shell, which comes preinstalled with the gcloud and kubectl command-line tools used in this tutorial. If you use Cloud Shell, you don't need to install these command-line tools on your workstation.

To use Cloud Shell:

Go to the Google Cloud console.
Click the Activate Cloud Shell button at the top of the Google Cloud console window.

A Cloud Shell session opens inside a new frame at the bottom of the Google Cloud console and displays a command-line prompt.


Setting up your environment

Set the default zone for the Google Cloud CLI:
```
gcloud config set compute/zone zone
```
Replace the following:
- zone: Choose a zone that's closest to you. For more information, see Regions and Zones.

Set the PROJECT_ID and PROJECT_NUMBER environment variables to your Google Cloud project ID and project number:

export PROJECT_ID=project-id
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format 'get(projectNumber)')

Set the default project for the Google Cloud CLI:
```
gcloud config set project $PROJECT_ID
```
Create a GKE cluster

Best practice:
For enhanced security when accessing Google Cloud services, enable Workload Identity Federation for GKE on your cluster. Although this page includes examples using the legacy method (with Workload Identity Federation for GKE disabled), enabling it improves protection.
Workload Identity
To create a cluster with Workload Identity Federation for GKE enabled, run the following command:
```
gcloud container clusters create metrics-autoscaling --workload-pool=$PROJECT_ID.svc.id.goog
```
Legacy authentication
To create a cluster with Workload Identity Federation for GKE disabled, run the following command:
```
gcloud container clusters create metrics-autoscaling
```

Deploying the Custom Metrics Adapter

The Custom Metrics Adapter lets your cluster send and receive metrics with Cloud Monitoring.

Pub/Sub

The procedure to install the Custom Metrics Adapter differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.

Workload Identity

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the custom metrics adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

The adapter uses the custom-metrics-stackdriver-adapter Kubernetes service account in the custom-metrics namespace. Allow this service account to read Cloud Monitoring metrics by assigning the Monitoring Viewer role:

gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
    --role roles/monitoring.viewer \
    --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Legacy Authentication

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the custom metrics adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Custom Metric

Workload Identity

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the custom metrics adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

The adapter uses the custom-metrics-stackdriver-adapter Kubernetes service account in the custom-metrics namespace. Allow this service account to read Cloud Monitoring metrics by assigning the Monitoring Viewer role:

gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
    --role roles/monitoring.viewer \
    --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Legacy Authentication

Grant your user the ability to create required authorization roles:

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

Deploy the custom metrics adapter on your cluster:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Deploying an application with metrics

Download the repository containing the application code for this tutorial:

Pub/Sub

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/databases/cloud-pubsub

Custom Metric

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples/observability/custom-metrics-autoscaling/google-managed-prometheus

The repository contains code that exports metrics to Cloud Monitoring:

Pub/Sub

This application polls a Pub/Sub subscription for new messages, acknowledging them as they arrive. Pub/Sub subscription metrics are automatically collected by Cloud Monitoring.

import datetime
import time

from google import auth
from google.cloud import pubsub_v1


def main():
    """Continuously pull messages from subscription"""
    # read default project ID
    _, project_id = auth.default()
    subscription_id = 'echo-read'

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        """Process received message"""
        print(f"Received message: ID={message.message_id} Data={message.data}")
        print(f"[{datetime.datetime.now()}] Processing: {message.message_id}")
        # simulate three seconds of work before acknowledging the message
        time.sleep(3)
        print(f"[{datetime.datetime.now()}] Processed: {message.message_id}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f"Pulling messages from {subscription_path}...")

    with subscriber:
        try:
            # block until the streaming pull fails or is cancelled
            streaming_pull_future.result()
        except Exception as e:
            print(e)


if __name__ == '__main__':
    main()

Custom Metric

This application responds to any web request to the /metrics path with a constant value metric using the Prometheus format.

metric := prometheus.NewGauge(
	prometheus.GaugeOpts{
		Name: *metricName,
		Help: "Custom metric",
	},
)
prometheus.MustRegister(metric)
metric.Set(float64(*metricValue))

http.Handle("/metrics", promhttp.Handler())
log.Printf("Starting to listen on :%d", *port)
err := http.ListenAndServe(fmt.Sprintf(":%d", *port), nil)
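
To make the /metrics response more concrete, the following Python sketch renders a single gauge sample in the Prometheus text exposition format using only the standard library. It is not the sample's exporter (that is the Go program above); the function name render_gauge is hypothetical, and the default values simply mirror the custom_prometheus metric name and "Custom metric" help text used in this tutorial.

```python
def render_gauge(name: str, value: float, help_text: str = "Custom metric") -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",  # human-readable description
        f"# TYPE {name} gauge",        # metric type; this tutorial requires gauge
        f"{name} {value:g}",           # the sample itself: name, then value
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical usage with the constant value the sample Deployment exports.
    print(render_gauge("custom_prometheus", 40), end="")
```

A scraper that requests /metrics from the sample container should see lines of this shape, alongside the default runtime metrics that the Prometheus client library registers.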

The repository also contains a Kubernetes manifest to deploy the application to your cluster. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster:

Pub/Sub

The manifest differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.

Workload Identity

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      serviceAccountName: pubsub-sa
      containers:
      - name: subscriber
        image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2

Legacy authentication

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      volumes:
      - name: google-cloud-key
        secret:
          secretName: pubsub-key
      containers:
      - name: subscriber
        image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2
        volumeMounts:
        - name: google-cloud-key
          mountPath: /var/secrets/google
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json

Custom Metric

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: custom-metrics-gmp
  name: custom-metrics-gmp
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: custom-metrics-gmp
  template:
    metadata:
      labels:
        run: custom-metrics-gmp
    spec:
      containers:
      # sample container generating custom metrics
      - name: prometheus-dummy-exporter
        image: us-docker.pkg.dev/google-samples/containers/gke/prometheus-dummy-exporter:v0.2.0
        command: ["./prometheus-dummy-exporter"]
        args:
        - --metric-name=custom_prometheus
        - --metric-value=40
        - --port=8080

With the PodMonitoring resource, the Google Cloud Managed Service for Prometheus exports the Prometheus metrics to Cloud Monitoring:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: "custom-metrics-exporter"
spec:
  selector:
    matchLabels:
      run: custom-metrics-gmp
  endpoints:
  - port: 8080
    path: /metrics
    interval: 15s

Starting in GKE Standard version 1.27 or GKE Autopilot version 1.25, Google Cloud Managed Service for Prometheus is enabled by default. To enable Google Cloud Managed Service for Prometheus in clusters running earlier versions, see Enable managed collection.

Deploy the application to your cluster:

Pub/Sub

The procedure to deploy your application differs for clusters with or without Workload Identity Federation for GKE enabled. Select the option matching the setup you chose when you created your cluster.

Workload Identity

Enable the Pub/Sub API on your project:

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-workload-identity.yaml

This application defines a pubsub-sa Kubernetes service account. Assign it the Pub/Sub subscriber role so that the application can pull messages from the Pub/Sub subscription.
```
gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
    --role=roles/pubsub.subscriber \
    --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/default/sa/pubsub-sa
```
The preceding command uses a Principal Identifier, which allows IAM to directly refer to a Kubernetes service account.
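
The structure of that identifier is easier to see when it is assembled from its parts. The following Python sketch builds the same string as the --member value in the preceding command; the helper name ksa_principal and the example project values are hypothetical, and the layout is copied from the command in this tutorial rather than from a separate specification.

```python
def ksa_principal(project_number: str, project_id: str,
                  namespace: str, service_account: str) -> str:
    """Build the IAM principal identifier for a Kubernetes service account."""
    # The workload identity pool name is derived from the project ID.
    pool = f"{project_id}.svc.id.goog"
    return (
        "principal://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool}/"
        f"subject/ns/{namespace}/sa/{service_account}"
    )


if __name__ == "__main__":
    # Placeholder project values (assumptions); substitute your own.
    print(ksa_principal("123456789012", "my-project", "default", "pubsub-sa"))
```

Note that the identifier uses the project number in the projects/ segment and the project ID in the pool name, which is why this tutorial sets both PROJECT_NUMBER and PROJECT_ID.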

Best practice:
Use Principal identifiers, but consider the limitation in the description of an alternative method.

Legacy authentication

Enable the Pub/Sub API on your project:

gcloud services enable cloudresourcemanager.googleapis.com pubsub.googleapis.com

Create a Pub/Sub topic and subscription:

gcloud pubsub topics create echo
gcloud pubsub subscriptions create echo-read --topic=echo

Create a service account with access to Pub/Sub:

gcloud iam service-accounts create autoscaling-pubsub-sa
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member "serviceAccount:autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/pubsub.subscriber"

Download the service account key file:

gcloud iam service-accounts keys create key.json \
    --iam-account autoscaling-pubsub-sa@$PROJECT_ID.iam.gserviceaccount.com

Import the service account key to your cluster as a Secret:

kubectl create secret generic pubsub-key --from-file=key.json=./key.json

Deploy the application to your cluster:

kubectl apply -f deployment/pubsub-with-secret.yaml

Custom Metric

kubectl apply -f custom-metrics-gmp.yaml

After waiting a moment for the application to deploy, all Pods reach the Ready state:

Pub/Sub

kubectl get pods

Output:

NAME                     READY   STATUS    RESTARTS   AGE
pubsub-8cd995d7c-bdhqz   1/1     Running   0          58s

Custom Metric

kubectl get pods

Output:

NAME                                  READY   STATUS    RESTARTS   AGE
custom-metrics-gmp-865dffdff9-x2cg9   1/1     Running   0          49s
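
If you want to script this readiness check instead of reading the table by eye, a small parser is enough. The following Python sketch assumes the default column layout shown in the output above; the function name all_pods_ready is hypothetical, and the sample text is copied from this tutorial rather than captured from a live cluster.

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """Return True if every Pod row reports READY as n/n with n > 0."""
    rows = [line.split() for line in kubectl_output.strip().splitlines()]
    if len(rows) < 2:
        return False  # header only, or no output at all
    header, pods = rows[0], rows[1:]
    ready_col = header.index("READY")
    for pod in pods:
        ready, total = pod[ready_col].split("/")
        if ready != total or int(total) == 0:
            return False
    return True


# Sample output taken from this tutorial.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
custom-metrics-gmp-865dffdff9-x2cg9 1/1 Running 0 49s
"""

if __name__ == "__main__":
    print(all_pods_ready(SAMPLE))
```

In a script, you would pass the text captured from kubectl get pods to this function and retry until it returns True.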

Viewing metrics on Cloud Monitoring

As your application runs, it writes your metrics to Cloud Monitoring.

To view the metrics for a monitored resource by using the Metrics Explorer, do the following:

In the Google Cloud console, go to the Metrics explorer page.

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the Metric element, expand the Select a metric menu, and then select a resource type and metric type. For example, to chart the CPU utilization of a virtual machine, do the following:
1. (Optional) To reduce the menu's options, enter part of the metric name in the Filter bar. For this example, enter utilization.
2. In the Active resources menu, select VM instance.
3. In the Active metric categories menu, select Instance.
4. In the Active metrics menu, select CPU utilization and then click Apply.
To filter which time series are displayed, use the Filter element.
To combine time series, use the menus on the Aggregation element. For example, to display the CPU utilization for your VMs, based on their zone, set the first menu to Mean and the second menu to zone.

All time series are displayed when the first menu of the Aggregation element is set to Unaggregated. The default settings for the Aggregation element are determined by the metric type you selected.

The resource type and metrics are the following:

Pub/Sub

Metrics Explorer

Resource type: pubsub_subscription

Metric: pubsub.googleapis.com/subscription/num_undelivered_messages

Custom Metric

Metrics Explorer

Resource type: prometheus_target

Metric: prometheus.googleapis.com/custom_prometheus/gauge
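
The HorizontalPodAutoscaler manifests later in this tutorial refer to these same metric types, but with each forward slash written as a pipe character. The following Python sketch shows that mapping; the helper name hpa_metric_name is hypothetical, and the rule is inferred from the two metric names used in this tutorial.

```python
def hpa_metric_name(monitoring_metric_type: str) -> str:
    """Convert a Cloud Monitoring metric type to the name used in the HPA manifest."""
    # Inferred from this tutorial's manifests: every "/" becomes "|".
    return monitoring_metric_type.replace("/", "|")


if __name__ == "__main__":
    for metric_type in (
        "pubsub.googleapis.com/subscription/num_undelivered_messages",
        "prometheus.googleapis.com/custom_prometheus/gauge",
    ):
        print(hpa_metric_name(metric_type))
```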

Depending on the metric, you might not see much activity on the Cloud Monitoring Metrics Explorer yet. Don't be surprised if your metric isn't updating.

Creating a HorizontalPodAutoscaler object

When you see your metric in Cloud Monitoring, you can deploy a HorizontalPodAutoscaler to resize your Deployment based on your metric.

Pub/Sub

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: echo-read
      target:
        type: AverageValue
        averageValue: 2
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub

Custom Metric

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-gmp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-gmp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: prometheus.googleapis.com|custom_prometheus|gauge
      target:
        type: AverageValue
        averageValue: 20

Deploy the HorizontalPodAutoscaler to your cluster:

Pub/Sub

kubectl apply -f deployment/pubsub-hpa.yaml

Custom Metric

kubectl apply -f custom-metrics-gmp-hpa.yaml

Generating load

For some metrics, you might need to generate load to watch the autoscaling:

Pub/Sub

Publish 200 messages to the Pub/Sub topic:

for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done
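
To get a feel for how this backlog translates into replicas, the following Python sketch applies the basic HorizontalPodAutoscaler arithmetic for an external metric with an AverageValue target: divide the total metric value by the target and round up, then clamp to the configured bounds. The function name replicas_for_backlog is hypothetical, and the sketch deliberately ignores the tolerance band, stabilization windows, and metric reporting delay, so treat it as an approximation rather than the controller's exact behavior.

```python
import math


def replicas_for_backlog(backlog: int, average_value: float,
                         min_replicas: int = 1, max_replicas: int = 5) -> int:
    """Estimate the replica count for an external metric with an AverageValue target."""
    # Each replica is expected to "own" average_value unacknowledged messages.
    wanted = math.ceil(backlog / average_value)
    # The HPA never goes below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Defaults mirror this tutorial's HPA: averageValue 2, minReplicas 1, maxReplicas 5.
    for backlog in (0, 6, 200):
        print(backlog, replicas_for_backlog(backlog, average_value=2))
```

With a target of 2 messages per replica, a backlog of 200 would call for 100 replicas, so the estimate is capped at the maximum of 5. In practice the backlog that the autoscaler sees is usually lower than 200, because the subscriber acknowledges messages while you publish.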

Custom Metric

Not Applicable: The code used in this sample exports a constant value of 40 for the custom metric. The HorizontalPodAutoscaler is set with a target value of 20, so it attempts to scale up the Deployment automatically.
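
Because every Pod reports the same constant value, adding Pods never lowers the per-Pod average, so the autoscaler keeps scaling up until it reaches the maximum. The following Python sketch walks through that progression using the basic HorizontalPodAutoscaler formula, desired = ceil(current * currentValue / targetValue). The function names are hypothetical, and the sketch ignores the tolerance band, stabilization windows, and scaling policies, so the real sequence of replica counts may differ.

```python
import math


def next_replicas(current: int, per_pod_value: float, target_average: float,
                  min_replicas: int = 1, max_replicas: int = 5) -> int:
    """One evaluation of the basic HPA formula, clamped to the replica bounds."""
    wanted = math.ceil(current * per_pod_value / target_average)
    return max(min_replicas, min(max_replicas, wanted))


def scaling_steps(start: int = 1, per_pod_value: float = 40.0,
                  target_average: float = 20.0, max_replicas: int = 5) -> list:
    """Repeat the evaluation until the replica count stops changing."""
    steps = [start]
    while True:
        nxt = next_replicas(steps[-1], per_pod_value, target_average,
                            max_replicas=max_replicas)
        if nxt == steps[-1]:
            return steps
        steps.append(nxt)


if __name__ == "__main__":
    # Defaults mirror this tutorial: constant value 40, target 20, maxReplicas 5.
    print(scaling_steps())
```

Under these simplifying assumptions the ratio 40 / 20 doubles the replica count on each evaluation until the cap of five replicas is reached.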

You might need to wait a couple of minutes for the HorizontalPodAutoscaler to respond to the metric changes.

Observing HorizontalPodAutoscaler scaling up

You can check the current number of replicas of your Deployment by running:

kubectl get deployments

After giving some time for the metric to propagate, the Deployment creates five Pods to handle the backlog.

You can also inspect the state and recent activity of the HorizontalPodAutoscaler by running:

kubectl describe hpa
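
If you prefer to check the autoscaler from a script, you can request the object as JSON (for example with kubectl get hpa pubsub -o json) and read the replica counts from it. The following Python sketch summarizes such a document; the function name summarize_hpa is hypothetical, the sample JSON is a hand-written, trimmed illustration rather than real cluster output, and the field names are the spec and status fields of the autoscaling/v2 HorizontalPodAutoscaler.

```python
import json


def summarize_hpa(hpa_json: str) -> str:
    """Summarize current and desired replicas from an HPA JSON document."""
    hpa = json.loads(hpa_json)
    spec = hpa["spec"]
    status = hpa.get("status", {})
    name = hpa["metadata"]["name"]
    current = status.get("currentReplicas", 0)
    desired = status.get("desiredReplicas", 0)
    min_r = spec.get("minReplicas", 1)  # minReplicas is optional and defaults to 1
    max_r = spec["maxReplicas"]
    return f"{name}: {current} current, {desired} desired (min {min_r}, max {max_r})"


# Hand-written sample (an assumption), trimmed to the fields the function reads.
SAMPLE_HPA = json.dumps({
    "metadata": {"name": "pubsub"},
    "spec": {"minReplicas": 1, "maxReplicas": 5},
    "status": {"currentReplicas": 3, "desiredReplicas": 5},
})

if __name__ == "__main__":
    print(summarize_hpa(SAMPLE_HPA))
```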

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Pub/Sub

Clean up the Pub/Sub subscription and topic:

gcloud pubsub subscriptions delete echo-read
gcloud pubsub topics delete echo

Delete your GKE cluster:

gcloud container clusters delete metrics-autoscaling

Custom Metric

Delete your GKE cluster:

gcloud container clusters delete metrics-autoscaling

What's next

Learn more about custom and external metrics for scaling workloads.
Explore other Kubernetes Engine tutorials.