Dynamically allocate devices to workloads with DRA

This page explains how to deploy dynamic resource allocation (DRA) workloads on your Google Kubernetes Engine (GKE) clusters. You create a ResourceClaimTemplate to request hardware with DRA, and then deploy a basic workload to demonstrate how Kubernetes flexibly allocates hardware to your Pods.

This page is intended for Application operators and Data engineers who run workloads like AI/ML or high performance computing (HPC).

About dynamic resource allocation

DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. For more information, see About dynamic resource allocation.

About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.
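
For example, a platform administrator could create a DeviceClass similar to the following sketch, which uses a CEL selector so that the class matches only devices managed by the NVIDIA GPU driver. The class name is illustrative; GKE doesn't create this object for you:

    apiVersion: resource.k8s.io/v1beta2
    kind: DeviceClass
    metadata:
      name: example-gpu-class   # illustrative name
    spec:
      selectors:
      - cel:
          # Match only devices that the NVIDIA GPU DRA driver manages
          expression: device.driver == "gpu.nvidia.com"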

To request devices within a DeviceClass, you create one of the following objects:

  • ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
  • ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaim and ResourceClaimTemplate objects, see When to use ResourceClaims and ResourceClaimTemplates.

The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more detailed information, see the ResourceClaimTemplateSpec Kubernetes documentation.
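
If you instead want several Pods to share one allocated device, you can create a standalone ResourceClaim and reference it from each Pod by name. The following minimal sketch uses an illustrative claim name and mirrors the request structure of the templates on this page:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaim
    metadata:
      name: shared-gpu-claim   # illustrative name
    spec:
      devices:
        requests:
        - name: single-gpu
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 1

In the Pod specification, you would then reference this claim with resourceClaimName: shared-gpu-claim in the resourceClaims field, instead of resourceClaimTemplateName.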

Limitations

  • Node auto-provisioning isn't supported.
  • Autopilot clusters don't support DRA.
  • You can't use the following GPU sharing features:
    • Time-sharing GPUs
    • Multi-instance GPUs
    • Multi-process Service (MPS)

Requirements

To use DRA, your GKE clusters must run version 1.32.1-gke.1489001 or later.
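
To check which version your cluster runs, you can use a gcloud CLI command like the following, where CLUSTER_NAME and LOCATION are placeholders for your cluster's name and location:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --format="value(currentMasterVersion)"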

You should also be familiar with the limitations described in the preceding section.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

Use DRA to deploy workloads

To request per-Pod device allocation, you first create a ResourceClaimTemplate that describes your request for GPUs or TPUs. Kubernetes uses this template to create a new ResourceClaim object for each Pod in a workload. When you specify the ResourceClaimTemplate in a workload, Kubernetes allocates the requested resources and schedules the Pods on corresponding nodes.

GPU

  1. Save the following manifest as claim-template.yaml:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: single-gpu
            deviceClassName: gpu.nvidia.com
            allocationMode: ExactCount
            count: 1
    
  2. Create the ResourceClaimTemplate:

    kubectl create -f claim-template.yaml
    
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-gpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-gpu-example
      template:
        metadata:
          labels:
            app: dra-gpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command: ["bash", "-c"]
            args: ["while [ 1 ]; do date; echo $(nvidia-smi -L || echo Waiting...); sleep 60; done"]
            resources:
              claims:
              - name: single-gpu
          resourceClaims:
          - name: single-gpu
            resourceClaimTemplateName: gpu-claim-template
          tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
    
  4. Deploy the workload:

    kubectl create -f dra-gpu-example.yaml
    
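
Optionally, to confirm that the Pod was scheduled, you can list the Pods that match the label set in the manifest; the Verify the hardware allocation section later on this page covers allocation in more detail:

    kubectl get pods -l app=dra-gpu-example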

TPU

  1. Save the following manifest as claim-template.yaml:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaimTemplate
    metadata:
      name: tpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: all-tpus
            deviceClassName: tpu.google.com
            allocationMode: All
    

    This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool to every ResourceClaim.
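
     If you instead need a fixed number of TPU devices for each Pod, you can set allocationMode: ExactCount with a count field, as in the GPU example. The following requests section is an illustrative variant, not part of this tutorial:

        devices:
          requests:
          - name: some-tpus
            deviceClassName: tpu.google.com
            allocationMode: ExactCount
            count: 4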

  2. Create the ResourceClaimTemplate:

    kubectl create -f claim-template.yaml
    
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-tpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-tpu-example
      template:
        metadata:
          labels:
            app: dra-tpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command:
            - /bin/sh
            - -c
            - |
              echo "Environment Variables:"
              env
              echo "Sleeping indefinitely..."
              sleep infinity
            resources:
              claims:
              - name: all-tpus
          resourceClaims:
          - name: all-tpus
            resourceClaimTemplateName: tpu-claim-template
          tolerations:
          - key: "google.com/tpu"
            operator: "Exists"
            effect: "NoSchedule"
    
  4. Deploy the workload:

    kubectl create -f dra-tpu-example.yaml
    

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod.

GPU

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims
    

    The output should resemble the following:

    NAME                                               STATE                AGE
    dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
    
  2. To get more details about the hardware assigned to the Pod, run the following command:

    kubectl describe resourceclaims RESOURCECLAIM
    

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output should resemble the following:

    Name:         dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh
    Namespace:    default
    Labels:       <none>
    Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
    API Version:  resource.k8s.io/v1beta1
    Kind:         ResourceClaim
    Metadata:
      Creation Timestamp:  2025-03-31T17:11:37Z
      Finalizers:
        resource.kubernetes.io/delete-protection
      Generate Name:  dra-gpu-example-64b75dc6b-x8bd6-single-gpu-
      Owner References:
        API Version:           v1
        Block Owner Deletion:  true
        Controller:            true
        Kind:                  Pod
        Name:                  dra-gpu-example-64b75dc6b-x8bd6
        UID:                   cb3cb1db-e62a-4961-9967-cdc7d599105b
      Resource Version:        12953269
      UID:                     3e0c3925-e15a-40e9-b552-d03610fff040
    Spec:
      Devices:
        Requests:
          Allocation Mode:    ExactCount
          Count:              1
          Device Class Name:  gpu.nvidia.com
          Name:               single-gpu
    Status:
      Allocation:
        Devices:
          Results:
            Admin Access:  <nil>
            Device:        gpu-0
            Driver:        gpu.nvidia.com
            Pool:          gke-cluster-gpu-pool-11026a2e-zgt1
            Request:       single-gpu
        Node Selector:
          # lines omitted for clarity
      Reserved For:
        Name:      dra-gpu-example-64b75dc6b-x8bd6
        Resource:  pods
        UID:       cb3cb1db-e62a-4961-9967-cdc7d599105b
    Events:  <none>
    
  3. To get logs for the workload that you deployed, run the following command:

    kubectl logs deployment/dra-gpu-example --all-pods=true | grep "GPU"
    

    The output should resemble the following:

    [pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)
    

    The output of these steps shows that GKE allocated one GPU to the Pod.

TPU

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims | grep dra-tpu-example
    

    The output should resemble the following:

    NAME                                             STATE                AGE
    dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh   allocated,reserved   9s
    
  2. To get more details about the hardware assigned to the Pod, run the following command:

    kubectl get resourceclaims RESOURCECLAIM -o yaml
    

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output should resemble the following:

    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaim
    metadata:
      annotations:
        resource.kubernetes.io/pod-claim-name: all-tpus
      creationTimestamp: "2025-03-04T21:00:54Z"
      finalizers:
      - resource.kubernetes.io/delete-protection
      generateName: dra-tpu-example-59b8785697-k9kzd-all-gpus-
      name: dra-tpu-example-59b8785697-k9kzd-all-gpus-gnr7z
      namespace: default
      ownerReferences:
      - apiVersion: v1
        blockOwnerDeletion: true
        controller: true
        kind: Pod
        name: dra-tpu-example-59b8785697-k9kzd
        uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
      resourceVersion: "12189603"
      uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
    spec:
      devices:
        requests:
        - allocationMode: All
          deviceClassName: tpu.google.com
          name: all-tpus
    status:
      allocation:
        devices:
          results:
          - adminAccess: null
            device: "0"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "1"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "2"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "3"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "4"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "5"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "6"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "7"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
        nodeSelector:
          nodeSelectorTerms:
          - matchFields:
            - key: metadata.name
              operator: In
              values:
              - gke-tpu-2ec29193-bcc0
      reservedFor:
      - name: dra-tpu-example-59b8785697-k9kzd
        resource: pods
        uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
    
  3. To get logs for the workload that you deployed, run the following command:

    kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"
    

    The output should resemble the following:

    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3
    

    The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.
