Dynamically allocate devices to workloads with DRA

This page explains how to deploy dynamic resource allocation (DRA) workloads on your Google Kubernetes Engine (GKE) clusters. You create a ResourceClaimTemplate to request hardware with DRA, and then deploy a basic workload to demonstrate how Kubernetes flexibly allocates hardware to your Pods.

This page is intended for Application operators and Data engineers who run workloads like AI/ML or high performance computing (HPC).

About dynamic resource allocation

DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. For more information, see About dynamic resource allocation.

About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.
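
For example, a platform administrator could create a DeviceClass similar to the following sketch, which uses a CEL selector so that the class matches only devices managed by the NVIDIA GPU driver. The class name is illustrative; GKE doesn't create this object for you:

    apiVersion: resource.k8s.io/v1beta2
    kind: DeviceClass
    metadata:
      name: example-gpu-class   # illustrative name
    spec:
      selectors:
      - cel:
          # Match only devices that the NVIDIA GPU DRA driver manages
          expression: device.driver == "gpu.nvidia.com"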

To request devices within a DeviceClass, you create one of the following objects:

  • ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
  • ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaim and ResourceClaimTemplate objects, see When to use ResourceClaims and ResourceClaimTemplates.

The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more detailed information, see the ResourceClaimTemplateSpec Kubernetes documentation.
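
If you instead want several Pods to share one allocated device, you can create a standalone ResourceClaim and reference it from each Pod by name. The following minimal sketch uses an illustrative claim name and mirrors the request structure of the templates on this page:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaim
    metadata:
      name: shared-gpu-claim   # illustrative name
    spec:
      devices:
        requests:
        - name: single-gpu
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 1

In the Pod specification, you would then reference this claim with resourceClaimName: shared-gpu-claim in the resourceClaims field, instead of resourceClaimTemplateName.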

Limitations

  • Node auto-provisioning isn't supported.
  • Autopilot clusters don't support DRA.
  • You can't use the following GPU sharing features:
    • Time-sharing GPUs
    • Multi-instance GPUs
    • Multi-process Service (MPS)

Requirements

To use DRA, your GKE clusters must run version 1.32.1-gke.1489001 or later.
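
To check which version your cluster runs, you can use a gcloud CLI command like the following, where CLUSTER_NAME and LOCATION are placeholders for your cluster's name and location:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --format="value(currentMasterVersion)"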

You should also be familiar with the limitations described in the preceding section.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

Use DRA to deploy workloads

To request per-Pod device allocation, you first create a ResourceClaimTemplate that describes your request for GPUs or TPUs. Kubernetes uses this template to create a new ResourceClaim object for each Pod in a workload. When you specify the ResourceClaimTemplate in a workload, Kubernetes allocates the requested resources and schedules the Pods on corresponding nodes.

GPU

  1. Save the following manifest as claim-template.yaml:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: single-gpu
            deviceClassName: gpu.nvidia.com
            allocationMode: ExactCount
            count: 1
    
  2. Create the ResourceClaimTemplate:

    kubectl create -f claim-template.yaml
    
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-gpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-gpu-example
      template:
        metadata:
          labels:
            app: dra-gpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command: ["bash", "-c"]
            args: ["while [ 1 ]; do date; echo $(nvidia-smi -L || echo Waiting...); sleep 60; done"]
            resources:
              claims:
              - name: single-gpu
          resourceClaims:
          - name: single-gpu
            resourceClaimTemplateName: gpu-claim-template
          tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
    
  4. Deploy the workload:

    kubectl create -f dra-gpu-example.yaml
    
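
Optionally, to confirm that the Pod was scheduled, you can list the Pods that match the label set in the manifest; the Verify the hardware allocation section later on this page covers allocation in more detail:

    kubectl get pods -l app=dra-gpu-example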

TPU

  1. Save the following manifest as claim-template.yaml:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaimTemplate
    metadata:
      name: tpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: all-tpus
            deviceClassName: tpu.google.com
            allocationMode: All
    

    This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool to every ResourceClaim.
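
     If you instead need a fixed number of TPU devices for each Pod, you can set allocationMode: ExactCount with a count field, as in the GPU example. The following requests section is an illustrative variant, not part of this tutorial:

        devices:
          requests:
          - name: some-tpus
            deviceClassName: tpu.google.com
            allocationMode: ExactCount
            count: 4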

  2. Create the ResourceClaimTemplate:

    kubectl create -f claim-template.yaml
    
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-tpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-tpu-example
      template:
        metadata:
          labels:
            app: dra-tpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command:
            - /bin/sh
            - -c
            - |
              echo "Environment Variables:"
              env
              echo "Sleeping indefinitely..."
              sleep infinity
            resources:
              claims:
              - name: all-tpus
          resourceClaims:
          - name: all-tpus
            resourceClaimTemplateName: tpu-claim-template
          tolerations:
          - key: "google.com/tpu"
            operator: "Exists"
            effect: "NoSchedule"
    
  4. Deploy the workload:

    kubectl create -f dra-tpu-example.yaml
    

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod.

GPU

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims
    

    The output should resemble the following:

    NAME                                               STATE                AGE
    dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
    
  2. To get more details about the hardware assigned to the Pod, run the following command:

    kubectl describe resourceclaims RESOURCECLAIM
    

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output should resemble the following:

    Name:         dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh
    Namespace:    default
    Labels:       <none>
    Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
    API Version:  resource.k8s.io/v1beta1
    Kind:         ResourceClaim
    Metadata:
      Creation Timestamp:  2025-03-31T17:11:37Z
      Finalizers:
        resource.kubernetes.io/delete-protection
      Generate Name:  dra-gpu-example-64b75dc6b-x8bd6-single-gpu-
      Owner References:
        API Version:           v1
        Block Owner Deletion:  true
        Controller:            true
        Kind:                  Pod
        Name:                  dra-gpu-example-64b75dc6b-x8bd6
        UID:                   cb3cb1db-e62a-4961-9967-cdc7d599105b
      Resource Version:        12953269
      UID:                     3e0c3925-e15a-40e9-b552-d03610fff040
    Spec:
      Devices:
        Requests:
          Allocation Mode:    ExactCount
          Count:              1
          Device Class Name:  gpu.nvidia.com
          Name:               single-gpu
    Status:
      Allocation:
        Devices:
          Results:
            Admin Access:  <nil>
            Device:        gpu-0
            Driver:        gpu.nvidia.com
            Pool:          gke-cluster-gpu-pool-11026a2e-zgt1
            Request:       single-gpu
        Node Selector:
          # lines omitted for clarity
      Reserved For:
        Name:      dra-gpu-example-64b75dc6b-x8bd6
        Resource:  pods
        UID:       cb3cb1db-e62a-4961-9967-cdc7d599105b
    Events:  <none>
    
  3. To get logs for the workload that you deployed, run the following command:

    kubectl logs deployment/dra-gpu-example --all-pods=true | grep "GPU"
    

    The output should resemble the following:

    [pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)
    

    The output of these steps shows that GKE allocated one GPU to the Pod.

TPU

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims | grep dra-tpu-example
    

    The output should resemble the following:

    NAME                                             STATE                AGE
    dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh   allocated,reserved   9s
    
  2. To get more details about the hardware assigned to the Pod, run the following command:

    kubectl get resourceclaims RESOURCECLAIM -o yaml
    

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output should resemble the following:

    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaim
    metadata:
      annotations:
        resource.kubernetes.io/pod-claim-name: all-tpus
      creationTimestamp: "2025-03-04T21:00:54Z"
      finalizers:
      - resource.kubernetes.io/delete-protection
      generateName: dra-tpu-example-59b8785697-k9kzd-all-gpus-
      name: dra-tpu-example-59b8785697-k9kzd-all-gpus-gnr7z
      namespace: default
      ownerReferences:
      - apiVersion: v1
        blockOwnerDeletion: true
        controller: true
        kind: Pod
        name: dra-tpu-example-59b8785697-k9kzd
        uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
      resourceVersion: "12189603"
      uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
    spec:
      devices:
        requests:
        - allocationMode: All
          deviceClassName: tpu.google.com
          name: all-tpus
    status:
      allocation:
        devices:
          results:
          - adminAccess: null
            device: "0"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "1"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "2"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "3"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "4"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "5"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "6"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "7"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
        nodeSelector:
          nodeSelectorTerms:
          - matchFields:
            - key: metadata.name
              operator: In
              values:
              - gke-tpu-2ec29193-bcc0
      reservedFor:
      - name: dra-tpu-example-59b8785697-k9kzd
        resource: pods
        uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
    
  3. To get logs for the workload that you deployed, run the following command:

    kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"
    

    The output should resemble the following:

    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3
    

    The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.
