Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

This guide shows how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) by using Ray Serve and the Ray Operator add-on.

About Ray and Ray Serve

Ray is an open-source scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray used for scaling and serving models in a distributed environment. For more information, see Ray Serve in the Ray documentation.
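To illustrate the programming model, the following is a minimal Ray Serve deployment sketch in Python. It is illustrative only and is not part of this tutorial's sample application; the Greeter class and its response are placeholders.

 from ray import serve
 from starlette.requests import Request

 # A deployment wraps a Python class (or function) and serves it over HTTP.
 @serve.deployment
 class Greeter:
 async def __call__(self, request: Request) -> str:
 name = request.query_params.get("name", "world")
 return f"Hello, {name}!"

 # bind() builds the application graph; serve.run() deploys it on the
 # connected Ray cluster and exposes it on port 8000 by default.
 app = Greeter.bind()
 serve.run(app)

Later sections of this guide run a similar, but GPU-backed, Serve application (the Stable Diffusion sample) on a Ray cluster running on GKE.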

You can use a RayCluster or RayService resource to deploy your Ray Serve applications. You should use a RayService resource in production for the following reasons:

  • In-place updates for Ray Serve applications
  • Zero-downtime upgrades for RayCluster resources
  • Highly available Ray Serve applications

Objectives

This guide is intended for generative AI customers, new or existing GKE users, ML engineers, MLOps (DevOps) engineers, and platform administrators who are interested in using Kubernetes container orchestration capabilities to serve models with Ray. This guide covers the following steps:

  • Create a GKE cluster with a GPU node pool.
  • Create a Ray cluster using the RayCluster custom resource.
  • Run a Ray Serve application.
  • Deploy a RayService custom resource.

Costs

In this document, you use the following billable components of Google Cloud:

  • Google Kubernetes Engine (GKE)
  • GPUs attached to GKE nodes

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Cloud Shell is preinstalled with the software you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, you must install the gcloud CLI.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get 300ドル in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

  3. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  4. To initialize the gcloud CLI, run the following command:

    gcloud init
  5. Create or select a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Verify that billing is enabled for your Google Cloud project.

  7. Enable the GKE API:

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    gcloud services enable container.googleapis.com
  8. Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

    Replace the following:

    • PROJECT_ID: Your project ID.
    • USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
    • ROLE: The IAM role that you grant to your user account.

Prepare your environment

To prepare your environment, follow these steps:

  1. Launch a Cloud Shell session by clicking Activate Cloud Shell in the Google Cloud console. This launches a session in the bottom pane of the Google Cloud console.

  2. Set environment variables:

    export PROJECT_ID=PROJECT_ID
    export CLUSTER_NAME=rayserve-cluster
    export COMPUTE_REGION=us-central1
    export COMPUTE_ZONE=us-central1-c
    export CLUSTER_VERSION=CLUSTER_VERSION
    export TUTORIAL_HOME=`pwd`
    

    Replace the following:

    • PROJECT_ID: your Google Cloud project ID.
    • CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
  3. Clone the GitHub repository:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    
  4. Change to the working directory:

    cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion
    
  5. Create a Python virtual environment:

    venv

    python -m venv myenv && \
 source myenv/bin/activate
    

    Conda

    1. Install Conda.

    2. Run the following commands:

      conda create -c conda-forge python=3.9.19 -n myenv && \
 conda activate myenv
      

    When you deploy a Serve application with serve run, Ray expects the Python version of the local client to match the version used in the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If you're running a different client version, select the appropriate Ray image. A quick version check sketch follows these steps.

  6. Install the required dependencies to run the Serve application:

    pip install ray[serve]==2.37.0
 pip install torch
 pip install requests
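
As a quick sanity check before you run serve run in a later step, you can compare your local Python and Ray versions with the versions in the cluster image. This optional sketch assumes you want to match rayproject/ray:2.37.0 (Python 3.9, Ray 2.37.0); it is not part of the sample repository:

 import sys

 import ray

 # The rayproject/ray:2.37.0 image ships Python 3.9 and Ray 2.37.0. Your local
 # client should match both before you deploy with `serve run`.
 print(f"Local Python: {sys.version_info.major}.{sys.version_info.minor}")
 print(f"Local Ray: {ray.__version__}")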
    

Create a cluster with a GPU node pool

Create an Autopilot or Standard GKE cluster with a GPU node pool:

Autopilot

Create an Autopilot cluster:

gcloud container clusters create-auto ${CLUSTER_NAME} \
 --enable-ray-operator \
 --cluster-version=${CLUSTER_VERSION} \
 --location=${COMPUTE_REGION}

Standard

  1. Create a Standard cluster:

    gcloud container clusters create ${CLUSTER_NAME} \
 --addons=RayOperator \
 --cluster-version=${CLUSTER_VERSION} \
 --machine-type=c3d-standard-8 \
 --location=${COMPUTE_ZONE} \
 --num-nodes=1
    
  2. Create a GPU node pool:

    gcloud container node-pools create gpu-pool \
 --cluster=${CLUSTER_NAME} \
 --machine-type=g2-standard-8 \
 --location=${COMPUTE_ZONE} \
 --num-nodes=1 \
 --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest
    

Deploy a RayCluster resource

To deploy a RayCluster resource:

  1. Review the following manifest:

    apiVersion: ray.io/v1
 kind: RayCluster
 metadata:
 name: stable-diffusion-cluster
 spec:
 rayVersion: '2.37.0'
 headGroupSpec:
 rayStartParams:
 dashboard-host: '0.0.0.0'
 template:
 metadata: {}
 spec:
 containers:
 - name: ray-head
 image: rayproject/ray:2.37.0
 ports:
 - containerPort: 6379
 name: gcs
 - containerPort: 8265
 name: dashboard
 - containerPort: 10001
 name: client
 - containerPort: 8000
 name: serve
 resources:
 limits:
 cpu: "2"
 ephemeral-storage: "15Gi"
 memory: "8Gi"
 requests:
 cpu: "2"
 ephemeral-storage: "15Gi"
 memory: "8Gi"
 nodeSelector:
 cloud.google.com/machine-family: c3d
 workerGroupSpecs:
 - replicas: 1
 minReplicas: 1
 maxReplicas: 4
 groupName: gpu-group
 rayStartParams: {}
 template:
 spec:
 containers:
 - name: ray-worker
 image: rayproject/ray:2.37.0-gpu
 resources:
 limits:
 cpu: 4
 memory: "16Gi"
 nvidia.com/gpu: 1
 requests:
 cpu: 3
 memory: "16Gi"
 nvidia.com/gpu: 1
 nodeSelector:
 cloud.google.com/gke-accelerator: nvidia-l4

    This manifest describes a RayCluster resource with a CPU-based head node and a worker group that requests one NVIDIA L4 GPU per replica.

  2. Apply the manifest to your cluster:

    kubectl apply -f ray-cluster.yaml
    
  3. Verify the RayCluster resource is ready:

    kubectl get raycluster
    

    The output is similar to the following:

    NAME DESIRED WORKERS AVAILABLE WORKERS CPUS MEMORY GPUS STATUS AGE
 stable-diffusion-cluster 2 2 6 20Gi 0 ready 33s
    

    In this output, ready in the STATUS column indicates the RayCluster resource is ready.

Connect to the RayCluster resource

To connect to the RayCluster resource:

  1. Verify that GKE created the RayCluster service:

    kubectl get svc stable-diffusion-cluster-head-svc
    

    The output is similar to the following:

    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
 stable-diffusion-cluster-head-svc ClusterIP 34.118.238.247 <none> 10001/TCP,8265/TCP,6379/TCP,8080/TCP 109s
    
  2. Establish port-forwarding sessions to the Ray head:

    kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
 kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
    
  3. Verify that the Ray client can connect to the Ray cluster using localhost:

    ray list nodes --address http://localhost:8265
    

    The output is similar to the following:

    ======== List: 2024年06月19日 15:15:15.707336 ========
    Stats:
    ------------------------------
    Total: 3
    Table:
    ------------------------------
     NODE_ID NODE_IP IS_HEAD_NODE STATE NODE_NAME RESOURCES_TOTAL LABELS
    0 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2 10.28.1.21 False ALIVE 10.28.1.21 CPU: 2.0 ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
    # Several lines of output omitted
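
You can also verify connectivity from Python through the forwarded Ray Client port. This optional sketch is not part of the tutorial repository and assumes the port-forward to 10001 from the previous step is still running:

 import ray

 # Connect through the Ray Client port forwarded to localhost:10001.
 ray.init(address="ray://localhost:10001")

 # Print the cluster's total resources, for example {'CPU': 6.0, 'GPU': 1.0, ...}.
 print(ray.cluster_resources())

 ray.shutdown()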
    

Run a Ray Serve application

To run a Ray Serve application:

  1. Run the Stable Diffusion Ray Serve application:

    serve run stable_diffusion:entrypoint \
 --working-dir=. \
 --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' \
 --address ray://localhost:10001
    

    The output is similar to the following:

    2024年06月19日 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
    2024年06月19日 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
    2024年06月19日 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
    2024年06月19日 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
    2024年06月19日 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
    2024年06月19日 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
    2024年06月19日 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
    2024年06月19日 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
    2024年06月19日 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
    
  2. Establish a port-forwarding session to the Ray Serve port (8000):

    kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &
    
  3. Run the Python script:

    python generate_image.py
    

    The script saves the generated image to a file named output.png. The image is similar to the following:

    A beach at sunset. Image generated by Stable Diffusion.
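
The generate_image.py script in the sample repository sends a text prompt to the Serve HTTP endpoint and writes the response bytes to output.png. The following is a simplified sketch of that request pattern; the /imagine route and the prompt and img_size query parameters are assumptions based on the Ray Serve Stable Diffusion example and might not match the repository script exactly:

 import requests

 # Send a text prompt to the Serve application through the forwarded port 8000.
 prompt = "a beach at sunset"
 resp = requests.get(
 "http://localhost:8000/imagine",
 params={"prompt": prompt, "img_size": 512},
 )
 resp.raise_for_status()

 # The endpoint returns PNG image bytes; write them to disk.
 with open("output.png", "wb") as f:
 f.write(resp.content)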

Deploy a RayService

The RayService custom resource manages the lifecycle of a RayCluster resource and Ray Serve application.

For more information about RayService, see Deploy Ray Serve Applications and Production Guide in the Ray documentation.

To deploy a RayService resource, follow these steps:

  1. Review the following manifest:

    apiVersion: ray.io/v1
 kind: RayService
 metadata:
 name: stable-diffusion
 spec:
 serveConfigV2: |
 applications:
 - name: stable_diffusion
 import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
 runtime_env:
 working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
 pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
 rayClusterConfig:
 rayVersion: '2.37.0'
 headGroupSpec:
 rayStartParams:
 dashboard-host: '0.0.0.0'
 template:
 spec:
 containers:
 - name: ray-head
 image: rayproject/ray:2.37.0
 ports:
 - containerPort: 6379
 name: gcs
 - containerPort: 8265
 name: dashboard
 - containerPort: 10001
 name: client
 - containerPort: 8000
 name: serve
 resources:
 limits:
 cpu: "2"
 ephemeral-storage: "15Gi"
 memory: "8Gi"
 requests:
 cpu: "2"
 ephemeral-storage: "15Gi"
 memory: "8Gi"
 nodeSelector:
 cloud.google.com/machine-family: c3d
 workerGroupSpecs:
 - replicas: 1
 minReplicas: 1
 maxReplicas: 4
 groupName: gpu-group
 rayStartParams: {}
 template:
 spec:
 containers:
 - name: ray-worker
 image: rayproject/ray:2.37.0-gpu
 resources:
 limits:
 cpu: 4
 memory: "16Gi"
 nvidia.com/gpu: 1
 requests:
 cpu: 3
 memory: "16Gi"
 nvidia.com/gpu: 1
 nodeSelector:
 cloud.google.com/gke-accelerator: nvidia-l4

    This manifest describes a RayService custom resource that creates the Ray cluster and deploys the Stable Diffusion Serve application defined in the serveConfigV2 field.

  2. Apply the manifest to your cluster:

    kubectl apply -f ray-service.yaml
    
  3. Verify that the Service is ready:

    kubectl get svc stable-diffusion-serve-svc
    

    The output is similar to the following:

    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
 stable-diffusion-serve-svc ClusterIP 34.118.236.0 <none> 8000/TCP 31m
    
  4. Configure port-forwarding to the Ray Serve Service:

    kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &
    
  5. Run the Python script from the previous section:

    python generate_image.py
    

    The script generates an image similar to the image generated in the previous section.

Clean up

Delete the project

    Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

Delete individual resources

To delete the cluster, type:

gcloud container clusters delete ${CLUSTER_NAME}

What's next

  • Explore reference architectures, diagrams, and best practices for Google Cloud in the Cloud Architecture Center.
