This page describes how to use a Cloud Key Management Service (Cloud KMS) encryption
key with Cloud Data Fusion.
By default, Cloud Data Fusion encrypts customer content at
rest. Cloud Data Fusion handles encryption for you without any
additional actions on your part. This option is called Google default encryption.
If you want to control your encryption keys, then you can use customer-managed encryption keys
(CMEKs) in Cloud KMS with CMEK-integrated services including
Cloud Data Fusion. Using Cloud KMS keys gives you control over their protection
level, location, rotation schedule, usage and access permissions, and cryptographic boundaries.
Using Cloud KMS also lets
you track key usage, view audit logs, and
control key lifecycles.
Instead of Google owning and managing the symmetric
key encryption keys (KEKs) that protect your data, you control and
manage these keys in Cloud KMS.
After you set up your resources with CMEKs, the experience of accessing your
Cloud Data Fusion resources is similar to using Google default encryption.
For more information about your encryption
options, see Customer-managed encryption keys (CMEK).
Cloud Data Fusion supports Cloud KMS key usage tracking for the Instance resource.
CMEK lets you control the data that's written to Google internal resources in tenant projects
and data written by Cloud Data Fusion pipelines, including the following:
Pipeline logs and metadata
Dataproc cluster metadata
Various Cloud Storage, BigQuery, Pub/Sub, and
Spanner data sinks, actions, and sources
Cloud Data Fusion supports CMEK for Dataproc clusters.
Cloud Data Fusion creates a temporary Dataproc cluster for
use in the pipeline, and then deletes the cluster when the pipeline completes.
CMEK protects the cluster metadata written to the following:
Persistent disks (PD) attached to cluster VMs
Job driver output and other metadata written to the auto-created or
user-created Dataproc staging bucket
Set up CMEK
Create a Cloud KMS key
Create a Cloud KMS key in the
Google Cloud project that contains the Cloud Data Fusion instance
or in a separate user project. The Cloud KMS key ring location
must match the region where you create the instance. A multi-region or
global key isn't allowed at the instance level because a
Cloud Data Fusion instance is always associated with a particular region.
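The location rule above can be checked before you create the instance. The following is a minimal sketch, assuming the key is identified by the standard Cloud KMS resource name format (projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY_NAME). The function name is hypothetical and is not part of any Google Cloud library.

```python
import re

# Pattern for a Cloud KMS CryptoKey resource name.
_KEY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def key_matches_instance_region(key_name: str, instance_region: str) -> bool:
    """Return True if the key's location equals the instance region.

    A global key (location "global") or a multi-region key (for example
    "us") never equals a single region such as "us-central1", so both
    are rejected by the same comparison.
    """
    match = _KEY_NAME.match(key_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key resource name: {key_name!r}")
    return match.group("location") == instance_region
```

For example, a key in locations/us-central1 passes for an instance in us-central1, while a key in locations/global does not.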
Get the resource name for the key
REST API
Get the resource name of the key that you created with the following command:
Granting the Cloud KMS CryptoKey Encrypter/Decrypter role to the
Cloud Data Fusion service agent enables Cloud Data Fusion to
use CMEK to encrypt any customer data stored in tenant projects.
Granting the Cloud KMS CryptoKey Encrypter/Decrypter role to the
Compute Engine service agent enables Cloud Data Fusion to
use CMEK to encrypt persistent disk (PD) metadata written by the
Dataproc cluster running in your pipeline.
Granting this role to the Cloud Storage service agent enables
Cloud Data Fusion to use CMEK to encrypt the Cloud Storage
bucket that stores and caches pipeline information and data written to the
Dataproc cluster staging bucket and any other
Cloud Storage buckets in your project used by your pipeline.
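The three grants described above can be expressed as a single IAM policy binding on the key. The following is a sketch only: the role ID is the one for Cloud KMS CryptoKey Encrypter/Decrypter, but the service agent email domains are assumptions that you should confirm on the IAM page of your project, and the function name is hypothetical.

```python
# Role ID for Cloud KMS CryptoKey Encrypter/Decrypter.
ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"

# ASSUMPTION: service agent email domains. Verify them in your project
# before using them in a real policy.
AGENT_DOMAINS = {
    "Cloud Data Fusion": "gcp-sa-datafusion.iam.gserviceaccount.com",
    "Compute Engine": "compute-system.iam.gserviceaccount.com",
    "Cloud Storage": "gs-project-accounts.iam.gserviceaccount.com",
}


def cmek_binding(project_number: str) -> dict:
    """Build one IAM binding that grants ROLE to the three service agents."""
    members = [
        f"serviceAccount:service-{project_number}@{domain}"
        for domain in AGENT_DOMAINS.values()
    ]
    return {"role": ROLE, "members": members}
```

The returned dictionary has the shape of one entry in the bindings list of an IAM policy. It does not call any API; applying it to the key is left to the tool you already use.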
Required: Grant the Cloud KMS CryptoKey Encrypter/Decrypter role
to the Google Cloud Dataproc Service Agent. This service agent is of the
form:
Encrypts data written to any bucket created by the plugin. If the bucket
already exists, this value is ignored.
BigQuery Execute: Encrypts data written to the dataset or table that the
plugin creates to store the query results. It's only applicable if you store
the query results in a BigQuery table.
Cloud Data Fusion sources
BigQuery source: Encrypts data written to any bucket created by the plugin.
If the bucket already exists, this value is ignored.
Cloud Data Fusion SQL engine
BigQuery Pushdown Engine: Encrypts data written to any bucket, dataset, or
table created by the plugin.
Use CMEK with Dataproc cluster metadata
The pre-created compute profiles use the CMEK key provided during instance
creation to encrypt the Persistent Disk (PD) and staging bucket metadata
written by the Dataproc cluster running in your pipeline. To use a
different key, do one of the following:
Recommended: Create a new Dataproc compute profile
(Enterprise edition only).
Edit an existing Dataproc compute profile (Developer, Basic,
or Enterprise editions).
Console
Open the Cloud Data Fusion instance:
In the Google Cloud console, go to the Cloud Data Fusion page.
To open the instance in the Cloud Data Fusion Studio,
click Instances, and then click View instance.
Enter a Profile label, Profile name, and Description.
By default, Dataproc creates staging and temp buckets whenever an ephemeral cluster is created by Cloud Data Fusion. Cloud Data Fusion supports passing the Dataproc staging bucket as an argument in the compute profile.
To encrypt the staging bucket, create a CMEK-enabled bucket and pass it as an argument to Dataproc in the compute profile.
By default, Cloud Data Fusion auto-creates a Cloud Storage bucket to stage dependencies used by Dataproc. If you prefer to use a Cloud Storage bucket that already exists in your project, follow these steps:
In the General Settings section, enter your existing
Cloud Storage bucket in the Cloud Storage Bucket
field.
Get the resource ID of your Cloud KMS key. In
the General Settings section, enter your resource ID in the
Encryption Key Name field.
Click Create.
If more than one profile is listed in the System Compute Profiles
section of the Configuration tab, make the new
Dataproc profile the default profile by holding the
pointer over the profile name field and clicking the star that appears.
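For the step that asks for a CMEK-enabled staging bucket, the bucket needs a default Cloud KMS key. The following sketch builds the bucket resource body that, to my understanding, the Cloud Storage JSON API accepts when creating a bucket, with the default key set in encryption.defaultKmsKeyName. The function name and the example values are hypothetical, and the sketch only builds the body; it does not create the bucket.

```python
import json


def cmek_bucket_resource(bucket_name: str, location: str, kms_key_name: str) -> dict:
    """Return a bucket resource body with a default CMEK key.

    kms_key_name is the full Cloud KMS key resource name. The bucket
    location should be compatible with the key location.
    """
    return {
        "name": bucket_name,
        "location": location,
        "encryption": {"defaultKmsKeyName": kms_key_name},
    }


# Hypothetical values for illustration only.
body = cmek_bucket_resource(
    "my-staging-bucket",
    "us-central1",
    "projects/p/locations/us-central1/keyRings/r/cryptoKeys/k",
)
print(json.dumps(body, indent=2))
```

After the bucket exists, you pass its name to the compute profile as described in the steps above.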
Use CMEK with other resources
During Cloud Data Fusion instance creation, the CMEK key that you provide
is set as a system preference. It is used to encrypt data
written to resources that pipeline sinks newly create, such as
Cloud Storage, BigQuery, Pub/Sub, or
Spanner sinks.
This key applies only to newly created resources. If a resource already exists
before pipeline execution, you must manually apply the CMEK key to that
existing resource.
You can change the CMEK key by doing one of the following:
Use a runtime argument.
Set a Cloud Data Fusion system preference.
Runtime argument
In the Cloud Data Fusion Pipeline Studio page,
click the drop-down arrow to the right of the Run button.
In the Name field, enter gcp.cmek.key.name.
In the Value field, enter your key's resource ID.
Click Save.
The runtime argument you set here applies only to runs of the current
pipeline.
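The same runtime argument can in principle be supplied when a pipeline run is started programmatically. The following sketch builds, but does not send, an HTTP request that passes gcp.cmek.key.name as a runtime argument. It assumes the instance exposes the CDAP REST API, that the pipeline is a batch pipeline started through the DataPipelineWorkflow program, and that authentication is added separately. The function name and endpoint values are hypothetical.

```python
import json
import urllib.request


def build_start_request(
    api_endpoint: str, namespace: str, pipeline: str, key_name: str
) -> urllib.request.Request:
    """Build a POST request that starts a pipeline run with a CMEK key.

    ASSUMPTION: the URL path follows the CDAP program lifecycle API for
    batch pipelines. Confirm it against your instance before use.
    """
    url = (
        f"{api_endpoint.rstrip('/')}/v3/namespaces/{namespace}"
        f"/apps/{pipeline}/workflows/DataPipelineWorkflow/start"
    )
    # Runtime arguments are sent as a flat JSON object of string pairs.
    body = json.dumps({"gcp.cmek.key.name": key_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

As with the console steps, an argument passed this way would apply only to the run that the request starts.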
Preference
In the Cloud Data Fusion UI, click SYSTEM ADMIN.
Click the Configuration tab.
Click the System Preferences drop-down.
Click Edit System Preferences.
In the Key field, enter gcp.cmek.key.name.
In the Value field, enter your key's resource ID.
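When a system preference is set outside the UI, it is easy to overwrite other preferences by accident. The following sketch shows one cautious approach: copy the existing preferences, then set gcp.cmek.key.name on the copy. It assumes that preferences are a flat mapping of string keys to string values and that whatever mechanism you use to write them replaces the whole set; both points are assumptions to verify. The function name is hypothetical.

```python
CMEK_PREFERENCE_KEY = "gcp.cmek.key.name"


def with_cmek_preference(existing: dict, key_name: str) -> dict:
    """Return a copy of the preferences with the CMEK key set.

    The input mapping is not modified, so the caller can compare the
    old and new values before writing anything back.
    """
    updated = dict(existing)
    updated[CMEK_PREFERENCE_KEY] = key_name
    return updated
```

Other preferences in the mapping are carried over unchanged, and an existing gcp.cmek.key.name value is replaced by the new key.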
Last updated 2025-10-16 UTC.