Back up a Dataproc Metastore service
This page explains how to create a backup of a Dataproc Metastore
service.
A backup takes a snapshot of your
service, saving its current configuration settings and all stored metadata.
After you create a backup, you can use the Restore from a backup feature to
populate a new Dataproc Metastore service with the data saved
in the snapshot.
To grant read and modify access to specific metadata of databases and tables:
Dataproc Metastore Metadata Operator (roles/metastore.metadataOperator)
on the metadata service
To use the Cloud Storage object that stores scheduled backups:
Cloud Storage Object User (roles/storage.objectUser)
on the Dataproc Metastore service agent
These predefined roles contain
the permissions required to back up a Dataproc Metastore service. To see the exact permissions that are
required, expand the Required permissions section:
Required permissions
The following permissions are required to back up a Dataproc Metastore service:
To back up a metadata service:
metastore.backups.create
Before running a backup operation, note the following considerations:
For each Dataproc Metastore service, you can create and store
up to seven backups at a time. If you try to exceed seven backups, the backup
process fails. If you want to create another backup, you must first manually
delete one of your stored backup files.
While a backup operation is running, you can't update your
Dataproc Metastore service; for example, you can't change
configuration settings. However, you can still use your service for normal
operations, such as accessing metadata from attached Dataproc or
self-managed clusters.
You can create scheduled backups that run at various cron intervals,
such as every day.
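Because a backup attempt fails once seven backups are stored, it can help to count the existing backups first. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The function name backup_slots_left and the MAX_BACKUPS variable are illustrative and not part of the product.

```shell
# Maximum number of stored backups per service, per the limit described above.
MAX_BACKUPS=7

# Prints how many more backups can be stored for a service.
# Usage: backup_slots_left SERVICE LOCATION
backup_slots_left() {
  # List backup resource names, one per line, and count the lines.
  count=$(gcloud metastore services backups list \
    --service="$1" \
    --location="$2" \
    --format="value(name)" | wc -l)
  echo $((MAX_BACKUPS - count))
}
```

If the function prints 0, delete a stored backup before you create another one.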
Create a backup
To back up a Dataproc Metastore service, complete the steps in
one of the following tabs:
Console
In the Google Cloud console, open the Dataproc Metastore page:
Find the backup you want to delete and click the settings button.
Click Delete.
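The create and delete operations can also be run from the gcloud CLI. The sketch below wraps the two commands in small helper functions; the function names and argument order are illustrative, and the backup ID is a name that you choose.

```shell
# Creates an on-demand backup of a service.
# Usage: create_backup BACKUP SERVICE LOCATION
create_backup() {
  gcloud metastore services backups create "$1" \
    --service="$2" \
    --location="$3"
}

# Deletes a stored backup, for example to stay under the seven-backup limit.
# Usage: delete_backup BACKUP SERVICE LOCATION
delete_backup() {
  gcloud metastore services backups delete "$1" \
    --service="$2" \
    --location="$3"
}
```

For example, `create_backup my-backup my-service us-central1` requests a backup named my-backup for a service named my-service (both names hypothetical).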
Schedule a backup
You can schedule backups to run at intervals that you specify in cron
format, such as daily, weekly, or monthly. A cron schedule uses the
unix-cron string format (* * * * *): five space-separated fields on one
line that indicate when the job runs.
For example, you can set a custom interval to create a backup every week,
such as creating a backup every Wednesday at 2:00 PM PST.
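To make the five fields concrete, the sketch below splits a schedule for every Wednesday at 2:00 PM into its parts. The field order follows standard unix-cron (minute, hour, day of month, month, day of week); the variable names are illustrative. The cron string itself carries no time zone, so 2:00 PM is interpreted in whichever time zone the schedule is configured with.

```shell
# Every Wednesday at 14:00: minute 0, hour 14, any day of month, any month, weekday 3.
cron_schedule="0 14 * * 3"

# Split the string into its five fields. Here-document bodies are not
# subject to filename expansion, so the asterisks stay literal.
read -r minute hour day_of_month month day_of_week <<EOF
$cron_schedule
EOF

echo "minute=$minute hour=$hour day_of_month=$day_of_month month=$month day_of_week=$day_of_week"
```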
Scheduled backup considerations
Scheduled backups need to specify a backup location, which must be a
Cloud Storage path.
Scheduled backups are always created in the Avro file format.
Scheduled backups are configured in the UTC timezone by default. You can
change the timezone when creating the backup for the first time.
Scheduled backups can be set to run at hourly, daily, weekly, or monthly
intervals. The minimum hourly interval you can set is 4 hours.
Create a scheduled backup
You can set a backup schedule when you first create your service,
or add one later by updating the service.
To create a Dataproc Metastore service 2 with a scheduled backup,
complete the steps in one of the following tabs:
SERVICE: the ID or fully qualified identifier
of your Dataproc Metastore service.
LOCATION: the Google Cloud region in which
your Dataproc Metastore service resides.
SCHEDULED_BACKUP_CRON: the frequency of your
backup, specified in the cron time format.
For example, a cron value of 0 0 * * * schedules a daily
backup.
SCHEDULED_BACKUP_LOCATION: the
Cloud Storage location of your backup.
For example: gs://my-bucket/path/to/location.
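The placeholders above correspond to a gcloud command of roughly the following shape. This is a sketch, not the authoritative syntax: the flag names --enable-scheduled-backup, --scheduled-backup-cron, and --scheduled-backup-location are assumed from the placeholder names, so confirm them with gcloud metastore services create --help. The wrapper function name is illustrative.

```shell
# Creates a service with a scheduled backup.
# Usage: create_service_with_scheduled_backup SERVICE LOCATION CRON BACKUP_LOCATION
# Flag names are assumptions; verify them with `gcloud metastore services create --help`.
create_service_with_scheduled_backup() {
  gcloud metastore services create "$1" \
    --location="$2" \
    --enable-scheduled-backup \
    --scheduled-backup-cron="$3" \
    --scheduled-backup-location="$4"
}

# Example (hypothetical values), scheduling a daily backup at midnight:
# create_service_with_scheduled_backup my-service us-central1 "0 0 * * *" gs://my-bucket/path/to/location
```

Quote the cron value when you pass it so that the shell does not expand the asterisks.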
Alternatively, you can schedule a backup by storing the preceding values in a
configuration file:
gcloud metastore services create SERVICE \
--location=LOCATION \
--scheduled-backup-configs-from-file=SCHEDULED_BACKUP_CONFIGS_FROM_FILE
Replace the following:
SCHEDULED_BACKUP_CONFIGS_FROM_FILE: a path to
a JSON file containing the backup configuration values enabled,
cron_schedule, time_zone, and backup_location.
The following example shows a backup configuration file that
enables scheduled backups, sets the backup schedule to
every hour, specifies the time zone as PST, and defines the backup
location as a Cloud Storage bucket. You can choose time zones from
the list of common tz database time zones.
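A configuration file matching that description might look like the following sketch, written here with a shell here-document. It assumes the schedule key is spelled cron_schedule and that PST is expressed with the tz database name America/Los_Angeles; the file name and bucket path are illustrative.

```shell
# Write a scheduled-backup configuration file (the file name is illustrative).
# "0 * * * *" runs at minute 0 of every hour.
cat > scheduled_backup_config.json <<'EOF'
{
  "enabled": true,
  "cron_schedule": "0 * * * *",
  "time_zone": "America/Los_Angeles",
  "backup_location": "gs://my-bucket/path/to/location"
}
EOF
```

You would then pass scheduled_backup_config.json as the value of --scheduled-backup-configs-from-file.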
SERVICE: the ID or fully qualified identifier
of the Dataproc Metastore service that you want to update.
LOCATION: the Google Cloud region in which
your Dataproc Metastore service resides.
SCHEDULED_BACKUP_CRON: the frequency of your
backup, specified in the cron time format.
For example, a cron value of 0 0 * * * schedules a daily
backup.
SCHEDULED_BACKUP_LOCATION: the Cloud Storage
location of your scheduled backup.
For example: gs://my-bucket/path/to/location.
You can also update a scheduled backup using the preceding values stored
in a configuration file:
gcloud metastore services update SERVICE \
--location=LOCATION \
--scheduled-backup-configs-from-file=SCHEDULED_BACKUP_CONFIGS_FROM_FILE
Replace the following:
SCHEDULED_BACKUP_CONFIGS_FROM_FILE: a path to
a JSON file containing the backup configuration.
The following example shows a backup config file that disables a
scheduled backup.
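A minimal sketch of such a file follows. It assumes that setting enabled to false is sufficient and that the other keys can be omitted when disabling; the file name is illustrative.

```shell
# Write a configuration file that turns scheduled backups off
# (assumption: only the "enabled" key is needed to disable).
cat > disable_scheduled_backup.json <<'EOF'
{
  "enabled": false
}
EOF
```

You would then pass disable_scheduled_backup.json as the value of --scheduled-backup-configs-from-file in the update command shown earlier.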
The Backup page opens and displays your scheduled backups. Note that
the backups are actually stored in the Cloud Storage bucket that
you provided in the scheduled backup configuration.
gcloud CLI
Run the following gcloud storage ls command:
gcloud storage ls gs://BUCKET_NAME/SERVICE/LOCATION
Replace the following:
BUCKET_NAME: the path to the Cloud Storage
bucket that stores the scheduled backup that you want to view.
SERVICE: the ID or fully qualified identifier
of the Dataproc Metastore service whose scheduled backups you want to view.
LOCATION: the Google Cloud region in which your
Dataproc Metastore service resides.
Last updated 2025-10-24 UTC.