Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

jolares/example-gcp-dataform

Repository files navigation

stackoverflow-data-reports

  • data-reports-etl: A project built with BigQuery Dataform that transforms Stackoverflow public raw data into reporting tables in a BigQuery data warehouse.

  • ETL: A project built with BigQuery Dataform that transforms Stackoverflow public raw data into reporting tables in a BigQuery data warehouse.

Local Development Environment Setup

  • Clone the project workspace repository: git clone https://github.com/jolares/stackoverflow-ai.git

  • Install project workspace node dependencies: npm install

(Optional) Install Recommended VSCode Extensions

  • dataform.dataform: provides syntax highlighting, compilation, and intellisense for Dataform and SQLX projects. Refer to the extension site

GCP Project Configuration

Setup Secrets

Create a Secret Token with Dataform Service Account Permissions

A secret token is created for the GCP Dataform Service Account to interact with Dataform resources.

  • Enable GCP Secret Manager API for the GCP Project

  • Open the GCP Secret Manager console and create a new secret token with any secure value of your preference.

    • This project named the secret token GCP_BIGQUERY_DATAFORM_SA_TOKEN
  • After the secret is created, edit the secret's permissions and grant access to the Dataform service account; for this, make the service account a new principal for the secret, and assign to it the role Secret Manager Secret Accessor

A secret token is created for the GCP Dataform Service Account to interact with Dataform resources.

Assign BigQuery Permissions to the Dataform Service Account

  • Open the Google Cloud AIM Admin console

  • Assign the role of BigQuery Admin to the dataform service account

  • Edit the Dataform Service Account created by Google by adding the Role BigQuery Admin to it.

    Note: if you do not see the service account in the list of principals displayed to you within the Permissions page, you probably need to enable/check the option that indicates Include Google-provided role grants


Project Folder Structure

stackoverflow-ai/
├── ...
├── definitions/
├ ├── reporting/
├ ├── sources/
├ ├── testing/
├── environments.json
├── schedules.json
└── package.json

Dataform Dependencies

Dataform Configuration File

// dataform.json file
{
 // (Required) Set this value to the GCP BigQuery dataset name (the Dataset ID without
 // the GCP Project ID subdomain)
 "defaultSchema": "{GCP_BIGQUERY_DATASET_NAME}",
 // (Required)
 "assertionSchema": "dataform_assertions",
 // (Required)
 "warehouse": "bigquery",
 // (Required) Set this value to the GCP Project ID
 "defaultDatabase": "{GCP_PROJECT_ID}"
 // (Optional) Set this value the BigQuery Dataset Location (i.e. us-central-1, US)
 "defaultLocation": "US"
}

Dataform Environments File

{
 "environments": [
 {
 "name": "production",
 "configOverride": {},
 // The git repository branch, or commit SHA, that triggers the workflow
 // run using this environment (i.e. master, main, release, develop, etc)
 "gitRef": "master"
 },
 {
 "name": "development",
 "configOverride": {},
 "gitRef": "master"
 },
 {
 "name": "testing",
 "configOverride": {},
 "gitRef": "master"
 },
 // ... Other environments can be added here
 ]
}

Dataform Schedules File

{
 "schedules": [
 {
 "name": "daily",
 "options": {
 "includeDependencies": false,
 "fullRefresh": false,
 "tags": [
 "daily"
 ]
 },
 "cron": "00 09 * * *",
 "notification": {
 "onSuccess": false,
 "onFailure": false
 },
 "notifications": [
 {
 "events": [
 "failure"
 ],
 "channels": [
 "email jo"
 ]
 }
 ]
 }
 ],
 "notificationChannels": [
 {
 "name": "email jo",
 "email": {
 "to": [
 "jolares@gatech.edu"
 ]
 }
 }
 ]
}

About

Example end-to-end ELT data pipeline using GCP Dataform.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

AltStyle によって変換されたページ (->オリジナル) /