Create a regression model with BigQuery DataFrames

Create a linear regression model that predicts the body mass of Adelie penguins, using the BigQuery DataFrames API.
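Linear regression models a numeric label (here body_mass_g) as a weighted sum of feature values plus an intercept, choosing the weights that minimize the squared prediction error. The following is a minimal local sketch of that idea with a single feature, using the closed-form least-squares solution. It is only an illustration: in the sample, LinearRegression trains a BigQuery ML model inside BigQuery on five features, and BigQuery ML chooses its own optimization strategy and handles string features such as island and sex during preprocessing. The helper names and the toy values (chosen to lie exactly on a line) are hypothetical and are not taken from the penguins table.

```python
# Minimal sketch: ordinary least squares with ONE feature, computed locally.
# Hypothetical helpers and toy data; not how BigQuery ML trains the model.


def fit_simple_linear_regression(
    xs: list[float], ys: list[float]
) -> tuple[float, float]:
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and variance; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_linear(slope: float, intercept: float, xs: list[float]) -> list[float]:
    """Apply the fitted line to new feature values."""
    return [intercept + slope * x for x in xs]


# Toy (hypothetical) flipper lengths in mm and body masses in g.
flipper_length_mm = [180.0, 190.0, 200.0, 210.0]
body_mass_g = [3400.0, 3600.0, 3800.0, 4000.0]

slope, intercept = fit_simple_linear_regression(flipper_length_mm, body_mass_g)
print(slope, intercept)  # 20.0 -200.0
print(predict_linear(slope, intercept, [195.0]))  # [3700.0]
```

With more than one feature the same objective applies, but the weights are solved jointly; that is the work BigQuery ML does when model.fit runs in the sample.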

Code sample

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from bigframes.ml.linear_model import LinearRegression
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Filter down to the Adelie Penguin species
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the species column (it is constant after filtering)
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get training data
training_data = adelie_data.dropna()

# Specify your feature (or input) columns and the label (or output) column
feature_columns = training_data[
    ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
]
label_columns = training_data[["body_mass_g"]]

# Rows with a missing body mass have no label, so they are the ones to predict
test_data = adelie_data[adelie_data.body_mass_g.isnull()]

# Create the linear model and train it
model = LinearRegression()
model.fit(feature_columns, label_columns)

# Score the model (on the training data, so this measures fit, not generalization)
score = model.score(feature_columns, label_columns)

# Predict body mass for the rows that are missing it
result = model.predict(test_data)
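For a linear regression model, model.score asks BigQuery ML to evaluate the model and returns the metrics as a DataFrame; they include mean absolute error, mean squared error, and R² (r2_score). Because the sample scores the same rows it trained on, these numbers describe how well the model fits its training data rather than how it will do on penguins it has not seen. The sketch below computes three of those metrics locally in plain Python so their definitions are concrete; the function name and the label and prediction values are hypothetical.

```python
# Minimal sketch of common regression metrics, computed locally.
# Hypothetical helper and toy values; BigQuery ML computes its own metrics in BigQuery.


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Return mean absolute error, mean squared error, and R^2."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need equal-length, non-empty inputs")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all labels are equal")
    r2 = 1 - ss_res / ss_tot
    return {"mean_absolute_error": mae, "mean_squared_error": mse, "r2_score": r2}


# Toy (hypothetical) true body masses and model predictions, in grams.
actual_g = [3400.0, 3600.0, 3800.0, 4000.0]
predicted_g = [3500.0, 3600.0, 3700.0, 4000.0]
metrics = regression_metrics(actual_g, predicted_g)
print(metrics)
```

Here two of the four predictions are off by 100 g, giving a mean absolute error of 50 g, a mean squared error of 5000 g², and an R² of about 0.9. To estimate performance on unseen penguins instead, one option is to hold out some complete rows before calling fit (for example with train_test_split from bigframes.ml.model_selection) and pass only the held-out rows to score.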

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.