Challenge Lab : Create ML Models with BigQuery ML

Create ML Models with BigQuery ML: Challenge Lab




Checkout Video

create-ml-models-with-bigquery-ml-challenge-lab


Task 1: Create a dataset to store your machine learning models

bq mk austin


Task 2: Create a forecasting BigQuery machine learning model.

Write the below query in BigQuery  console:

CREATE OR REPLACE MODEL austin.location_model

OPTIONS

  (model_type='linear_reg', labels=['duration_minutes']) AS

SELECT

    start_station_name,

    EXTRACT(HOUR FROM start_time) AS start_hour,

    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,

    duration_minutes,

FROM

    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips

JOIN

    `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations

ON

    trips.start_station_name = stations.name

WHERE

    EXTRACT(YEAR FROM start_time) = 2018

    AND duration_minutes > 0



OR Write the below command in cloud shell :

export PROJECT_ID=DEVSHELL_PROJECT_ID

bq query --use_legacy_sql=false "CREATE OR REPLACE MODEL austin.austin_1 OPTIONS(input_label_cols=['duration_minutes'], model_type='linear_reg') AS SELECT

duration_minutes, location, start_station_name, CAST(EXTRACT(dayofweek FROM start_time) AS STRING) as dayofweek, CAST(EXTRACT(hour FROM start_time) AS STRING) AS hourofday, FROM \`bigquery-public-data.austin_bikeshare.bikeshare_trips\` AS trips JOIN \`bigquery-public-data.austin_bikeshare.bikeshare_stations\` AS stations ON trips.start_station_id = stations.station_id WHERE EXTRACT(year from start_time) = 2018"


Task 3: Create the second machine learning model.

Write the below query in BigQuery  console:

CREATE OR REPLACE MODEL austin.subscriber_model

OPTIONS

  (model_type='linear_reg', labels=['duration_minutes']) AS

SELECT

    start_station_name,

    EXTRACT(HOUR FROM start_time) AS start_hour,

    subscriber_type,

    duration_minutes

FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips

WHERE EXTRACT(YEAR FROM start_time) = 2018



OR

 Write the below command in cloud shell :

export PROJECT_ID=DEVSHELL_PROJECT_ID

bq query --use_legacy_sql=false "CREATE OR REPLACE MODEL austin.austin_2 OPTIONS(input_label_cols=['duration_minutes'], model_type='linear_reg') AS SELECT duration_minutes, subscriber_type, start_station_name, CAST(EXTRACT(hour FROM start_time) AS STRING) AS hourofday FROM \`bigquery-public-data.austin_bikeshare.bikeshare_trips\` AS trips WHERE EXTRACT(year from start_time) = 2018"


Task 4: Evaluate the two machine learning models.

Write the below query in BigQuery  console:


-- Evaluation metrics for location_model

SELECT

  SQRT(mean_squared_error) AS rmse,

  mean_absolute_error

FROM

  ML.EVALUATE(MODEL austin.location_model, (

  SELECT

    start_station_name,

    EXTRACT(HOUR FROM start_time) AS start_hour,

    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,

    duration_minutes

  FROM

    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips

  JOIN

   `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations

  ON

    trips.start_station_name = stations.name

  WHERE EXTRACT(YEAR FROM start_time) = 2019)

);

-- Evaluation metrics for subscriber_model

SELECT

  SQRT(mean_squared_error) AS rmse,

  mean_absolute_error

FROM

  ML.EVALUATE(MODEL austin.subscriber_model, (

  SELECT

    start_station_name,

    EXTRACT(HOUR FROM start_time) AS start_hour,

    subscriber_type,

    duration_minutes

  FROM

    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips

  WHERE

    EXTRACT(YEAR FROM start_time) = 2019)

)


OR

 Write the below command in cloud shell :

bq query --use_legacy_sql=false "SELECT SQRT(mean_squared_error)AS rmse, mean_absolute_error FROM ML.EVALUATE(MODEL austin.austin_1, (SELECT duration_minutes, location, start_station_name, CAST(EXTRACT(dayofweek FROM start_time) AS STRING) as dayofweek, CAST(EXTRACT(hour FROM start_time) AS STRING) AS hourofday, FROM \`bigquery-public-data.austin_bikeshare.bikeshare_trips\` AS trips JOIN \`bigquery-public-data.austin_bikeshare.bikeshare_stations\` AS stations ON trips.start_station_id = stations.station_id WHERE EXTRACT(year from start_time) = 2019))"

bq query --use_legacy_sql=false "SELECT SQRT(mean_squared_error)AS rmse, mean_absolute_error FROM ML.EVALUATE(MODEL austin.austin_2, (SELECT duration_minutes, subscriber_type, start_station_name, CAST(EXTRACT(hour FROM start_time) AS STRING) AS hourofday FROM \`bigquery-public-data.austin_bikeshare.bikeshare_trips\` AS trips WHERE EXTRACT(year from start_time) = 2019))"



Task 5: Use the subscriber type machine learning model to predict average trip durations

Write the below query in BigQuery  console:


SELECT

  start_station_name,

  COUNT(*) AS trips

FROM

  `bigquery-public-data.austin_bikeshare.bikeshare_trips`

WHERE

  EXTRACT(YEAR FROM start_time) = 2019

GROUP BY

  start_station_name

ORDER BY

  trips DESC



OR

 Write the below command in cloud shell :

bq query --use_legacy_sql=false "SELECT start_station_name, avg(duration_minutes) as avg_duration, count(*) as total_trips FROM \`bigquery-public-data.austin_bikeshare.bikeshare_trips\` WHERE EXTRACT(year FROM start_time) = 2019 GROUP BY start_station_name ORDER BY total_trips DESC"


Now again in BigQuery Editor write this command :

SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length

FROM ML.predict(MODEL austin.subscriber_model, (

SELECT

    start_station_name,

    EXTRACT(HOUR FROM start_time) AS start_hour,

    subscriber_type,

    duration_minutes

FROM

  `bigquery-public-data.austin_bikeshare.bikeshare_trips`

WHERE

  EXTRACT(YEAR FROM start_time) = 2019

  AND subscriber_type = 'Single Trip'

  AND start_station_name = '21st & Speedway @PCL'))



OR in Cloud SHell write this command :

bq query --use_legacy_sql=false "SELECT avg(predicted_duration_minutes),count(duration_minutes) FROM Ml.PREDICT(MODEL austin.austin_2, (SELECT duration_minutes, subscriber_type, start_station_name, CAST(EXTRACT(hour FROM start_time) AS STRING) AS hourofday FROM \`bigquery-public-data.austin_bikeshare.bikeshare_trips\` AS trips WHERE EXTRACT(year from start_time)=2019 and start_station_name = '21st & Speedway @PCL' and subscriber_type='Single Trip' ))"



Hence we solved all the tasks Congrats 🥳

Consider Subscribing : https://bit.ly/3cI0zb4

Keep Learning ✌

#Tech_ED