In a previous article, I discussed the differences between dbt Core and dbt Cloud and pointed out that dbt Core has no built-in way to schedule or orchestrate data transformations. If dbt Cloud isn't your choice for orchestration, you therefore need another tool for the job.
My preferred option is Google Cloud Functions. I usually use BigQuery as my data warehouse, and it also runs on Google Cloud Platform (GCP).
Other options include a virtual machine, which is probably overkill for a process that often runs for only a few minutes, or a container, which requires more advanced knowledge of coding and infrastructure management.
Below you will find a guide to deploying dbt Core on a Google Cloud Function.
Prerequisites
You need a dbt Core project repository
You need access to a GCP project (I usually set up the orchestration in the project that already hosts my data transformations)
Modify the dbt Core repository
First, restructure the repository by moving all the existing dbt folders and files into a sub-directory. I called it “dbt_transform”, but you can name it whatever you like.
Note that my dbt_transform folder also contains a profiles.yml file, in which I use the oauth authentication method.
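For reference, here is a sketch of what such a profile can look like, rendered from Python so every field is visible. The profile name, target name, project ID, dataset, location, and thread count are placeholders I chose for illustration; the profile name has to match the profile key in your dbt_project.yml. With method: oauth, dbt-bigquery relies on Application Default Credentials, which on a Cloud Function resolve to the function's runtime service account, so no keyfile has to be shipped with the code.

```python
def render_profiles_yml(profile: str, project: str, dataset: str,
                        location: str = "EU", threads: int = 4) -> str:
    """Return a BigQuery profiles.yml body that authenticates with oauth."""
    lines = [
        f"{profile}:",              # must match `profile:` in dbt_project.yml
        "  target: prod",           # hypothetical target name
        "  outputs:",
        "    prod:",
        "      type: bigquery",
        # oauth = Application Default Credentials; on Cloud Functions these
        # come from the function's runtime service account (no keyfile).
        "      method: oauth",
        f"      project: {project}",    # project where dbt builds its models
        f"      dataset: {dataset}",    # default dataset for this target
        f"      location: {location}",  # BigQuery location of the datasets
        f"      threads: {threads}",    # number of models run in parallel
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: replace them with your own project and dataset.
    print(render_profiles_yml("dbt_transform", "my-transform-project", "analytics"))
```

You can paste the printed output into dbt_transform/profiles.yml and adjust the values.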
Add main.py and requirements.txt
At the root of the repository, create a main.py file; it serves as the Cloud Function's entry point.
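Below is a minimal sketch of such a main.py. It assumes dbt-core 1.5 or later, which provides the programmatic dbtRunner entry point and the --target-path and --log-path flags, and that dbt-core and dbt-bigquery are listed in requirements.txt. The entry point name run_dbt, the allow-list of commands, and the JSON payload format are my own assumptions, not the author's code. Compiled artifacts and logs are redirected to /tmp because that is the directory Cloud Functions documents as writable.

```python
"""main.py: minimal HTTP-triggered Cloud Function that runs dbt (sketch)."""
import os
from typing import List, Optional

# Assumption: the working directory is the function's source root and the
# dbt project, including profiles.yml, lives in the "dbt_transform" folder.
DBT_DIR = "dbt_transform"
# Compiled artifacts and logs go to /tmp, the writable scratch directory.
TMP_DIR = "/tmp"
# Hypothetical allow-list so callers cannot trigger arbitrary dbt commands.
ALLOWED_COMMANDS = {"build", "run", "test", "seed", "snapshot"}


def build_dbt_args(command: str = "build", target: Optional[str] = None) -> List[str]:
    """Return the CLI-style argument list handed to dbtRunner.invoke()."""
    args = [
        command,
        "--project-dir", DBT_DIR,
        "--profiles-dir", DBT_DIR,
        "--target-path", os.path.join(TMP_DIR, "target"),
        "--log-path", os.path.join(TMP_DIR, "logs"),
    ]
    if target:
        args += ["--target", target]
    return args


def run_dbt(request):
    """HTTP entry point; Cloud Functions passes in a flask.Request."""
    payload = request.get_json(silent=True) or {}
    command = payload.get("command", "build")
    if command not in ALLOWED_COMMANDS:
        return f"Unsupported dbt command: {command}", 400

    # Imported here so build_dbt_args can be tested without dbt installed.
    from dbt.cli.main import dbtRunner

    result = dbtRunner().invoke(build_dbt_args(command, payload.get("target")))
    if result.success:
        return f"dbt {command} succeeded", 200
    # result.exception is set on errors; model or test failures leave it None.
    return f"dbt {command} failed: {result.exception}", 500
```

With this layout, the function's entry point would be run_dbt. A request body such as {"command": "run", "target": "prod"} selects the command and target, while an empty body falls back to dbt build with the profile's default target.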
Then, add a requirements.txt file that lists the Python dependencies.
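The repository now has main.py and requirements.txt at its root, with all the dbt folders and files inside the dbt_transform sub-directory.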
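Before deploying, a small check like the one below can confirm this layout. It is a hypothetical helper, not part of dbt or gcloud, and the file list reflects the structure described in this guide; adjust EXPECTED_FILES if you named the sub-directory differently.

```python
from pathlib import Path
from typing import List

# Files this guide's layout expects before deploying (adjust to your names).
EXPECTED_FILES = [
    "main.py",
    "requirements.txt",
    "dbt_transform/dbt_project.yml",
    "dbt_transform/profiles.yml",
]


def missing_files(repo_root: str = ".") -> List[str]:
    """Return the expected paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints missing paths or a confirmation.
    print(missing_files() or "Repository layout looks ready to deploy.")
```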
Set up the service account for dbt
You will need to set up a service account for dbt with the following roles:
BigQuery Data Viewer and BigQuery Job User on the projects or datasets that host the sources of your dbt project
BigQuery Data Editor and BigQuery User on the project where your data transformations take place
Cloud Functions Invoker on the project where your Cloud Function is going to run
To create this service account, you can run a script in Cloud Shell (open it with the keyboard shortcut G + S).
The script expects the following variables to be set: TRANSFORM_PROJECT_ID, FUNCTION_PROJECT_ID, and SOURCE_PROJECT_IDS.
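As an illustration of the same steps, the Python sketch below prints the gcloud commands instead of running them, so you can review them before pasting them into Cloud Shell. It is not the author's script: the service account name dbt-runner and the choice to create it in FUNCTION_PROJECT_ID are assumptions, and it grants the source roles at project level only (dataset-level grants would need a different command). The role IDs correspond to the roles listed above.

```python
import shlex
from typing import List

# IAM role IDs for the roles listed above.
SOURCE_ROLES = ["roles/bigquery.dataViewer", "roles/bigquery.jobUser"]
TRANSFORM_ROLES = ["roles/bigquery.dataEditor", "roles/bigquery.user"]
FUNCTION_ROLES = ["roles/cloudfunctions.invoker"]


def iam_commands(transform_project_id: str, function_project_id: str,
                 source_project_ids: List[str],
                 sa_name: str = "dbt-runner") -> List[str]:
    """Build (but do not run) the gcloud commands for the dbt service account."""
    # Assumption: the service account lives in the function's project.
    sa_email = f"{sa_name}@{function_project_id}.iam.gserviceaccount.com"
    commands = [[
        "gcloud", "iam", "service-accounts", "create", sa_name,
        f"--project={function_project_id}",
        "--display-name=dbt service account",
    ]]
    # Project-level bindings: sources, then transformations, then the function.
    bindings = [(p, r) for p in source_project_ids for r in SOURCE_ROLES]
    bindings += [(transform_project_id, r) for r in TRANSFORM_ROLES]
    bindings += [(function_project_id, r) for r in FUNCTION_ROLES]
    for project, role in bindings:
        commands.append([
            "gcloud", "projects", "add-iam-policy-binding", project,
            f"--member=serviceAccount:{sa_email}",
            f"--role={role}",
        ])
    # shlex.join quotes arguments that contain spaces for the shell.
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Placeholder project IDs: replace them with your own.
    for line in iam_commands("my-transform-project", "my-transform-project",
                             ["my-source-project"]):
        print(line)
```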