As part of Google Analytics 360, you can have all of your Google Analytics data exported daily into Google BigQuery, in its entirety, in the form of a new table per day. We often have to do something with that new data as it becomes available, such as exporting them into our customer’s data warehouse or updating machine learning models.
At Data Runs Deep, we love Google Cloud Functions, and use them extensively to trigger workloads and processes based on events. The problem with BigQuery, however, is that it doesn’t emit any events. We found a workaround by utilising Stackdriver alerts to publish a Pub/Sub event that our Cloud Function could then respond to.
Here I will outline the steps to set up a serverless pipeline to extract BigQuery GA360 tables to Google Cloud Storage, after which you can set up another function to do as you wish with them.
In brief, we create a Stackdriver log trigger to listen to a table upload event in BigQuery, which will publish a message to a Pub/Sub topic. This will in turn invoke a Cloud Function to extract this table to a cloud storage bucket in JSON format.
- Create a Stackdriver export to Pubsub
- Go to the Logging section in the hamburger menu
- Navigate to ‘exports’ in the left hand panel. Click ‘create export’
- Usually we call this sink export ‘ga_table_creation’ but you can name it anything
- Choose ‘Cloud Pub/Sub’ and below it choose the ‘Create new Pubsub topic’ option
- In the prompt, enter the topic name - again I usually use ‘ga_table_creation’
- To add the query filter, click on the downward arrow on the right edge of the text box that reads ‘Filter by label or text search’
- Select ‘convert to advanced filter’. Remove the text in the resulting text box and replace it with:
resource.type="bigquery_resource" protoPayload.methodName="jobservice.jobcompleted" protoPayload.serviceData.jobCompletedEvent.eventName="load_job_completed" protoPayload.authenticationInfo.principalEmail="email@example.com" protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"ga_sessions_" NOT protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"ga_sessions_intraday"
- Create a Google Cloud Storage bucket to receive the extract from BQ
- Navigate to the Storage section and create a bucket. Name it something like ‘ga360_extracts’
- Importantly, make sure it is in the same region as the BigQuery data you will be extracting
- If the BigQuery data is located in australia-southeast1, then choose a regional bucket and locate it in australia-southeast1
- Also, under the ‘Show advanced settings’ section, click on the ‘Set a retention policy’ checkbox and specify how long objects should be kept. After this time, they will be automatically deleted from the storage bucket.
- Write and deploy a Cloud Function to extract the table to Google Cloud Storage
The code for this function should look like the below. I have also included a requirements.txt below, please remember to include this as well before you deploy.
The requirements.txt file’s contents are simply:
This is a straightforward workflow to run a Cloud Function that exports a BigQuery table as soon as it’s ready. We can thank Stackdriver’s ability to export log events to a Pub/Sub topic to allow for this. It’s handy to know that we can have Cloud Functions respond to almost anything as long as that “anything” is logged in Stackdriver.