Skip to main content
Data Runs Deep Menu

Main navigation

  • Home
  • Who We Are
    • Careers
    • What We're Up To
    • What We Believe In
    • Ethical Policy
  • What We Do
  • Google Marketing Platform
    • Analytics 360
    • Tag Manager 360
    • Salesforce and GA360
    • Google Analytics App+Web
    • Data Studio
    • Optimize
  • Google Cloud Platform
    • Google BigQuery
      • Google BigQuery Training
      • Google Analytics Integration
  • Consulting
    • Strategy
    • Implementation
      • Google Analytics Audit
      • Google Tag Manager Audit
      • Enhanced Ecommerce Implementation
      • Google Tag Manager
    • Analysis
      • Google Data Studio
        • Google Data Studio Case Study - Mona
      • UX Analysis
  • Training
    • Five Day Google Analytics Online Workshop
    • Advanced Google Analytics Training
    • Google Tag Manager Training
    • Google Analytics 360 Training
    • Advanced Analytics Bootcamp
  • Partnerships
    • Conductrics
  • Blog
  • The Flying Beagle
  • Merch
  • Contact

Cool things you can do using window functions in BigQuery

Calculating sub-totals using BigQuery
By: Corinne Brooker
Posted in
  • Google BigQuery
Article Date
September 11, 2020

Lately, I’ve found myself using a number of window functions in BigQuery. They are really handy functions but can be difficult to wrap your head around. In this post, I will take you through the basic principles of window functions and then in subsequent posts, I will share some examples of how you can utilise them with Google Analytics data. 

SQL window functions enable you to query either a subset or full set of rows within your data set and return a value on each row of the results. You can use them to calculate running totals, rolling averages or simply to remove duplicate rows from your data set. 

 

There are many types of functions that can be used with a window and they fall into three main categories:

  • Numbering: RANK, DENSE_RANK, ROW_NUMBER
  • Aggregating: SUM, AVG, COUNT, MIN, MAX
  • Navigational: LEAD, LAG, FIRST_VALUE, LAST_VALUE

 

A window function follows this general format though you won’t necessarily use each clause in every statement:  

Function_name (Expression) OVER (PARTITION BY Expression_list ORDER BY Order_list ASC Frame_clause) AS Column_name

1. Function_name: This is the analytics function of your choice e.g. SUM, RANK, LEAD

2. Expression: This is the column you are querying or can be a CASE WHEN logical statement. It is left blank for some numbering functions like RANK and ROW_NUMBER

3. OVER: This determines the window or the set of rows the function will operate within. If you leave it blank between the parentheses, it will query all rows in the data set. 

4. PARTITION BY: This clause is essentially like a group by. If you partition by fullvisitorid, the function will query all the rows for each user at a time. If you partition by fullvisitorid and visitstarttime, the function will query all the rows within each session for each user. 

5. ORDER BY: Sometimes you will need to order the rows in your partition. This is important for navigational and numbering functions so the query knows where to start and finish. You can order by multiple columns and order in ascending (ASC) or descending (DESC) order. 

6. Frame_clause: This section lets you choose a subset of rows within your partition. If you wanted to query a running total or a rolling average you specify here. 

Still with me? Here’s some simple examples to illustrate it in action. 

 

Example 1 - Find the total sales across the full catalogue 

If you want to query all rows in the data set, all you need to do is omit the PARTITION BY/ORDER BY and leave between the parentheses of the OVER clause as blank (). The example below is summing the sales for all products in the data set. 

SUM (Sales) OVER () AS TotalSales

 

Example 2 - Find the total sales within each category

A partition is like a GROUP BY function except this time you get the aggregated sum on each row of the data set. The statement below sums the sales for each category. 

SUM (Sales) OVER (PARTITION BY Category) AS CategorySales

 

Example 3 - Find the total sales within each subcategory 

You can partition by multiple columns with simply a comma between them and it will sum all the values within each partitioned column combination, in this case each category and subcategory combo.  

SUM (Sales) OVER (PARTITION BY Category, Subcategory) AS SubcategorySales

Now that you have the total / subtotals on each row, you can do calculations across the columns. For example, what percent of total sales came from each product. 

 

Example 4 - Find the top selling products in each category

Another window function you could use is RANK. This requires an ORDER BY clause. The example below will rank each product within each category from highest sales to lowest (ORDER BY Sales DESC). 

RANK () OVER (PARTITION BY Category ORDER BY Sales DESC)

Rank can be useful for limiting your results to the top X products. 

 

Example 5 - Find the running total

A frame clause follows this format: ROWS BETWEEN X AND Y (where X is the starting point and Y is the end point of the window). 

You can either designate a set number of rows (e.g. 5 PRECEDING) or you can use UNBOUNDED means unlimited number of rows. If you wanted to query the full set of rows you would say UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING which is simply all rows before and after the current row. 

Below is an example of a running total. In this example, I have used ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, because I want to add all the sales before and up to the current row. On the 3rd August, the running total of $25,400 is the result of adding $9,000 + $15,000 (UNBOUNDED PRECEDING) and $1,400 (CURRENT ROW). 

The frame clause gives you flexibility depending on your use case. You could add the sales from the current row and the next row using ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING. Alternatively, if you want to add 5 rows before and after the current row, then your statement will look like this ROWS BETWEEN 5 PRECEDING and 5 FOLLOWING.

SUM (Sales) OVER (ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal

 

Example 6 - Rolling Average 

Below is an example using a frame clause to calculate a 3 day rolling average of sales. The function changes to an AVG (instead of SUM) and the frame clause looks at ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. 

Note: the first two rows are not really a 3 day average because there are not 2 preceding rows in the data set. Another thing to be mindful of is the number of rows in your partitions because this could have unexpected results if they are uneven. For example, if you had a date with no sales, there could be a missing row in the data set. When the statement looks back 2 rows, it will actually be looking back 3 dates resulting in a 4 day rolling average. 

AVG (Sales) OVER (ORDER BY Date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Rolling3DayAverage

 

So these are the basic principles of window functions. You can find the full list of the compatible functions that can have an OVER clause in BigQuery on these pages (Numbering, Aggregating, Navigational). 

Stay tuned for some specific examples using Google Analytics data in BigQuery that you can use to up your BigQuery game. 


Related Posts

  • Identify Potential Session Breakage Using BigQuery

    Article Date
    May 18, 2020
  • How to re-attribute your conversions to the correct traffic source in BigQuery

    Article Date
    December 04, 2020
  • Use our BigQuery Statement Builder To Better Understand The Behaviours Of Your Customers

    Article Date
    February 22, 2020
  • How To Calculate Table Level Percentages In BigQuery

    Article Date
    June 26, 2019
  • Remove Duplicate Transactions From Your Reporting in BigQuery

    Article Date
    November 20, 2020

Get in touch

To find out more, email us on hello@datarunsdeep.com.au or complete a quick contact form.

Get in touch

Talk to us: 1300 737 885

08:30 - 16:30, Monday to Friday. We love to chat

Visit us, we’re at:

Level 8, 313 Little Collins Street Melbourne, VIC 3000

Footer

  • Home
    • Who We Are
      • What We Do
      • What We're Up To
      • Careers
      • Contact
  • Training
    • Google Analytics Training Courses
    • Advanced Google Analytics Training
    • Google Tag Manager Training
  • Consulting
    • Analysis
    • Implementation
    • Strategy
    • Google Analytics Audit
    • Google Tag Manager Audit
  • Google Cloud Partner
  • Google Marketing Platform Sales Partner
  • Bitcoin accepted here
  • Registered Looker Consulting Partner Melbourne
  • Twitter
  • LinkedIn
  • Google Plus
  • Blog RSS feed
Copyright © 2009-2021 Data Runs Deep Pty Ltd. All Rights Reserved. Google Analytics is a trademark of Google Inc.
Website by Marin Sekesan