Using Google Cloud To Delete User Data From Google Analytics

By: Ahmad Hasnain
Posted in: Google Cloud Platform, Google Analytics
Article date: February 18, 2020

In this post, I’m going to walk you through some ways of deleting user data from Google Analytics, via a couple of techniques that I found out by trial and error. I hope you find them useful, and if you’d like to chip in some ideas or comments, you can reach me via ahmad@datarunsdeep.com.au. 

 

Why might we need to do this?

There are a number of reasons why you might need to do this. You might have some historical bot traffic that you would like to remove (and by “some”, that might mean at least a hundred thousand users). There can also be a scenario where some users’ PII (personally identifiable information) was saved in GA Event Labels during their sessions because of some… let’s say… “not so great” tracking setups.

Also, one of the requirements of the EU’s General Data Protection Regulation (GDPR), which came into effect in May 2018, is that businesses should be able to delete, upon request, any data that they collect about people. This includes your website’s visitor data. Oops, got you! You can read the regulation summary on Amazee Metrics.

There are lots of reasons why you might need to delete user data and, thankfully, Google Analytics now has a User ID deletion feature available.

Before we go any further, I’m going to assume that you have identified which users you need to delete, that you’ve made sure they’re the right ones to delete, and that you have a list of Client IDs in CSV or XML format. Below are some simple (and not so simple) options you can choose from, depending on your technical capability or your love for a particular technology.
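Since everything that follows depends on that list being clean, here is a minimal Python sketch of a pre-flight check on the CSV. It assumes that the Client IDs sit in the first column and that a web Client ID looks like two integers joined by a dot; both are assumptions on my part, so adjust the pattern to your own data. The function name load_client_ids is made up for this example.

```python
import csv
import io
import re

# Assumed shape of a web Client ID: two integers joined by a dot.
CLIENT_ID_PATTERN = re.compile(r"^\d+\.\d+$")


def load_client_ids(csvfile):
    """Return (valid_ids, rejected_values) from the first column of a CSV file object."""
    valid, rejected, seen = [], [], set()
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip completely empty lines
        value = row[0].strip().strip("'")  # drop stray quotes around the ID
        if not CLIENT_ID_PATTERN.match(value):
            rejected.append(value)  # header rows and malformed values end up here
        elif value not in seen:
            seen.add(value)  # keep only the first occurrence of each ID
            valid.append(value)
    return valid, rejected


# Small in-memory sample standing in for your_file.csv
sample = io.StringIO(
    "clientId\n"
    "'123456789.1584000000'\n"
    "123456789.1584000000\n"
    "987654321.1584000001\n"
    "not-an-id\n"
)
valid_ids, rejected_values = load_client_ids(sample)
print(len(valid_ids), "unique IDs;", len(rejected_values), "rejected")
```

Reviewing the rejected values by eye before deleting anything is cheap insurance, because a deletion request can't be undone.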

 

Solution: 1 – Deleting Manually

You go into the Google Analytics interface, you search for the User ID, and… you can’t find it. After trying every possible regex or search expression, you realise that you may not know the exact day that the user was on the site. So, what can we do now? Either keep searching narrow date ranges, or select an entire year and go for a coffee break. Depending on the size of the data or your network speed, that might carry into a lunch break!

After a huge number of attempts, you finally find the Client ID and delete it. But can you do the same for a hundred thousand Client IDs? Or even a thousand?

There are many blog posts online that walk through this step by step, but that is not the focus of this article.

Client_Id deletion via Google Analytics explorer

If you still need help to navigate this method, you can use this blog post from More Visibility. 

 

Solution: 2 – Automated Deletion via Google Sheets

If you’re anything like me, you’re not satisfied with the long and painful process above. Some searching around the internet might take you to another easier, if not faster, way of doing it: via a script in a Google Sheets spreadsheet.

You can use jingsblog to build your own simplified user deletion tool. It’s a great technique but, as you might expect, it can’t be used for thousands of Client IDs. In fact, the spreadsheet script times out after 30 minutes and you may see the message below:

user deletion API timeout error

 

Solution: 3 – Automated Deletion Using R

This technique uses Mark Edmondson’s googleAnalyticsR package.

Now, if you are an R lover and work with Google Analytics, you will be familiar with this name. Mark has written an amazing library which comes to our rescue every now and then, so we tried to use its ga_clientid_deletion function for this.

Below is the code snippet:

# setup
library(googleAnalyticsR)
library(readr)

# make sure you are authenticated with the user deletion scope
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/analytics.user.deletion")
ga_auth()

# read the file with the list of client ids to be deleted
delete_list = read_csv("your_file.csv", col_names = FALSE)

# pass a vector of ids, skipping the first row (a header row in my file)
ids = delete_list$X1[-1]
output = ga_clientid_deletion(ids, "UA-XXXXX-XX")

# record the output
write.csv(output, 'deleted_segment.csv')

This technique has an issue: it took an average of 30 minutes to delete 1,000 Client IDs, which is fine unless you have hundreds of thousands to delete. We also often came across the message “Auto-refreshing stale OAuth token”, which comes from the httr package: it automatically uses a valid refresh token to obtain a new access token.

However, limits apply to the number of refresh tokens that are issued per client-user combination, and per user across all clients, and these limits are different. If your application requests enough refresh tokens to go over one of the limits (as is the case here), older refresh tokens stop working.

Key points about Google tokens: 

  • The Google access token lasts for only 60 mins (as of writing this post)
  • There is currently a limit of 50 refresh tokens per user account per client. 

 

Solution: 4 – Google Cloud Platform And The User Deletion API

Here’s where my travels ultimately took me: Using the Google Analytics user deletion API.

To be frank, I found it pretty challenging to use. The documentation is written from 30,000 feet above Mt. Everest, which didn’t make the job any easier. On top of that, I couldn’t find any useful examples to help me troubleshoot the errors I came across. (If you don’t believe me, please stop reading right here and go and try deleting the users using the above document!)

Eventually, we came up with the solution below:

Prerequisites:

  1. Set up a service account and generate credentials.
  2. Enable the Google Analytics API in the GCP console.
  3. Authorise the service account to access your GA property.
  4. If you are not familiar with the above three, just follow the first three steps of the dundas blog.
  5. Set up a Python environment and install the Google Cloud libraries.
  6. Paste the code chunk below into your preferred Python IDE (yes, it’s Python, if you hadn’t guessed already), replace the necessary fields with your own data and pat yourself on the back for your marvellous achievement.

Code chunk: 

# import the necessary libraries
import csv
import datetime

import googleapiclient.discovery
import pandas as pd
from google.oauth2 import service_account

# define the scope
SCOPES = ['https://www.googleapis.com/auth/analytics.user.deletion']

# I am using a service account to call the API
SERVICE_ACCOUNT_FILE = 'credentials.json'  # path to your service account credentials

# load up the credentials
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE,
    scopes=SCOPES
)

# build the service object
analytics_client = googleapiclient.discovery.build(
    'analytics',
    'v3',
    credentials=credentials
)

# initialise
user_deletion_request_resource = analytics_client.userDeletion().userDeletionRequest()


# this is where the action happens:
def delete_users(client_id):
    return user_deletion_request_resource.upsert(
        body={
            # This marks the point in time at which Google received the deletion request
            "deletionRequestTime": str(datetime.datetime.now()),
            "kind": "analytics#userDeletionRequest",  # value is "analytics#userDeletionRequest"
            "id": {  # user ID object
                "userId": client_id,  # the user's id
                "type": "CLIENT_ID",  # type of user (APP_INSTANCE_ID, CLIENT_ID or USER_ID)
            },
            "webPropertyId": "UA-XXXXXXX-YY"  # web property ID of the form UA-XXXXX-YY
        }
    ).execute()


if __name__ == '__main__':

    start_time = datetime.datetime.now()
    print("Started execution at:", start_time)

    with open('your_file.csv', 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, dialect='excel')
        reader_list = list(reader)
    row_count = len(reader_list)
    print("number of Client_Ids:", row_count)  # mine was over 50000

    # collect the responses in a list and build the DataFrame once at the end
    responses = []
    for row in reader_list:
        client_id = row[0].replace("'", "")  # bit of data-preprocessing for my data
        response = delete_users(client_id)
        print(response)
        responses.append(response)

    # I want to save my responses in a csv file (just for record)
    df = pd.DataFrame(responses)
    df.to_csv('record_response.csv', sep='\t', encoding='utf-8')

    end_time = datetime.datetime.now()
    print("Finished operation at:", end_time)
    total_execution_time = end_time - start_time
    print("Total execution time:", total_execution_time)
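One thing to watch in a run that lasts hours: a single failed request will stop the loop, and the responses are only written out at the very end. The sketch below is one possible way to soften that with retries and exponential backoff. It is not part of the original script; call_with_retry and flaky_delete are hypothetical names, and the flaky function only simulates an API call that fails twice before succeeding.

```python
import time


def call_with_retry(func, arg, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(arg); after a failure wait base_delay * 2**attempt seconds and retry.

    Returns (result, None) on success, or (None, last_exception) once
    max_attempts calls have failed.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return func(arg), None
        except Exception as error:  # a real script would likely catch googleapiclient.errors.HttpError only
            last_error = error
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None, last_error


# Simulated deletion call (an assumption for the demo): fails twice, then succeeds.
calls = {"count": 0}


def flaky_delete(client_id):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return {"id": {"userId": client_id, "type": "CLIENT_ID"}}


delays = []  # record the waits instead of really sleeping
result, error = call_with_retry(flaky_delete, "123.456", sleep=delays.append)
print(result, error, delays)
```

In the real loop you would pass delete_users instead of flaky_delete and keep the IDs that still fail after the last attempt in a separate list, so they can be rerun later without repeating the whole file.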

Points to consider:  

  • Limit: Like any other API, this one has a quota limit (50,000 requests per day, which can be increased).
user deletion API quota limit
  • Errors: Yes, you may run into one of the following errors, but if you’re getting another sort of error you can email me (details below) and I’ll see if I can help.
  • Current speed: We could delete roughly 50k client ids in 8 hours. Is it the best solution? Probably not, but I found all the ones above were worse (though I’m keen to see if we can get it faster!)
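To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It only uses figures quoted in this post (50k deletions in 8 hours, the 50,000 requests per day quota, and the 100k batch that is still waiting) and assumes one API request per Client ID at the same sequential speed.

```python
import math

ids_deleted = 50_000   # size of the run described above
hours_taken = 8        # how long that run took
daily_quota = 50_000   # default daily request quota quoted above
next_batch = 100_000   # the batch that still needs deleting

seconds_per_request = hours_taken * 3600 / ids_deleted
requests_per_second = 1 / seconds_per_request
hours_for_next_batch = next_batch * seconds_per_request / 3600
days_needed_by_quota = math.ceil(next_batch / daily_quota)

print(f"{seconds_per_request:.3f} seconds per request")
print(f"{requests_per_second:.2f} requests per second")
print(f"{hours_for_next_batch:.1f} hours for the next batch at the same speed")
print(f"at least {days_needed_by_quota} days under the default daily quota")
```

So a 100k batch needs roughly 16 hours of sequential calls, and with the default quota it has to be spread over at least two days unless the quota is raised.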

 

Future steps:

Parallel processing or multithreading: what do you reckon? I have another batch of 100k client_ids to delete, and I would love to collaborate if you have any fresh ideas.
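As a starting point for that discussion, here is a rough, untested-against-the-API sketch of the threading idea using Python's concurrent.futures. The names build_client, delete_one and delete_in_parallel are hypothetical, and the API call itself is replaced by a stub so the example runs on its own. The one design point I'm fairly sure of is that the HTTP transport behind googleapiclient is not thread-safe, so each worker thread should build its own service object rather than share one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_state = threading.local()  # holds one client per worker thread


def build_client():
    """Hypothetical stand-in: a real version would load the credentials and
    return analytics_client.userDeletion().userDeletionRequest()."""
    return object()


def delete_one(client_id):
    # Build the client lazily, once per thread, instead of sharing it.
    if not hasattr(_thread_state, "client"):
        _thread_state.client = build_client()
    # A real version would call .upsert(body=...).execute() on
    # _thread_state.client here; this stub just echoes the ID back.
    return {"id": {"userId": client_id, "type": "CLIENT_ID"}}


def delete_in_parallel(client_ids, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the input IDs
        return list(pool.map(delete_one, client_ids))


responses = delete_in_parallel(["111.222", "333.444", "555.666"])
print(len(responses), "responses")
```

Whether this actually speeds things up depends on the API's per-second rate limits, which I haven't measured, so start with a small number of workers and watch for quota errors.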

If you need any help with the above, feel free to reach me at ahmad@datarunsdeep.com.au or use the 'get in touch' button below. 

Get in touch

To find out more, email us on hello@datarunsdeep.com.au or complete a quick contact form.

Copyright © 2009-2021 Data Runs Deep Pty Ltd. All Rights Reserved. Google Analytics is a trademark of Google Inc.