Google’s BigQuery is a useful tool for quickly exploring a lot of data. One of the nice things about it being part of the large Google Cloud Platform (GCP) stack is that it has good integration into other Google services. This is most prominently seen in Google Analytics’ one-click BigQuery integration using Google’s Product Linking (only for Premium/360 members), but other products can easily transfer data to BigQuery as well. Today we’ll be looking at YouTube.
If you want to get your YouTube data into BigQuery (BQ) then you are going to want to do a YouTube Channel transfer. There is another YT-to-BQ transfer called “YouTube Content Owner” but if you run a single YouTube channel and want to dig into its data, then go with the “Channel Transfer” method.
Using this you can get the last 180 days of your YouTube channel’s data and put it directly into BigQuery. If you want data before then, you’ll have to use the YouTube API’s historical data endpoint.
Let’s assume you have a Google account, GCP is active with billing set up, you’ve enabled the BigQuery API, and done all the other absolute basics. Let’s also assume that you have all the permissions handled. IAM can be a headache in larger GCP setups, but basically you’ll need the Project Owner role to do all the following steps easily.
If Project Owner is too much to ask, then you will be able to get by with the bigquery.datasets.update permission or the bigquery.admin predefined role.
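If someone needs to grant you that role, a gcloud command along these lines does it (the project ID and email here are placeholders, and you need rights to modify the project's IAM policy):

```
gcloud projects add-iam-policy-binding my-project-id \
  --member="user:analyst@example.com" \
  --role="roles/bigquery.admin"
```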
The most important step in the process is called the Data Transfer. This is the service by which Google copies data into BQ. It can be set up with the web UI or the bq command-line tool, but we’ll focus on the UI side as it’s quite a short process.
Go to your BigQuery UI (we’ll be using classic in this example) and create a new dataset. Then go to the Transfers page and click “Add Transfer.” For Source, pick “YouTube Channel.” You’ll be asked for a few fields, the key one being the channel page ID, which comes from a URL like https://plus.google.com/b/x1234/. With Google+ being dead in the water, this is more of a challenge. If you can’t get to your Google Plus page, then go to your YouTube channel switcher page and hover over the channel you want. You’ll see the page ID in the destination URL. If the desired channel is your default channel then this method won’t work and you’ll need to find your channel ID through Google Plus.
So the only thing that really matters above is the channel page ID. This can be a real hassle to find (at least it was for me), but eventually you’ll be able to track it down.
Now here is the next issue: if this is your first time connecting YouTube to BigQuery, the process will fail for 2-3 days. No matter what you do, it will fail, and then, around 48-72 hours later, it will start working. I emailed Google about this and read the docs; it’s a known issue, so take it or leave it.
With that done and the waiting period over, you’ll get a set of tables and matching views in your BigQuery dataset.
In the BQ world, tables and views look similar, but they aren’t quite the same thing. A view is a virtual table, sort of like a shortcut: you can query it the same way you would a table, but the data you see is limited to whatever the view’s defining query selects.
If you are interested in general YT analytics, you’re going to want to focus on the channel_basic_a2 view. This will get you the main metrics from YouTube, like views, comments, likes, shares, watch time, average view duration, and so on.
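As a quick sanity check once the transfer has run, you could pull a few rows of those metrics. This is a sketch: your_dataset is a placeholder for the dataset you created, and the column names are from my reading of the transfer's schema, so verify them against the view in the BQ UI:

```sql
-- your_dataset is a placeholder; confirm column names in the BQ UI
SELECT video_id, date, views, likes, comments, shares, watch_time_minutes
FROM your_dataset.channel_basic_a2
LIMIT 10
```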
Before you get going, you’ll have to understand BQ’s handling of partitioned tables. This is done through the _PARTITIONTIME pseudo-column, which lets you filter by date when you are not using a pre-defined view. Views can just use the date column.
Here is a very simple query for video views (your_dataset is a placeholder for your dataset's name):
SELECT video_id, views, date
FROM your_dataset.channel_basic_a2
ORDER BY views DESC
Note the date column: you’ll see multiple dates returned for a given video_id. Not super helpful, is it? You can add a WHERE clause to that query:
WHERE date >= "20181118"
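The query above compares date against a YYYYMMDD string, so if you’re assembling these queries from a script, you can build that cutoff programmatically. A minimal Python sketch (your_dataset is again a placeholder):

```python
from datetime import date, timedelta

# Build a YYYYMMDD cutoff string 30 days back, matching the format above.
cutoff = (date.today() - timedelta(days=30)).strftime("%Y%m%d")

# Interpolate it into the views query (your_dataset is a placeholder).
query = (
    "SELECT video_id, views, date "
    "FROM your_dataset.channel_basic_a2 "
    f'WHERE date >= "{cutoff}" '
    "ORDER BY views DESC"
)
print(query)
```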
which will help a bit. If you’re working with a table instead of a view, you can do something like this (your_dataset.your_table stands in for the actual table backing the view; check your dataset for its exact name):
SELECT red_views, video_id
FROM your_dataset.your_table
WHERE _PARTITIONTIME >= TIMESTAMP("2018-11-10 00:00:00")
AND _PARTITIONTIME < TIMESTAMP("2018-11-20 00:00:00")
ORDER BY red_views DESC
which will use the partition time instead of the date.
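And if what you actually want is one row per video over a date range, rather than per-day rows, you can aggregate. Another sketch, with your_dataset as a placeholder:

```sql
-- Total views per video since Nov 1, 2018 (your_dataset is a placeholder)
SELECT video_id, SUM(views) AS total_views
FROM your_dataset.channel_basic_a2
WHERE date >= "20181101"
GROUP BY video_id
ORDER BY total_views DESC
```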
With an understanding of the date column and _PARTITIONTIME, your data in BigQuery, and a general understanding of YouTube analytics, you should be all set to query your data and start doing any advanced YouTube reporting you need.
Any time you use BigQuery you should be very conscious of the costs associated with exploring your data. The good news is that unless your channel is absolutely huge, the dataset is going to be small and costs should be very controllable.
Normally with BQ it’s the analysis (data processing) that will be expensive, but in the case of working with my channel it was actually the data transfer that cost the most. Google charges $5/month per channel for YouTube data transfer into BigQuery, which is fairly reasonable. Querying can get very expensive if you are working with a lot of data, but at $5/TB processed, and with most of my YouTube queries working with under 2MB of data, I saw no processing fees incurred.
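One way to keep an eye on those processing costs is the bq command-line tool’s dry-run mode, which validates a query and reports how many bytes it would process without actually running it (this assumes the Cloud SDK is installed and authenticated; your_dataset is a placeholder):

```
bq query --use_legacy_sql=false --dry_run \
  'SELECT video_id, views FROM your_dataset.channel_basic_a2'
```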
Sal November 21st, 2018
Posted In: Google Cloud Platform