Cloud Confusing

Explaining hosting, AWS, Wordpress, static sites, and all manner of cloud solutions.

Google’s BigQuery is a useful tool for quickly exploring a lot of data. One of the nice things about it being part of the large Google Cloud Platform (GCP) stack is that it has good integration into other Google services. This is most prominently seen in Google Analytics’ one-click BigQuery integration using Google’s Product Linking (only for Premium/360 members), but other products can easily transfer data to BigQuery as well. Today we’ll be looking at YouTube.

If you want to get your YouTube data into BigQuery (BQ) then you are going to want to do a YouTube Channel transfer. There is another YT-to-BQ transfer called “YouTube Content Owner” but if you run a single YouTube channel and want to dig into its data, then go with the “Channel Transfer” method.

Using this you can get the last 180 days of your YouTube channel’s data and put it directly into BigQuery. If you want data before then, you’ll have to use the YouTube API’s historical data endpoint.

First Steps To Use BigQuery

Let’s assume you have a Google account, GCP is active with billing setup, you’ve enabled BigQuery API, and done all the other absolute basics. Let’s also assume that you have all the permissions handled. The IAM can be a headache with larger GCP setups, but basically you’ll need the Project Owner role to easily do all the following steps.

If project owner is too much to ask then you will be able to get by with bigquery.transfers.update and bigquery.datasets.update permissions or the bigquery.admin predefined role.

Getting YouTube Data Into BigQuery

The most important step in the process is called the Data Transfer. This is the service by which Google copies data into BQ. This can be done with the web UI or console, but we’ll focus on the UI side as it’s quite a short process.

Go to your BigQuery UI (we’ll be using classic in this example) and create a new dataset. Then to the Transfers page and click “Add Transfer.” For Source pick “YouTube Channel.” You’ll see the following fields:

  • Display name – This is simply the name of the transfer
  • Schedule – What time on each day the transfer will take place (daily is the only option). Note that time is UTC, not local
  • Destination dataset – Which BQ dataset the transfer will go into
  • Channel page ID – This is the confusing part! Google says to go to your Google+ page and look for the ID in the URL. That would be the x1234 from https://plus.google.com/b/x1234/. With Google+ being dead in the water this is more of a challenge. If you can’t get to your Google Plus page, then to your YT channel switcher page and how over the channel you want. You’ll see the page ID in the destination URL. If the desired channel is your default channel then this method won’t work and you’ll need to find your channel ID through Google Plus.
  • Table suffix – This is how you tell difference transfer jobs with the same dataset apart. Remember, it’s a suffix, not a prefix!
  • Configure jobs – This is a required permission for YT data transfers.
  • Advanced – Here you can work with a Pub/Sub topic, setup notifications, or disable (without deleting) the transfer.

So the only thing that really matters above is the channel page ID. This can be a real hassle to find (at least it was for me), but eventually you’ll be able to track it down.

Now here is the next issue: If this is your first time connect YouTube to BigQuery, the process will fail for 2-3 days. No matter what you do, it will fail and then, around 48-72 hours later, it will start working. I emailed Google about this and read the docs — it’s a known issue, so take it or leave it.

With that done and the waiting period over, you’ll get the following in your BigQuery dataset:

  • 11 Channel Views
  • 17 Channel Tables
  • 6 Playlist Views

In the BQ world tables and views look similar, but they aren’t quite the same thing. A view is a virtual table, sort of like a shortcut. You can query it in the same way you would a table, but the underlying data is limited to your view.

Using BigQuery With YouTube Data

If you are interesting in general YT analytics, you’ll going to want to focus on the channel_basic_a2 view. This will get you the main metrics from YouTube, like views, comments, like, shares, watch time, average view duration, and so on.

Before you get going you’ll have to understand BQ’s handling of partitioned tables. This will be done through the _PARTITIONTIME pseudo-column, which let’s you deal with dates when you are not using a pre-defined view. Views can just use the date dimension.

Here is a very simple query for video views:

SELECT video_id, views, date
FROM
`Project.Dataset.view`
ORDER BY views DESC
LIMIT
10;

Note the date column: you’ll see multiple dates returned for a given video_id. Not super helpful is it? You can add a WHERE to that query:

WHERE
date >= "20181118"

which will help a bit. Working with a table instead of a view, and you can do something like this:

SELECT red_views, video_id
FROM
`Project.Dateset.table`
WHERE
_PARTITIONTIME >= "2018-11-10 00:00:00"
AND _PARTITIONTIME < "2018-11-20 00:00:00"
ORDER BY red_views DESC
LIMIT
10;

which will use the partition time instead of the date.

With an understanding of the date and partition time functions, your data in BigQuery, and a general understanding of YouTube analytics you should be all set with querying your data and starting to do any advanced YouTube reporting you need.

BigQuery Costs

Any time you use BigQuery you should be very conscious of the costs associated with exploring your data. The good news if that unless your channel is absolutely huge, the dataset is going to be limited and costs should be very controllable.

Normally with BQ it’s the analysis (data progressing) that will be expensive, but in the case of working with my channel it was actually the data transfer that used up the most money. Google charges $5/month/channel for YouTube data transfer into BigQuery, which is fairly reasonable. Querying can get very expensive if you are working with a lot of data, but at $5/TB processed and most of my YouTube queries working with under 2MB of data I saw no processing fees incurred.

November 21st, 2018

Posted In: Google Cloud Platform

Tags: ,