Hello, dear reader! I put together this blog post back when I worked in marketing. It's been an interesting couple of years - I'm now a developer, and my new writings can be viewed here. However, I noticed this page was still getting hits though the post no longer existed, so found a way to keep it up here since it seems to be of interest to someone out there.
I will keep it up in this mysterious locale, away from the rest of my site, enjoy!
Visualising views per page in Data Studio
Data Studio is a fantastic free data visualisation tool from Google. You can create slick reports in a short amount of time, and best of all it hooks up with ease to a number of Google’s other products, including Google Analytics. That means you can quickly create a GA report which updates dynamically to show off the results of your marketing and content efforts to internal stakeholders or clients.
What this post explores is how to use Data Studio to dynamically report on which sections of a site deliver the most views per page, rather than which sections of a site deliver the most total page views. Understanding the difference can significantly improve your understanding of which content performs best on your site.
Consider the following case as an illustration:
Case: A content site wants better insight into its content performance
The team behind a particular content website, Hillgrove Publishing, focus their efforts on two kinds of content: general news and opinion columns. They want to know where to focus their efforts in the future.
By looking at the ‘Content Drilldown’ tab in Google Analytics, they see that the ‘general news’ section delivers ten times the page views of original columns. So should they focus their efforts on general news? Well, not necessarily.
What if they produce 1000 news articles each month, but only ten opinion columns? That would mean that an individual opinion column produces significantly more traffic than an individual news piece.
What the team needs is a function which will give them the ‘views per page’ for each section. If they could track this dynamically, they would have a much better idea of which content is performing better over time.
Unfortunately, Google Analytics doesn’t offer any kind of ‘views per page’ metric out of the box. It’s rather manual task – you’ll have to look in the GA interface to count the number of pages in a section, then manually divide the total page views for a section by the number of pages in that section.
Fortunately – we can create a dynamic report in Data Studio which will provide this data for us. Doing so will involve a couple of the more complex facets of Data Studio: regular expressions (regex) and CASE Statements. One disclaimer here: this will only report on pages that have at least 1 page view during your reporting period.
1. Pull your GA data into data studio
Open up Data Studio. If it’s not hooked up to your Google Analytics account already, choose ‘Add New Data Source’ and authorise the Google Analytics connector. Once it’s authorised, add the data source to your report.
2. Identify areas of a site using a CASE statement
Google Data Studio allows users to utilise CASE statements in order to add their own custom data fields alongside GA’s native data fields like ‘Page Views’ or ‘Gender’. We’re going to use the CASE statement to organise our site content into categories based on their URLs, and then output our pages into specific categories, for example all news articles will be categorised as ‘News’.
CASE statements will be familiar to those who know the basics of SQL or similar query languages, and potentially bewildering to everyone else – so I’ll try to systematically explain the basics in order to help you create your own query.
There are several ways to identify areas of your site in a CASE statement – I’m going to start with a ‘simple’ query, then show you how a more complicated RegEx query could be constructed.
2.a. Identify areas of a site (with a simple structure) using a CASE statement
Let’s take the earlier example of Hillgrove Publishing, who wanted to separate ‘news’ and ‘opinion’ content.
To complicate matters, they have started adding certain news stories under /top-stories/
as well as /news/
. Well, a CASE statement would make it easy for us to identify areas of the site in a way that Data Studio could understand:
CASE
WHEN Page Path 1 = ‘/opinion/’ THEN "Opinion"
WHEN Page Path 1 = ‘/news/’ THEN "News"
WHEN Page Path 1 = ‘/top-stories/’ THEN "News"
ELSE “other”
END
We could add this function as a field in Data Studio by clicking the pencil (edit) icon on our Google Analytics data source. We would then create a custom field, paste this code into the ‘Formula’ text entry field, and add a sensible name to the ‘Name field’.
But what exactly does all that code mean? Let’s pick it apart line by line.
CASE
This tells Data Studio that you’re about to start a CASE statement.
WHEN [x] THEN [y]
[x]
here is a condition that must be met, and [y]
is the output if this condition is met – this output can be text (a string) or a number. In the above CASE statement, we are checking the first section of the URL following the “.com” – which is a field called ‘Page Path 1’ in Google Analytics. If this section is /opinion/
the CASE statement will output ‘Opinion’.
This CASE statement will create a new Custom Field available throughout our Data Studio report alongside all of GA’s native fields. We’re going to call it ‘Content Category’ – it’ll be a Dimension (not a Metric).
For the regular expression given, an input of the following URLs will output ‘Opinion’:
hillgrove-publishing.com/opinion/`
`hillgrove-publishing.com/opinion/my-first-column.html
Conversely, the following will output ‘News’:
hillgrove-publishing.com/news/`
`hillgrove-publishing.com/news/breaking/`
`hillgrove-publishing.com/top-stories/
Finally, the following will output ‘Other’:
hillgrove-publishing.com/2017/opinion/article/`
`hillgrove-publishing.com/about/`
`hillgrove-publishing.com/`
`hillgrove-publishing.com/breaking/news/omg.html`
`hillgrove-publishing.com/breaking-news/
This works for simple URL structures where ‘news’ or ‘opinion’ always appear in ‘Page Path 1’, but what about more complicated structures? It may be that you need to use regular expressions to define your URLs.
2.b. Identify areas of a site (with a complicated structure) using RegEx in a Case Statement
For the earlier example of Hillgrove Publishing, who wanted to separate ‘news’ and ‘opinion’ content, the finished item could also have looked like this:
CASE
WHEN REGEXP_MATCH(Page, .*/opinion/.*") THEN "Opinion"
WHEN REGEXP_MATCH(Page, .*/(news|top-stories)/.*") THEN "News"
ELSE “Other”
END
In this CASE Statement, REGEXP_MATCH
is a function which assesses whether an input matches a corresponding ‘Regular Expression’ – a method for defining strings.
In this example, the function is checking whether the ‘Page’ (i.e. URL) contains the characters ‘/opinion/’. The full stop and asterix .*
are special ‘wildcard’ characters in RegEx: a full stop .
can stand in for any character, and an asterix *
indicates that a character can be repeated any number of times (including zero times). Therefore .*
represents any possible sequence of characters.
The pipe character |
is another wildcard which functions as an ‘or’. In the above CASE statement, URLs containing either /news/ or /top-stories/ will both output as ‘News’.
For the regular expression given, an input of the following URLs will output ‘Opinion’:
hillgrove-publishing.com/opinion/`
`hillgrove-publishing.com/opinion/my-first-column.html`
`hillgrove-publishing.com/2017/opinion/article/
The following will output ‘News’:
hillgrove-publishing.com/news/`
`hillgrove-publishing.com/news/breaking/`
`hillgrove-publishing.com/breaking/news/omg.html`
`hillgrove-publishing.com/top-stories/
Finally, the following will output ‘Other’:
hillgrove-publishing.com/about/`
`hillgrove-publishing.com/`
`hillgrove-publishing.com/breaking-news/
RegEx gives us another system to categorise URLs, and can allow us to categorise content categories for sites that have a more complicated structure. RegEx has a lot more functionality than outlined here: much more detail can be found at https://github.com/google/re2/wiki/Syntax.
3. Count the number of pages in each section by modifying your CASE Statement
Counting the number of pages should now involve the fairly simple task of creating a similar CASE Statement which outputs a number instead of a string. Let’s assume from now on that Hillgrove Publishing opted for the simpler CASE Statement (though this step would work with either mentioned method).
To count the pages in a section of their site, they’d use the following CASE Statement.
SUM(CASE
WHEN Page Path 1 = ‘/opinion/’ THEN 1
WHEN Page Path 1 = ‘/news/’ THEN 1
WHEN Page Path 1 = ‘/top-stories/’ THEN 1
ELSE 0
END
This CASE statement basically identifies whether a URL meets one of three ‘when’ criteria: if it does, it outputs a 1, otherwise it outputs a 0. The SUM
function then adds all the resulting numbers together.
This function will count the number of pages that meet the criteria outlined in the CASE Statement, and will appear as a custom Metric (not dimension) alongside GA’s native metrics. We’ll call this field ‘Number of Pages’
We can show this in a table with ‘Content Category’ as our Dimension to see the number of pages in each section:
You can tally this against the number of pages in GA’s content drilldown to check it for accuracy.
4. Counting the views per page
We now have a method for categorising site content, and another method for counting the number of pages in each category. Now we can use them together in a graph.
First we create a final Custom Field. We’ll call it ‘Views per Page’. It won’t require any complex CASE syntax, only:
Page Views/Number of Pages
Finally we log this on a graph with ‘Content Category’ as a dimension. You may wish to apply a filter which excludes pages with a Content Category of ‘Other’ in order to focus on your chosen content categories rather than general navigational pages.
And that’s it! You now have a dynamic report that will continue to update with data from Google Analytics, providing the views per page for each section you’ve outlined. You can even plot this on a line graph to chart how the numbers change over time.
If you can specify or at least estimate the cost to your business of each type of content, you will now be able to work out the cost per view for each of your content types – and figure out exactly how lucrative each content type is to the business. Not bad eh?