Google Analytics is an incredible resource that provides tons of information about a website’s performance across many different metrics like the number of visitors to your website.

For business owners (and the data nerds among us), it’s important to know that the traffic going to your site is quality traffic, real people and not bots, because real people are who you’re trying to connect with. To ensure that you’re getting the most accurate information from your website data, read on to know how you can exclude fake traffic from your Google Analytics data.

Should I Really Care?

Yes! As a business owner, you want clean data. You want to make sure the data you’re seeing and using for your business is quality data and not enhanced or manipulated in any way because your website data is what helps you make good decisions for your marketing strategies.

You want to make adjustments to your site or service based on what real people are doing on your site, not bots. Making decisions on bad data can quickly lead to bad business decisions.

Not all Bots are Bad

What is a bot? A bot is a web robot – it’s a software application programmed to run automated tasks or scripts across the Internet. The tasks are simple and repetitive, and bots are able to perform tasks at a much faster and higher rate than a human. Bots are primarily used in web crawls, where an automated script accesses, reads and processes information from web servers across the Internet.

Bots have been a constant on the Internet since the very beginning, but only in recent times have they become synonymous with hackers, bad events, and fake website visits. As of 2015, bots made up more than 50% of internet traffic. While that number can be alarming, it’s important to remember that not all bots are bad.

For example, Facebook uses bots constantly – when you share an article, a bot grabs the image, headline and first paragraph of the article to display on your news feed. Google uses bots to crawl and catalog websites in order to deliver results when someone searches for something in Google. So not all bots are bad, especially when they’re being used to help us.

But bots are also being used to steal personal information and cause all-out mayhem on the Internet. From web scraping, data mining and online fraud to account hijacking, data theft and digital ad fraud, bots are increasingly behind negative activity on the Internet, including showing up as traffic in your site analytics.

What is Bot or “Fake” Traffic?

Bot or fake traffic occurs when a bot visits a website. Google Analytics can register the visit as a number of different things: a referral, a pageview or an event, and it can show up in a number of different areas of Google Analytics. It looks like legitimate traffic until you examine it closely. It can artificially inflate a site’s analytics, which gives owners and marketers false information and skews website data.

How can I tell if my site’s been impacted?

A quick check of some site metrics can clue you in to whether or not your site’s been impacted by negative bot activity. If you notice some of these things in your Google Analytics it could indicate possible bad bots crawling your site:

  • Bounce Rate: An extremely high bounce rate (70-100%)
  • Pages/Session: Sessions with 0 pages per session (or less than 1)
  • Average Session Duration: Session durations that last a matter of seconds, or even less than a second
  • New Sessions: An extremely high number of new sessions along with a high bounce rate and zero pages per session
  • Geolocation: Tons of traffic from random countries/cities around the world when you don’t target customers or users in those areas
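The checks above can be sketched as a small script run over exported report rows. This is a rough heuristic, not a Google Analytics feature: the field names, the 95% new-session cutoff and the 1-second duration cutoff are assumptions chosen to mirror the list, and the geolocation check is left out because it depends on where your own customers are.

```python
from dataclasses import dataclass


@dataclass
class TrafficRow:
    """One row of an exported report (field names are hypothetical)."""
    source: str
    bounce_rate: float          # percent, 0-100
    pages_per_session: float
    avg_session_seconds: float
    new_session_rate: float     # percent, 0-100


def suspicious_signals(row: TrafficRow) -> list[str]:
    """Return the warning signs from the checklist that this row triggers."""
    signals = []
    if row.bounce_rate >= 70:
        signals.append("high bounce rate")
    if row.pages_per_session < 1:
        signals.append("under 1 page per session")
    if row.avg_session_seconds < 1:
        signals.append("sub-second sessions")
    # The combined pattern: nearly all new sessions that bounce with no pages.
    if (row.new_session_rate >= 95 and row.bounce_rate >= 70
            and row.pages_per_session < 1):
        signals.append("new sessions that bounce with no pages")
    return signals


# Made-up sample rows for illustration only.
rows = [
    TrafficRow("google / organic", 48.0, 2.7, 95.0, 61.0),
    TrafficRow("spammy-referrer.example", 100.0, 0.0, 0.0, 100.0),
]

for row in rows:
    signals = suspicious_signals(row)
    # Require at least two signals before flagging a source for review.
    if len(signals) >= 2:
        print(f"{row.source}: review ({', '.join(signals)})")
```

Requiring two or more signals is a deliberate choice here: a single signal, such as a high bounce rate on its own, can be normal for some pages.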


What Can I Do?

Analytics has some handy filters you can use to ensure you block bots and other spam traffic from your analytics data.

The first filter is in the “View Settings” for each view, found in the Admin section of your Google Analytics account. Check the box labeled “Exclude all hits from known bots and spiders.” This tells Google to leave out data from the bots and spiders it already recognizes.

But since there are ways for bad bots and spiders to circumvent traditional detection, there’s another step you can and should employ when applying filters to your Google Analytics account.

Filter hostname

While bots and spiders can contribute to fake traffic, your site can also be impacted by spam traffic (or ghost spam), which is when a malicious script uses Google’s Measurement Protocol to send raw user interaction data straight to Google Analytics. What that means is that ghost spam can act like a real user, registering sessions, page views and session duration without even touching your website.
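To see why ghost spam never needs to touch your site, it helps to look at what a Measurement Protocol hit is: a short list of parameters. The sketch below only builds and prints such a payload and sends nothing. The parameter names (v, tid, cid, t, dh, dp) come from the Universal Analytics Measurement Protocol; the tracking ID, client ID, hostname and path are placeholders.

```python
from urllib.parse import urlencode

# Placeholder values throughout; nothing here is sent anywhere.
hit = {
    "v": "1",                        # protocol version
    "tid": "UA-XXXXX-Y",             # tracking ID of the targeted property
    "cid": "555",                    # anonymous client ID
    "t": "pageview",                 # hit type
    "dh": "not-your-site.example",   # document hostname, set by the sender
    "dp": "/fake-page",              # document path
}

payload = urlencode(hit)
print(payload)
```

The point to notice is the dh parameter: the sender picks the hostname, and a spammer who only guessed your tracking ID typically does not know your real one.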

Because ghost spam never loads your pages, its hits usually carry a made-up or missing hostname. That is why filtering by hostname is an effective way to combat this nuisance.

You can view hostnames in Google Analytics under Audience > Technology > Network. Be sure to select “Hostname” as the Primary Dimension. The report lists every hostname on which hits were recorded for your property, valid or not.

Valid hostnames for your site are those that include your domain name, any subdomains you may have, any redirected domains, and any other sites that use your Google Analytics code. They tend to look like this:

  1. www.domain.com
  2. domain.com
  3. domain.googleweblight.com
  4. www.domain.googleweblight.com
  5. Subdomains
  6. Any other domains that use your Google Analytics code

These are all valid hostnames. Once you’re able to pinpoint the valid hostnames for your site, you can then set up the filter.
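If you have more than a couple of valid hostnames, writing the pattern by hand invites typos. A minimal sketch, assuming a made-up hostname list for a site at domain.com (shop.domain.com is a hypothetical subdomain), shows one way to generate it:

```python
import re

# Hypothetical list of valid hostnames; replace with your own.
valid_hostnames = [
    "domain.com",
    "www.domain.com",
    "domain.googleweblight.com",
    "www.domain.googleweblight.com",
    "shop.domain.com",
]

# re.escape puts a backslash before each dot so it matches a literal "."
# instead of "any character"; "|" separates the alternatives.
pattern = "|".join(re.escape(host) for host in valid_hostnames)
print(pattern)
```

The printed string is the kind of pattern you would paste into the filter.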

The filter is made up of basic regex (or regular expression) and is applied at the view level in the Google Analytics account in the admin section.

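In the view’s “Filters” section, add a new filter and name it “Include Valid Hostnames.” Set the filter type to Custom, choose Include, and pick Hostname as the filter field. You’re telling Google you only want to capture data from hostnames that you know are not malicious or spammy.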

When you create the regex string, it should look something like this:

intuitivedigital\.com|www\.intuitivedigital\.com|intuitivedigital\.com\.googleweblight\.com|www\.intuitivedigital\.com\.googleweblight\.com
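Before saving a filter, it can be worth checking a pattern like this against hostnames you expect to keep and to drop. The sketch below is only an approximation of how the filter behaves: it assumes the pattern may match anywhere in the hostname and that case is ignored, and the sample hostnames are made up.

```python
import re

# An include pattern like the one above; the final "www" alternative
# is an assumption about what was intended.
PATTERN = (
    r"intuitivedigital\.com|www\.intuitivedigital\.com|"
    r"intuitivedigital\.com\.googleweblight\.com|"
    r"www\.intuitivedigital\.com\.googleweblight\.com"
)


def is_valid_hostname(hostname: str) -> bool:
    """Return True if the hostname would pass the include pattern."""
    # re.search matches anywhere in the string; IGNORECASE mimics a
    # case-insensitive filter (both are assumptions in this sketch).
    return re.search(PATTERN, hostname, re.IGNORECASE) is not None


samples = [
    "www.intuitivedigital.com",
    "intuitivedigital.com.googleweblight.com",
    "free-traffic.example",
    "(not set)",
]

for host in samples:
    verdict = "keep" if is_valid_hostname(host) else "drop"
    print(f"{host:45} {verdict}")
```

Because the match is not anchored, the first alternative alone already covers the other three, so a shorter pattern may be enough in practice.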

Applying filters can be tricky, especially if you don’t know what you’re doing. It’s important to remember to create an unfiltered view prior to creating a filtered view so that you always have raw, unadulterated data to go back to should the filter not be applied correctly.

Once your regex is complete and you’ve added it to Google Analytics, most bot and spam traffic should stop appearing in that view’s data from then on, which makes your site’s statistics much more accurate and reliable. As a business owner or marketer, having good, clean data is a surefire way of knowing what’s happening on your site.

Intuitive Digital Can Help You Navigate Your Google Analytics Data

Identifying bot traffic in your Google Analytics account can be difficult if analytics data seems like a foreign language to you. Here at Intuitive Digital, we love analytics and would love to help you understand what’s happening in your Google Analytics account. Contact us if you’d like to know more!