User behavior data discrepancies between OWOX BI and Google Analytics: possible reasons and how to fix them

OWOX BI’s Google Analytics → Google BigQuery pipeline uploads user behavior data in parallel with Google Analytics tracking. OWOX BI uses its own algorithms, which let you collect all the data about user actions on your website, with no limitations or exceptions.

As a result, your BigQuery storage receives data that Google Analytics can miss or sample due to its limitations. Discrepancies of this kind between OWOX BI and Google Analytics are acceptable: they only show that BigQuery holds more complete data than GA.

However, discrepancies may also occur due to incorrect settings in your data sources. You need to find and eliminate the causes of such discrepancies to make sure your data in BigQuery is complete, accurate, and relevant.

In this article, we’ve assembled a checklist of all the known reasons for data discrepancy issues. Following the checklist, you’ll be able to check and fix the majority of the unexpected discrepancies by yourself.

How to locate data discrepancies

OWOX BI automatically tracks the discrepancies between the data you collect via the Google Analytics → Google BigQuery pipeline and the data in your Google Analytics property.

You can find these statistics, including the date of the last data updates in both services, in the Sessions tab on your pipeline’s page in OWOX BI.

You can also track the discrepancies manually by comparing Google Analytics reports with reports built on your BigQuery data. The benefit of the manual method is that you get the most up-to-date information, since the statistics in the OWOX BI interface may take longer to refresh.

Here are some example queries in standard SQL that you can run in Google BigQuery to compare BigQuery data with Google Analytics data. Copy a query and, in `Project.Dataset.owoxbi_sessions_20190821`, replace "Project" with the name of your Google BigQuery project, "Dataset" with the name of a dataset in the project, and "owoxbi_sessions_20190821" with the name of the table for the day you need, where the date suffix has the format YYYYMMDD (year, month, day).

Query for the number of pageviews for a certain day:

SELECT
  COUNT(DISTINCT hits.hitId) AS pageviews
FROM
  `Project.Dataset.owoxbi_sessions_20190821`,
  UNNEST(hits) AS hits
WHERE
  hits.type = 'pageview'

In Google Analytics, see this data in the Audience > Overview report. Make sure you've selected the right day for the report.

Query for the number of events for a certain day:

SELECT
  COUNT(DISTINCT hits.hitId) AS events
FROM
  `Project.Dataset.owoxbi_sessions_20190821`,
  UNNEST(hits) AS hits
WHERE
  hits.type = 'event'

In Google Analytics, see this data in the Behavior > Events > Overview report. Make sure you've selected the right day for the report.

Query for the number of transactions for a certain day:

SELECT
  COUNT(hits.transaction.transactionId) AS transactions
FROM
  `Project.Dataset.owoxbi_sessions_20190821`,
  UNNEST(hits) AS hits
WHERE
  hits.eCommerceAction.action_type = 'purchase'

In Google Analytics, see this data in the Conversions > Ecommerce > Overview report. Make sure you've selected the right day for the report.
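
If you run these comparisons regularly, the daily table name can be generated instead of edited by hand. The Python sketch below only builds the table reference and the pageviews query text for a given date; the helper names are hypothetical, and it assumes the table suffix is the date written as YYYYMMDD, as in the examples above.

```python
from datetime import date


def sessions_table(project: str, dataset: str, day: date) -> str:
    """Return a backtick-quoted reference to the session table for one day."""
    # Assumption: daily tables are named owoxbi_sessions_YYYYMMDD.
    return f"`{project}.{dataset}.owoxbi_sessions_{day:%Y%m%d}`"


def pageviews_query(project: str, dataset: str, day: date) -> str:
    """Build the text of the pageviews query for the given day."""
    return (
        "SELECT COUNT(DISTINCT hits.hitId) AS pageviews\n"
        f"FROM {sessions_table(project, dataset, day)}, UNNEST(hits) AS hits\n"
        "WHERE hits.type = 'pageview'"
    )


print(pageviews_query("Project", "Dataset", date(2019, 8, 21)))
```

The printed text can be pasted into the BigQuery console or passed to whatever client you already use to run queries.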

When you should care about the discrepancies

The acceptable range of discrepancies between OWOX BI and Google Analytics data is:

  • within 2% for hit data
  • within 3.5% for session data

Discrepancies above these limits are still acceptable in these cases:

  • If the amount of data processed by OWOX BI is larger than in Google Analytics. OWOX BI doesn’t have the same limitations as GA, and the maximum size of a hit processed by OWOX BI is 16 KB versus 8 KB in GA, so it’s normal for OWOX BI to collect more data.
  • If OWOX BI collects more sessions than GA. Session data is built from hit data, so GA can miss some events that started new sessions.
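
The rule above can be written down as a small check. This is a sketch under one stated assumption: the article does not say how the percentage is calculated, so here it is taken as the difference relative to the Google Analytics value. The function names are hypothetical.

```python
HIT_LIMIT_PCT = 2.0      # acceptable range for hit data
SESSION_LIMIT_PCT = 3.5  # acceptable range for session data


def discrepancy_pct(owox: int, ga: int) -> float:
    """Signed difference in percent, relative to GA (assumed definition)."""
    if ga == 0:
        raise ValueError("GA count is zero; the relative difference is undefined")
    return (owox - ga) / ga * 100


def needs_investigation(owox: int, ga: int, limit_pct: float) -> bool:
    """True if the discrepancy is outside the limit and OWOX BI has less data."""
    diff = discrepancy_pct(owox, ga)
    if abs(diff) <= limit_pct:
        return False
    # Above the limit: still acceptable when OWOX BI collected more than GA.
    return diff < 0


# Example: OWOX BI has 9,700 pageviews, GA has 10,000 for the same day.
print(needs_investigation(9_700, 10_000, HIT_LIMIT_PCT))
```

Feed it the counts returned by the queries above and the matching figures from the Google Analytics reports.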

In any other case, you should locate the cause of the discrepancies and fix it if possible. With the checklist below you can quickly do it yourself, but keep in mind that you can always contact our support team via online chat or by writing to bi@owox.com, and we’ll happily help you sort these discrepancies out :)

Possible reasons for hit data discrepancies

Google Analytics and Google Tag Manager settings

  • Make sure the Google Analytics property you selected when creating the pipeline doesn’t have any filters. Filtered-out data is still collected in Google BigQuery, hence the discrepancies. See Google Analytics Help for how to manage property filtering.
  • If you changed the time zone setting in the GA property you selected when creating the pipeline, the data collected by Google Analytics will be corrected retrospectively. The data in BigQuery will not: it is uploaded to separate tables for each day and can’t move between these tables.
  • If you implemented OWOX BI data collection via customTask in Google Tag Manager, make sure the customTask variable is added to every Universal Analytics tag you use to send data to BigQuery. Pay attention: this is the most common cause of discrepancies.
  • Make sure your website doesn’t have more than one tracking method implemented simultaneously. For example, you implemented tracking via a GTM container with a customTask but didn’t remove gtag.js from the pages you track.
    To check if this is the case, open your website in the Google Chrome web browser and open the developer console (right-click on a page > Inspect element). In the console, press Ctrl+F (Cmd+F on Mac) to call the search field, then look for your tracking codes using these keywords: GTM / analytics.js / gtag.js. Make sure you don’t have more than one chunk of code matching any of these keywords.
  • Make sure the Google Tag Manager container you use to send data to BigQuery has only one customTask variable whose function is to send hits to the OWOX BI access point. Any additional customTask would override the previously added one, breaking the data transfer to Google BigQuery.
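
The manual search for tracking codes can be roughly automated. The sketch below counts references to the standard loader scripts in a page's HTML source. It is only an approximation of the check described above: the URL patterns are assumptions based on the standard Google snippets, and scripts injected at runtime (for example, by a GTM tag) do not appear in the raw source.

```python
import re

# Assumed URL fragments of the standard loader scripts.
TRACKER_PATTERNS = {
    "GTM": r"googletagmanager\.com/gtm\.js",
    "analytics.js": r"google-analytics\.com/analytics\.js",
    "gtag.js": r"googletagmanager\.com/gtag/js",
}


def count_trackers(html: str) -> dict:
    """Count how many times each loader script is referenced in the HTML."""
    return {
        name: len(re.findall(pattern, html))
        for name, pattern in TRACKER_PATTERNS.items()
    }


def methods_in_use(html: str) -> list:
    """Return the tracking methods that appear at least once."""
    return [name for name, n in count_trackers(html).items() if n > 0]


# Hypothetical page source with both a GTM container and a leftover gtag.js.
page = """
<script>(function(w,d,s,l,i){/* ... */
j.src='https://www.googletagmanager.com/gtm.js?id='+i;})(window,document,'script','dataLayer','GTM-XXXX');</script>
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-XXXXX-Y"></script>
"""
found = methods_in_use(page)
if len(found) > 1:
    print("More than one tracking method found:", ", ".join(found))
```

A result with more than one method, or a count above one for a single method, is a hint to inspect the page manually.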

Sending data via Measurement Protocol

To rule out the use of Measurement Protocol as the cause of data discrepancies, make sure that:

  • All the data you send to Google Analytics using the Measurement Protocol is also sent to the google-analytics.bi.owox.com access point.
  • The request contains the GET parameter tid.
  • If there is a delay between the actual hit event and its transmission to the access point, the request includes the &qt parameter (queue time, the length of the delay in milliseconds).
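
The checklist above could look like this in code. It is a minimal sketch that only builds the two request URLs with the same parameters. The `/collect` path on the OWOX BI host is an assumption (the article names only the host), and the helper name and sample values are hypothetical.

```python
from urllib.parse import urlencode

GA_ENDPOINT = "https://www.google-analytics.com/collect"
# Assumption: the OWOX BI access point accepts hits on the same /collect path.
OWOX_ENDPOINT = "https://google-analytics.bi.owox.com/collect"


def build_hit_urls(tid: str, cid: str, hit_type: str, delay_ms: int = 0, **extra) -> list:
    """Build identical GET requests for the GA and OWOX BI access points."""
    params = {"v": "1", "tid": tid, "cid": cid, "t": hit_type, **extra}
    if delay_ms > 0:
        # qt (queue time): delay between the event and the request, in milliseconds.
        params["qt"] = str(delay_ms)
    query = urlencode(params)
    return [f"{GA_ENDPOINT}?{query}", f"{OWOX_ENDPOINT}?{query}"]


# Example: an event that happened 5 seconds before it is sent.
for url in build_hit_urls("UA-XXXXX-Y", "555", "event",
                          delay_ms=5000, ec="checkout", ea="purchase"):
    print(url)
```

Building both URLs from one parameter set keeps the two services from receiving different data for the same hit.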

For all the details, see the article about using Measurement Protocol with OWOX BI.

Google Analytics limitations

OWOX BI’s tracking code for the Google Analytics → Google BigQuery user behavior data pipeline collects event data directly from the website at the moment a user triggers the event. Then the hit is transmitted to the dedicated OWOX BI access point. At the same time, the Google Analytics tracking code sends the same hits to the GA access point.

Since each tracking code sends hits to its own access point, the data collected by OWOX BI isn’t affected by Google Analytics limitations, such as:

  • 200,000 hits/user/day
  • 10 million hits/month
  • 500 hits/session

The maximum hit size the OWOX BI access point can receive is 16 KB. That’s twice as large as the GA access point can receive, so OWOX BI can upload more events to Google BigQuery.
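
To see which hits fall into the gap between the two size limits, you can measure the payload before sending it. The sketch below assumes that 1 KB means 1,024 bytes and that the limit applies to the UTF-8 encoded payload; neither detail is stated in this article.

```python
GA_MAX_BYTES = 8 * 1024     # GA limit from the article, assuming 1 KB = 1,024 bytes
OWOX_MAX_BYTES = 16 * 1024  # OWOX BI limit, same assumption


def accepted_by(payload: str) -> list:
    """Return the services whose size limit the payload fits into."""
    size = len(payload.encode("utf-8"))
    accepted = []
    if size <= GA_MAX_BYTES:
        accepted.append("GA")
    if size <= OWOX_MAX_BYTES:
        accepted.append("OWOX BI")
    return accepted


# A hypothetical 10,000-byte payload: too large for GA, fine for OWOX BI.
print(accepted_by("a" * 10_000))
```

Hits that only OWOX BI accepts are one source of the expected discrepancies.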

Because of these differences between OWOX BI and Google Analytics tracking, some discrepancies are expected: OWOX BI collects complete raw user behavior data, while Google Analytics can filter out some chunks of valuable data.

Possible reasons for session data discrepancies

Google Analytics and Google Tag Manager settings

  • As with hit data, make sure the Google Analytics property you selected when creating the pipeline doesn’t have any filters. Filtered-out data is still collected in Google BigQuery, hence the discrepancies. See Google Analytics Help for how to manage property filtering.
  • If you changed the time zone setting in the GA property you selected when creating the pipeline, the data collected by Google Analytics will be corrected retrospectively. The data in BigQuery will not: it is uploaded to session data tables tied to specific days and can’t move between these tables.
  • If you have a referral exclusion list in Google Analytics, you need to recreate it in OWOX BI as well. Otherwise, you may get more sessions in OWOX BI than in GA, since the sources of these sessions weren’t excluded for the OWOX BI pipeline.
  • Google Analytics automatically filters out bot-generated traffic, excluding sessions triggered by various bot activity. Since collecting all the data about events on a website is OWOX BI’s signature feature, we currently don’t filter out bot-generated traffic. This can lead to some discrepancies in direct session traffic data.
    However, upcoming OWOX BI updates will let you separate quality sessions from bot-generated ones. Watch for updates :)

Sending data via Measurement Protocol

To prevent the hits sent via Measurement Protocol from falling into different sessions in GA and OWOX BI, make sure that:

  • All the data you send to Google Analytics using the Measurement Protocol is also sent to the google-analytics.bi.owox.com access point.
  • The &qt parameter’s value doesn’t exceed 4 hours. A hit with a larger value won’t get to Google Analytics at all, which leads to discrepancies: OWOX BI will process more data, since our algorithm can align a hit with a session with a queue time of up to 30 days, while in GA the limit is only 4 hours.

Set the &qt value to more than 4 hours only if you absolutely need a hit sent via Measurement Protocol to get into the session in which it actually happened. Read more about using Measurement Protocol with OWOX BI.
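
The two queue time limits can be checked before a delayed hit is sent. The sketch below is illustrative only: it treats both limits as inclusive, which the article does not specify, and converts the delay to milliseconds, the unit the &qt parameter uses in the Measurement Protocol. The function names are hypothetical.

```python
from datetime import timedelta

GA_MAX_QT = timedelta(hours=4)   # larger values don't get to Google Analytics
OWOX_MAX_QT = timedelta(days=30)  # OWOX BI aligns hits with sessions up to this delay


def qt_value(delay: timedelta) -> int:
    """Convert a delay to the &qt value in milliseconds."""
    return int(delay.total_seconds() * 1000)


def processed_by(delay: timedelta) -> dict:
    """Tell which service would process a hit with this queue time."""
    return {
        "GA": delay <= GA_MAX_QT,        # boundary treated as inclusive (assumption)
        "OWOX BI": delay <= OWOX_MAX_QT,
    }


delay = timedelta(hours=5)
print(qt_value(delay), processed_by(delay))
```

A hit that only OWOX BI would process is a known source of session discrepancies, so it is worth logging such cases when you send them on purpose.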

Possible reasons for ad cost data discrepancies

The lack of tags in the advertising services

Ad cost data is uploaded to Google Analytics from ad services via OWOX BI’s cost data import pipelines. After that, a Google Analytics Cost Data → Google BigQuery pipeline uploads this data to session data tables.

Ad costs are attributed equally to all sessions with the same values of the UTM tags source, medium, campaign, keyword, and content, and are stored in the adCost and attributedAdCost fields of the session data tables.

Discrepancies may appear here if the original ad is missing the required UTM tags in the ad service.

  • All ads should contain source, medium, campaign, keyword, and content tags, so that ad costs are attributed correctly to the sessions in BigQuery tables.
  • For OWOX BI to attribute ad costs at all, ads must contain at least source and medium tags. Without them, ad costs won’t get to the BigQuery tables.
  • If you’re sure the ad in the ad service contains all the required UTM tags, but you still see discrepancies, contact us by writing to bi@owox.com and we’ll help you get rid of them.
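
To make the idea of equal attribution concrete, here is a small Python illustration. It is not OWOX BI's actual algorithm: it only shows a cost for one combination of UTM tag values being split evenly among the sessions that carry the same values. The data structures and names are hypothetical.

```python
from collections import defaultdict

UTM_KEYS = ("source", "medium", "campaign", "keyword", "content")


def attribute_costs(sessions: list, costs: dict) -> list:
    """Split each cost evenly among sessions with the same UTM tag values.

    sessions: list of dicts with UTM fields.
    costs: dict mapping a tuple of UTM values (in UTM_KEYS order) to a cost.
    Returns the attributed cost per session, in the same order as sessions.
    """
    groups = defaultdict(list)
    for index, session in enumerate(sessions):
        groups[tuple(session.get(key) for key in UTM_KEYS)].append(index)

    attributed = [0.0] * len(sessions)
    for utm_values, cost in costs.items():
        matching = groups.get(utm_values, [])
        for index in matching:
            attributed[index] = cost / len(matching)
    return attributed


ad = ("google", "cpc", "summer_sale", "shoes", "banner_1")
sessions = [
    dict(zip(UTM_KEYS, ad)),
    dict(zip(UTM_KEYS, ad)),
    {"source": "google", "medium": "cpc"},  # other tags missing: no match
]
print(attribute_costs(sessions, {ad: 10.0}))
```

The third session shows why missing tags matter: without the full set of values, it cannot be matched with the cost row for the ad.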

Some nuances in ad cost data updates for auto-tagged Google Ads campaigns

Even if data collection for auto-tagged Google Ads campaigns is set up properly in OWOX BI, it’s still possible that Google Ads cost data in the owoxbi_sessions table doesn’t match the Google Analytics data.

This may occur when the data in the source table was updated after the regular data update of the owoxbi_sessions tables ended. In this case, there is nothing to worry about: the data is not lost, and OWOX BI will upload it with the very next data update.

Updates run daily in two windows: from 6:00 AM through 2:00 PM and from 6:00 PM through 2:00 AM. Within these update windows, OWOX BI checks and updates data in the owoxbi_sessions_ tables several times to make sure all the fresh cost data from Google Ads makes its way to the session data tables.

Considering this, we recommend checking cost data discrepancies during the gaps between these update windows: these are the periods when Google BigQuery has the most relevant data.
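
If you schedule the comparison, a small helper can tell whether a given time falls inside an update window or in a gap. The sketch assumes the window end is exclusive and leaves the time zone to you, since the article does not state either; the function name is hypothetical.

```python
from datetime import time

# Update windows from the article; the second one crosses midnight.
UPDATE_WINDOWS = [
    (time(6, 0), time(14, 0)),
    (time(18, 0), time(2, 0)),
]


def in_update_window(moment: time) -> bool:
    """True if the moment falls inside one of the daily update windows."""
    for start, end in UPDATE_WINDOWS:
        if start <= end:
            if start <= moment < end:  # end treated as exclusive (assumption)
                return True
        elif moment >= start or moment < end:  # window wraps past midnight
            return True
    return False


# 3:00 PM falls into the gap between the windows, so it is a good time to compare.
print(in_update_window(time(15, 0)))
```

Run cost comparisons only when the function returns False for the current time.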
