OWOX BI can collect session data using one of two methods: the first is based on the Google Analytics API, while the second is based on raw hit data processed with OWOX BI's own algorithm.
Why the new algorithm?
Because we want you to get complete and accurate data on user behavior on your website. Here are the benefits of using the OWOX BI algorithm:
- The OWOX BI algorithm doesn't depend on the GA Core Reporting API and calculates sessions based on raw, unsampled hit data
- No interruptions in session table collection caused by exceeding GA Core Reporting API limits or by losing access to Google Analytics, and no delays caused by importing session table fields from Google Analytics
- OWOX BI doesn't impose limits on data uploads, while Google Analytics does. All your data gets into Google BigQuery tables
With the new algorithm, you can also:
- Track whether a direct visit is truly direct rather than attributed from an earlier paid source, thanks to the trafficSource.isTrueDirect field, and attribute site visits using two models: Last Non-Direct Click and Last Click
- Analyze how audiences from different websites overlap by collecting an additional anonymous identifier, OWOX User ID
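Measuring how two audiences overlap comes down to set operations on a shared identifier. Here is a minimal sketch. It assumes you have already exported the OWOX User ID values seen on each website into two sets. The IDs below and the audience_overlap function are hypothetical.

```python
from typing import Dict, Set


def audience_overlap(site_a_ids: Set[str], site_b_ids: Set[str]) -> Dict[str, float]:
    """Compare two audiences identified by the same anonymous user ID."""
    shared = site_a_ids & site_b_ids
    union = site_a_ids | site_b_ids
    return {
        "shared_users": len(shared),
        # Share of each site's audience that also visited the other site
        "share_of_a": len(shared) / len(site_a_ids) if site_a_ids else 0.0,
        "share_of_b": len(shared) / len(site_b_ids) if site_b_ids else 0.0,
        # Jaccard index: shared users relative to the combined audience
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical sample data standing in for OWOX User ID values per site
site_a = {"u1", "u2", "u3", "u4"}
site_b = {"u3", "u4", "u5"}
print(audience_overlap(site_a, site_b))
```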
When it’s time to move to the OWOX BI algorithm
- The number of sessions on your website is close to or above 500K for the selected date range (applies to Analytics Standard)
- You often run into data sampling in Google Analytics
- You often hit the limit of 500 hits per session
- You need to track the true source of a visit using the trafficSource.isTrueDirect field, which is otherwise available only in BigQuery Export for Google Analytics 360
- You want to combine audiences across domains using an additional user identifier, OWOX User ID, and analyze how these audiences overlap
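The criteria above can be turned into a quick self-check. This is only a sketch. The 500K-session and 500-hit thresholds come from the list. Reading "close to" as 90% of the threshold is an assumption, and so is the reasons_to_switch function itself.

```python
from typing import List

# Thresholds taken from the criteria above
SAMPLING_SESSION_THRESHOLD = 500_000  # sessions per selected date range (Analytics Standard)
HITS_PER_SESSION_LIMIT = 500
NEAR_RATIO = 0.9  # assumption: "close to" is read as 90% of the threshold


def reasons_to_switch(sessions_in_range: int,
                      max_hits_in_session: int,
                      sees_sampling: bool = False,
                      need_true_direct: bool = False,
                      need_audience_overlap: bool = False) -> List[str]:
    """Return the criteria from the list above that apply to a property."""
    reasons = []
    if sessions_in_range >= SAMPLING_SESSION_THRESHOLD * NEAR_RATIO:
        reasons.append("session count is close to or above 500K")
    if sees_sampling:
        reasons.append("reports are often sampled")
    if max_hits_in_session >= HITS_PER_SESSION_LIMIT:
        reasons.append("sessions reach the 500 hits limit")
    if need_true_direct:
        reasons.append("trafficSource.isTrueDirect is needed without GA 360")
    if need_audience_overlap:
        reasons.append("OWOX User ID is needed to analyze audience overlap")
    return reasons


print(reasons_to_switch(sessions_in_range=480_000, max_hits_in_session=120))
```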
What are the differences in the table data structure
Session tables created with the OWOX BI algorithm have the same structure as tables collected from Google Analytics data. There are only a few differences in some fields and their values:
- The totals.* fields previously contained the total hit values from Google Analytics. Now they duplicate the values of the totalsStreaming.* fields, which show the total number of hits collected by OWOX BI.
- The tables contain the customGroups fields. However, all of them have hit-level scope. Defining the scope of custom dimensions will be possible in future updates.
- The tables contain the isTrueDirect field, which shows whether a visit is actually direct (the value is true) or its source/medium was taken from an earlier paid source.
- The tables contain the field
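The totals.* and totalsStreaming.* fields summarize the hits collected for a session. Here is a minimal sketch of deriving such totals from nested hits. The subfield names (hits, pageviews) and the PAGE hit type follow the Google Analytics BigQuery export schema and are assumed to match. compute_totals is a hypothetical helper.

```python
from typing import Dict


def compute_totals(session: Dict) -> Dict[str, int]:
    """Derive session-level totals from the session's nested hits."""
    hits = session.get("hits", [])
    return {
        "hits": len(hits),
        # hits.type == "PAGE" marks pageview hits in the GA export schema
        "pageviews": sum(1 for h in hits if h.get("type") == "PAGE"),
    }


session = {"hits": [{"type": "PAGE"}, {"type": "EVENT"}, {"type": "PAGE"}]}
session["totalsStreaming"] = compute_totals(session)
# In tables built with the OWOX BI algorithm, totals.* repeats totalsStreaming.*
session["totals"] = dict(session["totalsStreaming"])
print(session["totals"])  # {'hits': 3, 'pageviews': 2}
```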
The settings needed to set up the OWOX BI algorithm
- If you have already set up session data collection based on Google Analytics, update the tracking code after you set up the new session data collection.
- If you use Google Ads auto-tagging, first turn on the upload of raw Google Ads reports to BigQuery with the Google Data Transfer native integration to get the auto-tagging data (gclid). Then, in the OWOX BI session data collection settings, specify the path to the BigQuery dataset containing these reports. Skip this step if you use manual tagging with UTM tags.
- The user.id field is filled from the userId (&uid) parameter, not from a custom dimension. If your website doesn't track and collect &uid yet, set it up using the standard method.
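Because user.id comes from the &uid parameter rather than a custom dimension, check that your hits actually carry it. Here is a minimal sketch that reads &uid from a Measurement Protocol-style payload. The payloads are placeholders, and extract_user_id is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs


def extract_user_id(payload: str) -> Optional[str]:
    """Return the &uid value of a hit payload, or None if the hit has none."""
    values = parse_qs(payload).get("uid")
    return values[0] if values else None


# Placeholder payloads in Measurement Protocol format
with_uid = "v=1&tid=UA-XXXXX-Y&cid=555&uid=user-42&t=pageview&dp=%2Fhome"
without_uid = "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fhome"
print(extract_user_id(with_uid))     # user-42
print(extract_user_id(without_uid))  # None
```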
Differences in data collection
For each aspect below, the first description applies to session data collected based on the Google Analytics API, and the second to the OWOX BI algorithm.
- Based on the Google Analytics API: SessionID values, traffic source, geo, and device type data are uploaded using the Google Analytics Core API. The start and end of sessions are defined by GA logic. Once this session data is received, raw hit data from the "streaming" tables is added to the session tables.
- Based on the OWOX BI algorithm: sessions are formed in Google BigQuery from raw data collected with the OWOX BI algorithm. The triggers for session start and end are the same as in Google Analytics.
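The text says only that session start and end triggers match Google Analytics. The sketch below assumes GA's default rules: a session ends after 30 minutes of inactivity, at midnight in the property timezone, or when a hit arrives with a different campaign. It shows how raw hits can be grouped into sessions under those rules. It is an illustration, not OWOX BI's actual implementation. The Hit class, its campaign field, and sessionize are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, List, Optional

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed GA default inactivity timeout


@dataclass
class Hit:
    client_id: str
    time: datetime                  # timezone-aware timestamp of the hit
    campaign: Optional[str] = None  # e.g. "google / cpc"; None = no campaign data


def sessionize(hits: List[Hit], property_tz: tzinfo) -> List[List[Hit]]:
    """Group hits into sessions per client_id using GA-style default rules."""
    sessions: List[List[Hit]] = []
    open_sessions: Dict[str, List[Hit]] = {}  # client_id -> current session
    session_campaign: Dict[str, Optional[str]] = {}

    for hit in sorted(hits, key=lambda h: (h.client_id, h.time)):
        current = open_sessions.get(hit.client_id)
        if current is not None:
            prev = current[-1]
            timed_out = hit.time - prev.time > SESSION_TIMEOUT
            # Midnight in the property timezone closes the session
            crossed_midnight = (hit.time.astimezone(property_tz).date()
                                != prev.time.astimezone(property_tz).date())
            # A hit carrying a different campaign starts a new session
            new_campaign = (hit.campaign is not None
                            and hit.campaign != session_campaign[hit.client_id])
            if not (timed_out or crossed_midnight or new_campaign):
                current.append(hit)
                continue
        # Start a new session for this client
        current = [hit]
        sessions.append(current)
        open_sessions[hit.client_id] = current
        session_campaign[hit.client_id] = hit.campaign
    return sessions


tz = timezone(timedelta(hours=2))  # hypothetical property timezone (UTC+2)
start = datetime(2024, 5, 1, 10, 0, tzinfo=tz)
hits = [
    Hit("c1", start),
    Hit("c1", start + timedelta(minutes=10)),
    Hit("c1", start + timedelta(minutes=50)),                           # 40-minute gap
    Hit("c1", start + timedelta(minutes=55), campaign="google / cpc"),  # new campaign
]
print([len(s) for s in sessionize(hits, tz)])  # [2, 1, 1]
```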
Session calculation when sending data via Measurement Protocol
- Based on the Google Analytics API: if the value of the &qt parameter sent via Measurement Protocol is greater than 4 hours, the hit is dropped and doesn't get into any session.
Hits sent via Measurement Protocol with the value of the
The hits will get to a hit data table ("streaming") not depending on the
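Measurement Protocol hits carry their delay in the &qt (queue time) parameter, in milliseconds. The hit time is the moment the hit was received minus that delay. Here is a minimal sketch. It assumes the Universal Analytics Measurement Protocol parameters v, tid, cid, t, dp, and qt. The tracking ID and client ID are placeholders, and the helper functions are hypothetical. It encodes only the 4-hour rule stated above for collection based on the Google Analytics API, and it doesn't send anything.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

MAX_QUEUE_TIME = timedelta(hours=4)  # threshold stated in the text


def build_mp_payload(tracking_id: str, client_id: str, page: str, queue_time_ms: int) -> str:
    """Build a Universal Analytics Measurement Protocol pageview payload."""
    return urlencode({
        "v": "1",             # protocol version
        "tid": tracking_id,   # property ID
        "cid": client_id,     # anonymous client ID
        "t": "pageview",      # hit type
        "dp": page,           # document path
        "qt": queue_time_ms,  # queue time in milliseconds
    })


def original_hit_time(received_at: datetime, queue_time_ms: int) -> datetime:
    """The hit is dated to when it occurred: receipt time minus qt."""
    return received_at - timedelta(milliseconds=queue_time_ms)


def kept_in_ga_sessions(queue_time_ms: int) -> bool:
    """Per the text, GA API-based collection drops hits with qt above 4 hours."""
    return timedelta(milliseconds=queue_time_ms) <= MAX_QUEUE_TIME


payload = build_mp_payload("UA-XXXXX-Y", "555", "/checkout", 3_600_000)
print(payload)  # v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fcheckout&qt=3600000
received = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(original_hit_time(received, 3_600_000))  # 2024-05-01 11:00:00+00:00
print(kept_in_ga_sessions(5 * 3_600_000))      # False
```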
Traffic source determination
- Based on the Google Analytics API: the session source is defined according to the Last Non-Direct Click model. This means every direct visit is assigned the source of the last non-direct visit within the previous 6 months. It isn't possible to determine whether a visit was actually direct.
- Based on the OWOX BI algorithm: session sources are attributed by applying the Last Non-Direct Click attribution model, the same as in Google Analytics.
To track the actual traffic source, we have added values to the
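The sketch below illustrates the difference between the two attribution models and the isTrueDirect flag for one user's session history. It assumes a 6-month lookback, approximated as 183 days and measured from the start of the last non-direct session, which is a simplification. Session, Attributed, and attribute are hypothetical names, not OWOX BI's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: a 6-month campaign lookback, approximated as 183 days
LOOKBACK = timedelta(days=183)
DIRECT = ("(direct)", "(none)")


@dataclass
class Session:
    start: datetime
    source: str
    medium: str


@dataclass
class Attributed:
    source: str
    medium: str
    is_true_direct: bool  # True when the visit itself was direct


def attribute(sessions: List[Session], model: str = "last_non_direct") -> List[Attributed]:
    """Attribute one user's sessions with Last Non-Direct Click or Last Click."""
    if model not in ("last_non_direct", "last_click"):
        raise ValueError(f"unknown model: {model}")
    result: List[Attributed] = []
    last_non_direct: Optional[Session] = None
    for s in sorted(sessions, key=lambda x: x.start):
        actual_direct = (s.source, s.medium) == DIRECT
        source, medium = s.source, s.medium
        if (actual_direct and model == "last_non_direct"
                and last_non_direct is not None
                and s.start - last_non_direct.start <= LOOKBACK):
            # A direct visit inherits the last non-direct source within the lookback
            source, medium = last_non_direct.source, last_non_direct.medium
        if not actual_direct:
            last_non_direct = s
        result.append(Attributed(source, medium, actual_direct))
    return result


history = [
    Session(datetime(2024, 1, 10), "google", "cpc"),
    Session(datetime(2024, 2, 1), "(direct)", "(none)"),
]
print(attribute(history))                # second visit: google / cpc, is_true_direct=True
print(attribute(history, "last_click"))  # second visit: (direct) / (none)
```

Under Last Click, the direct visit keeps its own (direct) / (none) source. In both models, the flag still records that the visit was actually direct.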
Defining utm values for Google Ads auto-tagging (gclid)
- Based on the Google Analytics API: the values are defined by the Google Analytics API, which has a native integration with Google Ads.
- Based on the OWOX BI algorithm: you need reports with raw Google Ads data in BigQuery. You can set them up with the Google Data Transfer native integration.
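With auto-tagging, the landing page URL carries only a gclid, so utm-like values have to be looked up in the raw Google Ads data. Here is a minimal sketch. It assumes the Google Ads report rows are already loaded into a dictionary keyed by gclid. The campaign_name and keyword keys and the utm_from_gclid function are hypothetical. google / cpc is the source/medium Google Analytics assigns to auto-tagged Google Ads clicks.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse


def utm_from_gclid(landing_url: str, clicks_by_gclid: Dict[str, dict]) -> Optional[dict]:
    """Resolve utm-like values for an auto-tagged landing URL."""
    gclid_values = parse_qs(urlparse(landing_url).query).get("gclid")
    if not gclid_values:
        return None  # not an auto-tagged click; utm_* parameters apply instead
    click = clicks_by_gclid.get(gclid_values[0])
    if click is None:
        return None  # click not (yet) present in the Google Ads report
    return {
        "source": "google",
        "medium": "cpc",
        "campaign": click["campaign_name"],  # hypothetical report field
        "keyword": click.get("keyword"),     # hypothetical report field
    }


# Hypothetical lookup built from Google Ads report rows in BigQuery
clicks = {"abc123": {"campaign_name": "spring_sale", "keyword": "running shoes"}}
print(utm_from_gclid("https://example.com/?gclid=abc123", clicks))
```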
- Based on the Google Analytics API: tables are split by day according to the Google Analytics property timezone. Every session is a separate row with nested fields containing raw hit data.
- Based on the OWOX BI algorithm: there are no differences in the table structure.
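Splitting tables by day depends on the property timezone, so the same hit can fall into different daily tables for different timezones. The sketch below assumes a fixed UTC offset (daylight saving time is ignored) and a YYYYMMDD table suffix. daily_table_suffix and the session_row fields are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def daily_table_suffix(hit_time: datetime, utc_offset_hours: float) -> str:
    """Return the YYYYMMDD suffix of the daily table that a hit falls into."""
    # Simplification: a fixed UTC offset, so daylight saving time is ignored
    property_tz = timezone(timedelta(hours=utc_offset_hours))
    return hit_time.astimezone(property_tz).strftime("%Y%m%d")


hit_time = datetime(2024, 5, 1, 22, 30, tzinfo=timezone.utc)
print(daily_table_suffix(hit_time, 3))   # 20240502 (01:30 next day at UTC+3)
print(daily_table_suffix(hit_time, -5))  # 20240501 (17:30 at UTC-5)

# One row per session; its hits stay nested inside the row
session_row = {
    "date": daily_table_suffix(hit_time, 3),
    "hits": [{"hitNumber": 1, "type": "PAGE"}, {"hitNumber": 2, "type": "EVENT"}],
}
```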
The start time of session data collection for the previous day
- Based on the Google Analytics API: at 5 a.m., since data becomes available in the Google Analytics Core API at 4 a.m. in the Google Analytics property timezone.
- Based on the OWOX BI algorithm: at 1 a.m. in the Google Analytics property timezone.
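Both schedules collect the previous day's data; only the start hour differs. The small sketch below computes the next start time and the day that run covers. It assumes the property timezone is available as a tzinfo (here a hypothetical UTC+2 fixed offset). next_collection_run and the method keys are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

# Start hours from the text, in the Google Analytics property timezone
START_HOUR = {"ga_api": 5, "owox": 1}


def next_collection_run(now_local: datetime, method: str) -> Tuple[datetime, date]:
    """Return when the next daily collection starts and which day it collects."""
    run = datetime.combine(now_local.date(), time(START_HOUR[method]),
                           tzinfo=now_local.tzinfo)
    if run <= now_local:
        run += timedelta(days=1)
    return run, run.date() - timedelta(days=1)  # each run collects the previous day


tz = timezone(timedelta(hours=2))  # hypothetical property timezone
now = datetime(2024, 5, 2, 3, 0, tzinfo=tz)
print(next_collection_run(now, "owox"))
print(next_collection_run(now, "ga_api"))
```

At 3 a.m., the 1 a.m. run has already happened, so the next OWOX BI run is the following night. The 5 a.m. run based on the Google Analytics API is still ahead.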
- Based on the Google Analytics API: session data from Google Analytics is used with the property's current filters applied.
- Based on the OWOX BI algorithm: session data is not filtered in any way.
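Session data collected with the OWOX BI algorithm isn't filtered, so any filtering your Google Analytics reports relied on has to be reapplied in your own queries. The sketch below shows the idea in Python with a hostname include filter and a client ID exclude filter. The clientId and hits.page.hostname keys mirror the GA BigQuery export schema and are assumed to match. filter_sessions is a hypothetical helper. This approximates view filters; it is not an equivalent.

```python
from typing import Iterable, List, Set


def filter_sessions(sessions: Iterable[dict],
                    allowed_hostnames: Set[str],
                    excluded_client_ids: Set[str]) -> List[dict]:
    """Reapply view-style filters to unfiltered session rows."""
    kept = []
    for session in sessions:
        # Exclude filter, e.g. known internal or test clients
        if session["clientId"] in excluded_client_ids:
            continue
        # Include filter on hostname, applied hit by hit
        hits = [h for h in session["hits"]
                if h.get("page", {}).get("hostname") in allowed_hostnames]
        if hits:
            kept.append({**session, "hits": hits})
    return kept


rows = [
    {"clientId": "a", "hits": [{"page": {"hostname": "shop.example.com"}},
                               {"page": {"hostname": "staging.example.com"}}]},
    {"clientId": "internal", "hits": [{"page": {"hostname": "shop.example.com"}}]},
]
print(filter_sessions(rows, {"shop.example.com"}, {"internal"}))
```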