Difference between two session collection algorithms

OWOX BI can collect session data in one of two ways: based on the Google Analytics (GA) API, or based on raw hit data with the new OWOX BI algorithm.

Why the new algorithm?

Because we want you to get complete and accurate data on user behavior on your website. Here are the benefits of using the OWOX BI algorithm:

  1. The OWOX BI algorithm doesn’t depend on the GA Core Reporting API and calculates sessions from raw, unsampled hit data.
  2. Session table collection is not interrupted when the GA Core Reporting API limits are exceeded or when access to Google Analytics is lost. There are also no delays caused by importing session table fields from Google Analytics.
  3. The OWOX BI algorithm has no limits such as 500 000 sessions a day, 10 million hits a day, or 500 hits per session. All your data gets into Google BigQuery tables.

With the new algorithm, you can also:

  • Track whether a direct visit is truly direct rather than one that follows a visit from a paid source. The trafficSource.isTrueDirect field makes this possible and lets you attribute site visits by two models: Last Non-Direct Click and Last Click.
  • Analyze how audiences from different websites overlap by collecting an additional anonymous identifier, OWOX User ID.
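
To make the two attribution models concrete, here is a minimal Python sketch. It assumes each session row already carries a source assigned by the Last Non-Direct Click model plus a boolean copied from trafficSource.isTrueDirect, and that a true value means the visit itself was direct. The Session type, the field names source and is_true_direct, and the "(direct)" label are illustrative choices, not the actual OWOX BI table schema.

```python
from __future__ import annotations

from collections import Counter
from typing import TypedDict


class Session(TypedDict):
    source: str           # source assigned by the Last Non-Direct Click model
    is_true_direct: bool  # hypothetical stand-in for trafficSource.isTrueDirect


DIRECT = "(direct)"  # label chosen for this sketch


def last_click_source(session: Session) -> str:
    """Return the Last Click source: direct visits stay direct."""
    return DIRECT if session["is_true_direct"] else session["source"]


def visits_by_model(sessions: list[Session]) -> dict[str, Counter]:
    """Count visits per source under both attribution models."""
    return {
        "last_non_direct_click": Counter(s["source"] for s in sessions),
        "last_click": Counter(last_click_source(s) for s in sessions),
    }


sessions: list[Session] = [
    {"source": "google", "is_true_direct": False},      # paid or organic visit
    {"source": "google", "is_true_direct": True},       # direct visit that inherited "google"
    {"source": "newsletter", "is_true_direct": False},
]
report = visits_by_model(sessions)
```

In this example the second visit counts toward google under Last Non-Direct Click and toward (direct) under Last Click, which is the distinction the isTrueDirect field exposes.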

When is it time to move to the OWOX BI algorithm?

  1. The number of sessions on your website is close to or exceeds 200 000 per day.
  2. You need to track the true source of a visit using the trafficSource.isTrueDirect field, which is otherwise available only in BigQuery Export for Google Analytics 360.
  3. You want to unite audiences across domains using an additional user identifier, OWOX User ID, and analyze how these audiences overlap.

What is the difference in table data structure?

Session tables created with the OWOX BI algorithm:

  1. Do not contain the totals.* fields, since those fields hold totals for the hits collected by GA. We recommend using the totalsStreaming.* fields instead.
  2. Do not contain values for visitNumber and newVisits. The fields are present in the tables, but they are empty. Their values will become available in upcoming product updates.
  3. Contain the isTrueDirect field, which helps you understand whether a visit was truly direct.
  4. Contain the userOwoxId field.
  5. Contain the customDimensions, customMetrics, and customGroups fields, but only at the hit level. Defining the scope of custom dimensions will become possible in future updates.
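
Because custom dimensions are stored only at the hit level, a session-level value has to be derived from the hits. The Python sketch below shows one possible way to do that, assuming a session row is a dictionary with a nested hits list, that each hit has a hitNumber and a customDimensions list of index/value pairs (modeled on the GA BigQuery export layout), and that the last non-empty value in the session wins. These are assumptions for illustration, not a description of how OWOX BI will implement scopes.

```python
from __future__ import annotations


def session_scoped_dimensions(session: dict) -> dict[int, str]:
    """Collapse hit-level custom dimensions into one value per index.

    Assumption for this sketch: the last non-empty value set during the
    session is treated as the session-level value.
    """
    values: dict[int, str] = {}
    # Process hits in the order they occurred within the session.
    hits = sorted(session.get("hits", []), key=lambda h: h.get("hitNumber", 0))
    for hit in hits:
        for dim in hit.get("customDimensions", []):
            if dim.get("value"):  # skip empty or missing values
                values[dim["index"]] = dim["value"]
    return values


example_session = {
    "hits": [
        {"hitNumber": 2, "customDimensions": [{"index": 1, "value": "gold"}]},
        {"hitNumber": 1, "customDimensions": [
            {"index": 1, "value": "silver"},
            {"index": 3, "value": "ru"},
        ]},
        {"hitNumber": 3, "customDimensions": []},
    ]
}
scoped = session_scoped_dimensions(example_session)
```

Here dimension 1 resolves to "gold" (set on the later hit) and dimension 3 to "ru". A different rule, such as taking the first value, would only require changing the assignment condition.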

Differences in the data collection

 

Each stage below is compared for the two collection methods.

Formation of sessions

  • Based on Google Analytics API: We import session-level parameters for each session ID, such as traffic source, geo data, and device type, using the Google Analytics Core Reporting API. The beginning and the end of a session are therefore defined by GA logic. Once we get this session data, we add raw hit data to the session tables.
  • Based on OWOX BI algorithm: Sessions are computed in Google BigQuery from raw hit data using the OWOX BI algorithm. The logic for beginning and ending a session is the same as in Google Analytics.

Traffic source determination

  • Based on Google Analytics API: The session source is determined by the Last Non-Direct Click model, so every direct visit is assigned the source of the last non-direct visit. There is no way to tell whether a visit was actually direct.
  • Based on OWOX BI algorithm: Sessions are attributed to traffic sources by applying the Last Non-Direct Click attribution model, the same as in Google Analytics. To let you track the actual traffic source, we also populate the trafficSource.isTrueDirect field. It indicates whether the session started as a direct site visit or follows a session generated by an ad source.

Definition of utm values for AdWords auto-tagging (gclid)

  • Based on Google Analytics API: Defined by the GA API, which has native integration with AdWords.
  • Based on OWOX BI algorithm: You need reports with raw AdWords data in BigQuery. They can be set up with the native integration, Google Data Transfer.

Table structure

  • Based on Google Analytics API: Tables are divided by day in the time zone of the GA property. Each session is a separate row with nested fields that contain hits.
  • Based on OWOX BI algorithm: The same.

Collection of sessions for the previous day

  • Based on Google Analytics API: At 5 a.m., since the data becomes available in the Google Analytics Core Reporting API at 4 a.m.
  • Based on OWOX BI algorithm: At 1 a.m.

Data filtration

  • Based on Google Analytics API: Sessions from GA are filtered according to the property filters.
  • Based on OWOX BI algorithm: Sessions are completely unfiltered.
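
The session formation stage says that sessions are computed from raw hits using the same start and end logic as Google Analytics. The Python sketch below illustrates the general idea under simplifying assumptions: it applies only two commonly documented GA rules (a 30-minute inactivity timeout and a break at midnight), ignores the campaign-change rule, and treats hits as plain timestamps of one user in the property time zone. It is not the OWOX BI implementation, and the names SESSION_TIMEOUT and split_into_sessions are made up for this example.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# GA's default inactivity timeout; assumed here, it is configurable in GA.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(hit_times: list[datetime]) -> list[list[datetime]]:
    """Group one user's hit timestamps into sessions.

    A new session starts when the gap since the previous hit reaches the
    timeout or when the calendar date changes (midnight boundary).
    """
    sessions: list[list[datetime]] = []
    for ts in sorted(hit_times):
        if sessions:
            previous = sessions[-1][-1]
            same_day = ts.date() == previous.date()
            if same_day and ts - previous < SESSION_TIMEOUT:
                sessions[-1].append(ts)  # hit continues the current session
                continue
        sessions.append([ts])  # first hit, timeout, or new day
    return sessions


hits = [
    datetime(2019, 5, 1, 10, 0),
    datetime(2019, 5, 1, 10, 10),  # 10 minutes later: same session
    datetime(2019, 5, 1, 10, 50),  # 40 minutes later: new session
    datetime(2019, 5, 1, 23, 50),  # long gap: new session
    datetime(2019, 5, 2, 0, 5),    # 15 minutes later but after midnight: new session
]
sessions = split_into_sessions(hits)
```

With these five hits the sketch produces four sessions, the first containing two hits. Because the grouping runs over raw hits, it does not depend on the GA Core Reporting API, which is the property the OWOX BI column describes.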

 
