Differences between the two session collection algorithms

There are two ways to collect session data: based on the Google Analytics (GA) API, or based on the OWOX BI algorithm.

Why do we need the new OWOX BI algorithm to collect session data?

Google Analytics starts sampling your data once you have more than 500,000 sessions per day. The new algorithm builds sessions from raw hit data without relying on Google Analytics, so the session data is not affected by GA sampling.

When is it time to move to the OWOX BI algorithm?

  1. Your website has close to or more than 200,000 sessions per day.
  2. You need to track the true source of a visit (using Last Click, not Last Non-Direct Click).
  3. You want to use the OWOX user_id field in your session data (this field is collected at the hit level).
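To make the difference between the two attribution models concrete, here is a minimal Python sketch that assigns a source to each of one user's sessions under Last Click and under Last Non-Direct Click. It assumes the sessions are already ordered by time and represented as hypothetical (session_id, source) pairs, with "(direct)" marking a direct visit. It also ignores the campaign timeout that GA applies to Last Non-Direct Click, so treat it as a simplified illustration, not the actual GA or OWOX BI implementation.

```python
from typing import List, Optional, Tuple

DIRECT = "(direct)"  # hypothetical marker for a visit with no referrer or campaign

Session = Tuple[str, str]  # (session_id, observed source), ordered by time


def last_click(sessions: List[Session]) -> List[Session]:
    """Last Click: every session keeps its own observed source,
    so a direct visit stays recognizable as direct."""
    return [(sid, src) for sid, src in sessions]


def last_non_direct_click(sessions: List[Session]) -> List[Session]:
    """Last Non-Direct Click: a direct session inherits the source of the most
    recent earlier non-direct session (campaign timeout ignored in this sketch)."""
    result: List[Session] = []
    last_non_direct: Optional[str] = None
    for sid, src in sessions:
        if src != DIRECT:
            last_non_direct = src
            result.append((sid, src))
        else:
            # Stays direct only if no earlier non-direct source exists.
            result.append((sid, last_non_direct or DIRECT))
    return result


visits = [("s1", "google / cpc"), ("s2", DIRECT),
          ("s3", "newsletter / email"), ("s4", DIRECT)]
print(last_click(visits))
# [('s1', 'google / cpc'), ('s2', '(direct)'), ('s3', 'newsletter / email'), ('s4', '(direct)')]
print(last_non_direct_click(visits))
# [('s1', 'google / cpc'), ('s2', 'google / cpc'), ('s3', 'newsletter / email'), ('s4', 'newsletter / email')]
```

Under Last Non-Direct Click, sessions s2 and s4 are indistinguishable from real paid or email visits, which is why the true source is only recoverable with Last Click.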

Is there any difference in the data structure?

Yes. Sessions based on the OWOX BI algorithm:

  1. Do not contain the totals.* fields, since those fields hold the total number of hits collected by GA. For these needs, we recommend using the totalsStreaming.* fields.
  2. Do not populate visitNumber and newVisits. These fields are present in the tables, but they are empty. Their values will become available in upcoming product updates.
  3. Contain the isTrueDirect field, which helps you determine whether a visit was direct.
  4. Contain the userOwoxId field.
  5. Contain the customDimensions, customMetrics, and customGroups fields, but only at the hit level. Defining the scope of custom dimensions will be possible in the next update.
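Until custom dimension scopes can be defined, one way to approximate a session-scoped custom dimension is to derive it yourself from the hit-level values. In GA, a session-scoped custom dimension takes the last value set during the session, and the Python sketch below applies that rule. The hit layout (dicts with a customDimensions list of {"index", "value"} entries, plus a hitNumber key) is a hypothetical assumption for illustration, not the documented OWOX BI schema.

```python
from typing import Dict, List, Optional


def session_scoped_value(hits: List[Dict], index: int) -> Optional[str]:
    """Return the last value set for custom dimension `index` in one session.

    Mirrors GA's session-scope rule: the last value sent in the session wins.
    `hits` must be in the order they were sent (e.g. sorted by hitNumber).
    Returns None if the dimension was never set in this session.
    """
    value: Optional[str] = None
    for hit in hits:
        for cd in hit.get("customDimensions", []):
            if cd.get("index") == index and cd.get("value") is not None:
                value = cd["value"]
    return value


# Hypothetical hits of a single session.
hits = [
    {"hitNumber": 1, "customDimensions": [{"index": 1, "value": "guest"}]},
    {"hitNumber": 2, "customDimensions": []},
    {"hitNumber": 3, "customDimensions": [{"index": 1, "value": "member"},
                                          {"index": 2, "value": "A"}]},
]
print(session_scoped_value(hits, 1))  # member
print(session_scoped_value(hits, 3))  # None
```

The same loop could be written in SQL over the nested hit fields in BigQuery; the Python version just makes the "last value wins" rule explicit.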

 

Differences in data collection

 

Each stage below is described twice: first for collection based on the Google Analytics API, then for collection based on the OWOX BI algorithm.

Formation of sessions

Google Analytics API: We use the Google Analytics Core API to upload session-level indicators for each sessionID, such as traffic source, geo data, and device type. The beginning and end of each session are therefore defined by GA logic. Once we get this session data, we add the raw hit data to the session tables.

OWOX BI algorithm: Sessions are computed in Google BigQuery (GBQ) from completely raw data using the OWOX BI algorithm. Sessions begin and end according to the same logic as in Google Analytics.

Traffic source determination

Google Analytics API: The session source follows the Last Non-Direct Click model: every direct visit is assigned the source of the last non-direct visit. As a result, there is no way to determine whether a visit was actually direct.

OWOX BI algorithm: The session source is the true source of the visit (Last Click model). In the next product update, sessions will be collected using Last Non-Direct Click logic, and a vocabulary of traffic sources will be collected on the OWOX BI side.

UTM values for AdWords auto-tagging (gclid)

Google Analytics API: UTM values are defined by the GA API, which has a native integration with AdWords.

OWOX BI algorithm: You need reports with raw AdWords data in BigQuery. They can be set up with the native Google Data Transfer integration.

Table structure

Google Analytics API: Data is split into daily tables in the time zone of the GA property. Each session is a separate row with nested fields that contain its hits.

OWOX BI algorithm: The same.

Collection of sessions for the previous day

Google Analytics API: At 5 AM, because the data becomes available in the Google Analytics Core API at 4 AM.

OWOX BI algorithm: At 1 AM.

Data filtration

Google Analytics API: Sessions come from GA already filtered according to the property filters.

OWOX BI algorithm: Sessions are completely unfiltered.
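The comparison above notes that the OWOX BI algorithm begins and ends sessions with the same logic as Google Analytics. As a rough illustration of that logic, the Python sketch below groups one user's raw hits into sessions using GA's default rules: a new session starts after 30 minutes of inactivity, at midnight in the reporting time zone, or when a hit brings a new campaign source. The hit representation (a timestamp plus a source, where None means the hit carries no new campaign information), the function name, and the hard-coded defaults are assumptions; this is not OWOX BI's actual implementation, which runs in BigQuery.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import List, Optional, Tuple

SESSION_TIMEOUT = timedelta(minutes=30)  # GA default session timeout

# (timezone-aware timestamp, campaign source or None if the hit carries none)
Hit = Tuple[datetime, Optional[str]]


def sessionize(hits: List[Hit], tz: tzinfo) -> List[List[Hit]]:
    """Split one user's hits into sessions using GA's default rules (sketch).

    A new session starts when:
      * more than SESSION_TIMEOUT has passed since the previous hit,
      * the local date in `tz` changes (midnight boundary), or
      * a hit carries a campaign source different from the current session's.
    """
    sessions: List[List[Hit]] = []
    current_source: Optional[str] = None
    prev_ts: Optional[datetime] = None
    for ts, source in sorted(hits, key=lambda h: h[0]):
        new_session = (
            prev_ts is None
            or ts - prev_ts > SESSION_TIMEOUT
            or ts.astimezone(tz).date() != prev_ts.astimezone(tz).date()
            or (source is not None and source != current_source)
        )
        if new_session:
            sessions.append([])
            current_source = source
        sessions[-1].append((ts, source))
        prev_ts = ts
    return sessions


utc = timezone.utc
tz = timezone(timedelta(hours=2))  # hypothetical reporting time zone (fixed UTC+2)
hits = [
    (datetime(2018, 5, 1, 10, 0, tzinfo=utc), "google / cpc"),
    (datetime(2018, 5, 1, 10, 20, tzinfo=utc), None),                 # 20 min gap: same session
    (datetime(2018, 5, 1, 11, 0, tzinfo=utc), None),                  # 40 min gap: new session
    (datetime(2018, 5, 1, 11, 5, tzinfo=utc), "newsletter / email"),  # new source: new session
    (datetime(2018, 5, 1, 21, 50, tzinfo=utc), None),                 # timeout: new session (23:50 local)
    (datetime(2018, 5, 1, 22, 5, tzinfo=utc), None),                  # local midnight passed (00:05): new session
]
print([len(s) for s in sessionize(hits, tz)])  # [2, 1, 1, 1, 1]
```

A real property time zone with daylight saving time could be passed instead, for example a zoneinfo.ZoneInfo object; the fixed offset here only keeps the example free of external time zone data.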

 
