How the ML Funnel Based Attribution model calculation works

The ML Funnel Based Attribution model from OWOX BI evaluates the efficiency of your advertising campaigns by measuring how each ad medium contributes to your customer’s path to conversion. The model is calculated using Markov chains and machine learning algorithms.

A Markov chain is a sequence of events in which the probability of each next event depends only on the event immediately before it. Attribution based on Markov chains is a probabilistic model: it calculates the probabilities of a user moving between the steps of the conversion funnel. This shows how the steps in the funnel affect each other and which of them are the most and the least valuable to the conversion.

Read the breakdown of the popular attribution models and the major principles of how attribution is calculated using the Markov chains in our blog post.

In this article, we’ll describe the complete process of how an ML Funnel Based Attribution model is calculated.

What data can you use for building an attribution model

To calculate an ML Funnel Based Attribution model in OWOX BI, you need data uploaded to Google BigQuery and a Google account with the BigQuery Admin role for all the data you want to include in your attribution model.

You can build your model based on up to three data sources:

  1. Website user behavior data. This can be data collected by OWOX BI with the Google Analytics → Google BigQuery pipeline or exported from Google Analytics 360.
  2. Data about online and offline transactions from your CRM system, uploaded to Google BigQuery. How to upload data from CRM to BigQuery.
  3. Custom events: any other events uploaded to a BigQuery data table manually. More about the attribution of custom events.

Read more about how to prepare data for an attribution model calculation in this article.

The calculation happens in four stages:

  1. Collecting events from the connected data sources
  2. Calculating the probabilities of transitions between the conversion funnel steps
  3. Gathering the conversions data
  4. Calculating the events’ value

The rest of the article covers each of these stages in detail.

Stage 1. Collecting the events

To calculate the ML Funnel Based Attribution model, OWOX BI collects from the data sources only those events that can get a value when the user’s path ends with a conversion. Examples of such events are visits, product page views, and cart views.

The following sections describe how OWOX BI collects these events.

How does OWOX BI generate session IDs for CRM data and data from the custom events tables?

Custom events: based on the fields user_id, client_id, utm_source, utm_medium, utm_campaign, geo_region, and on the date from the time field.

CRM data: based on the fields user_id, transaction_id, transaction_created.
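
The field lists above can be turned into a session key in many ways. Below is a minimal sketch, assuming the key is a hash of the listed field values; the hashing scheme, the `session_key` helper, and the sample rows are hypothetical and are not taken from OWOX BI.

```python
import hashlib
from datetime import datetime

# Field lists come from the article; everything else here is an assumption.
CUSTOM_EVENT_FIELDS = ["user_id", "client_id", "utm_source", "utm_medium",
                       "utm_campaign", "geo_region"]
CRM_FIELDS = ["user_id", "transaction_id", "transaction_created"]


def session_key(row, fields, date_field=None):
    """Build a deterministic session key from the given fields (hypothetical)."""
    parts = [str(row.get(field, "")) for field in fields]
    if date_field is not None:
        # Only the calendar date of the timestamp takes part in the key.
        parts.append(datetime.fromisoformat(row[date_field]).date().isoformat())
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:16]


event = {"user_id": "u1", "client_id": "c1", "utm_source": "google",
         "utm_medium": "cpc", "utm_campaign": "sale", "geo_region": "Kyiv",
         "time": "2023-05-01T10:00:00"}
same_day = dict(event, time="2023-05-01T18:30:00")
next_day = dict(event, time="2023-05-02T09:00:00")

print(session_key(event, CUSTOM_EVENT_FIELDS, "time")
      == session_key(same_day, CUSTOM_EVENT_FIELDS, "time"))  # True
print(session_key(event, CUSTOM_EVENT_FIELDS, "time")
      == session_key(next_day, CUSTOM_EVENT_FIELDS, "time"))  # False
```

In this sketch, two custom events with identical field values on the same date fall into one session, while the same values on another date start a new one.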

How is the data handled when different connected sources have the same transactions in them or the transaction data doesn’t match?

Google Analytics + CRM. If both of the sources have overlapping data on certain transactions, the transactions’ value will be distributed based on what transaction data is present in each of the tables.

Here, one of the three cases is possible:

1) The transactions are present in Google Analytics data, but they’re missing from the CRM data:

  • The value of the products from these transactions won’t be distributed. The CRM data is treated as proof that a transaction actually took place: if a transaction never reached the CRM, it is assumed to have been canceled or refunded after Google Analytics registered it online.

2) The transactions are missing from Google Analytics data while they’re present in the CRM data:

  • These transactions count as offline purchases and get their own sessions with the medium set to 'offline'.
  • If an offline customer had online sessions before the purchase and OWOX BI has enough data to connect an online session with the offline purchase, then value is distributed to these sessions. The value of all the funnel steps between the online and offline interactions is assigned to the offline-purchase session.
  • If there was no online session before the offline purchase, then 100% of the value will be assigned to the offline-purchase session.

3) The transactions are present in both the Google Analytics and CRM data tables:

  • All the info about such transactions and the sessions that led to them will match the Google Analytics data.
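
The three cases can be illustrated with a small sketch. It only labels each transaction with the case it falls into; the function name and the label strings are made up for this example and do not correspond to anything in the OWOX BI product.

```python
def classify_transactions(ga_ids, crm_ids):
    """Label each transaction ID with the case described above (illustrative)."""
    result = {}
    for tx in ga_ids | crm_ids:
        if tx in ga_ids and tx in crm_ids:
            result[tx] = "use_ga_data"            # case 3: present in both
        elif tx in ga_ids:
            result[tx] = "value_not_distributed"  # case 1: GA only
        else:
            result[tx] = "offline_session"        # case 2: CRM only
    return result


labels = classify_transactions({"t1", "t2"}, {"t2", "t3"})
for tx in sorted(labels):
    print(tx, labels[tx])
# t1 value_not_distributed
# t2 use_ga_data
# t3 offline_session
```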

Google Analytics + custom events. The aim of adding custom events to your attribution model is to augment your online conversion data with events that can’t be tracked by Google Analytics, such as call-center calls, sent emails, and ad views. Because this is additional data, custom events take priority in the attribution when they overlap with the GA data.

CRM + custom events. In this case, custom events also take priority, this time augmenting the CRM data, but only if the overlapping transaction has the “completed” status in the CRM table. Otherwise, the value won’t be distributed.

More about the attribution of custom events

Google Analytics + CRM + custom events. When all three sources are connected to the attribution model, the value is attributed in this order of priority: 1) custom events, 2) Google Analytics, 3) CRM.

More about handling multi-source transactions

How does OWOX BI connect the sessions from different devices and browsers?

The sessions are connected by matching User IDs for actions by logged-in users and Client IDs for actions by non-logged-in users.

OWOX BI connects actions into chains primarily by User ID when it is known, i.e. when the user was logged in on each device and browser while performing these actions. If the user performed some of the actions while not logged in, but had been logged in on the same device before (this is checked by matching the sessions’ Client IDs), OWOX BI still fits all these actions into one chain of sessions across devices and browsers:

[Image: All_sessions_connected_en.png]

If there are several User IDs per Client ID, all the actions are still gathered into one chain, but all sessions are associated with the User ID that comes last in the chain:

[Image: All_sessions_connected_different_userId_en.png]
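
The matching rules above can be sketched with a union-find structure that merges sessions sharing a User ID or a Client ID. This is only an illustration under stated assumptions: the session dictionaries, the `build_chains` helper, and the rule that "last" means last in the time-ordered input list are hypothetical.

```python
from collections import defaultdict


def build_chains(sessions):
    """Group sessions that share a User ID or a Client ID (illustrative sketch).

    `sessions` is a list of dicts ordered by time, each with the keys
    'session_id', 'client_id' and 'user_id' (None when not logged in).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for s in sessions:
        union(("client", s["client_id"]), ("session", s["session_id"]))
        if s["user_id"]:
            union(("user", s["user_id"]), ("session", s["session_id"]))

    chains = defaultdict(list)
    for s in sessions:
        chains[find(("session", s["session_id"]))].append(s)

    result = []
    for chain in chains.values():
        user_ids = [s["user_id"] for s in chain if s["user_id"]]
        result.append({
            # Assumption: the chain takes the last User ID seen in time order.
            "user_id": user_ids[-1] if user_ids else None,
            "session_ids": [s["session_id"] for s in chain],
        })
    return result


sessions = [
    {"session_id": "s1", "client_id": "c1", "user_id": "u1"},  # desktop, logged in
    {"session_id": "s2", "client_id": "c1", "user_id": None},  # desktop, logged out
    {"session_id": "s3", "client_id": "c2", "user_id": "u1"},  # mobile, logged in
    {"session_id": "s4", "client_id": "c3", "user_id": None},  # unrelated visitor
]
for chain in build_chains(sessions):
    print(chain)
# {'user_id': 'u1', 'session_ids': ['s1', 's2', 's3']}
# {'user_id': None, 'session_ids': ['s4']}
```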

In each data source, Client ID values are stored under different field names:

Data source | Field name in a BigQuery table
'owoxbi_sessions' and 'session_streaming' tables collected via OWOX BI | The clientId field
BigQuery Export for GA 360 tables | The fullVisitorId field as a session_client_id, unless there is a Custom Dimension being collected
Custom events and CRM data | The client_id field

 

What user IDs are NOT considered in the model calculation?

User IDs (and combinations of User IDs and Client IDs as described above) that appear in more than 100 sessions per day are ignored, since such activity is unnatural for real users. All events associated with these IDs still take part in the model calculation, but the users are treated as non-logged-in.

OWOX BI also treats as non-logged-in all users whose User ID value is 'undefined', '0', or empty. Thanks to this, your attribution model won’t contain false users connected by such User IDs.
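
Both rules can be expressed as a simple filter. The sketch below is an assumption-laden illustration: the session dictionaries and the `effective_user_ids` helper are hypothetical, only plain User IDs are checked (not User ID and Client ID combinations), and a User ID that exceeds the limit on any single day is dropped for all of its sessions, which the article does not specify.

```python
from collections import Counter

INVALID_USER_IDS = {"undefined", "0", ""}
MAX_SESSIONS_PER_DAY = 100  # threshold named in the article


def effective_user_ids(sessions):
    """Return copies of the sessions with ignored User IDs replaced by None."""
    per_day = Counter((s["user_id"], s["date"]) for s in sessions if s["user_id"])
    noisy = {uid for (uid, _), n in per_day.items() if n > MAX_SESSIONS_PER_DAY}
    cleaned = []
    for s in sessions:
        uid = s["user_id"]
        if uid is None or uid in INVALID_USER_IDS or uid in noisy:
            uid = None  # the session stays in the model, but as non-logged-in
        cleaned.append({**s, "user_id": uid})
    return cleaned


sessions = (
    [{"user_id": "bot", "date": "2023-05-01"}] * 101
    + [{"user_id": "u1", "date": "2023-05-01"},
       {"user_id": "undefined", "date": "2023-05-01"},
       {"user_id": "0", "date": "2023-05-01"}]
)
cleaned = effective_user_ids(sessions)
print(len(cleaned))                                      # 104
print({s["user_id"] for s in cleaned} == {None, "u1"})   # True
```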

What sessions are NOT considered in the model calculation?

If you specify sources to exclude in the model settings, the value of an excluded source is transferred to the source that comes before it in the chain.

This works similarly to the Last Non-Direct Click attribution model, but with a twist: you can exclude not only direct traffic but any source you like.

The excluded sources will get value only in the cases when there were no prior sessions in the chain with other sources or the previous source also was excluded.
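
Here is a minimal sketch of one possible reading of this rule: an excluded source passes its value to the session immediately before it, unless there is no previous session or the previous source is excluded too. The `transfer_excluded_value` helper and the source names are hypothetical, and the actual OWOX BI behavior may differ in edge cases.

```python
def transfer_excluded_value(chain, excluded):
    """Move value from excluded sources to the preceding source (one reading).

    `chain` is a list of (source, value) pairs in chronological order.
    """
    values = [value for _, value in chain]
    for i, (source, value) in enumerate(chain):
        if source not in excluded or i == 0:
            continue  # nothing to transfer, or no prior session exists
        previous_source = chain[i - 1][0]
        if previous_source not in excluded:
            values[i - 1] += value
            values[i] = 0.0
    return [(source, v) for (source, _), v in zip(chain, values)]


excluded = {"direct"}
print(transfer_excluded_value([("google/cpc", 40.0), ("direct", 60.0)], excluded))
# [('google/cpc', 100.0), ('direct', 0.0)]
print(transfer_excluded_value([("direct", 100.0)], excluded))
# [('direct', 100.0)]
```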

Read more details and examples of how it works in this article.

What data is needed to calculate the probabilities of transitions between the conversion funnel steps?

The probability calculations are based on these user properties: user type (new or returning), device category, and region.

User type is defined based on the user’s transactions for the last 365 days. Users in the sessions before the first conversion within this period will get the user type 'New'. In the sessions after the first conversion, the user type will be 'Returning'.

Device category: all the sessions with the same user ID and device get the corresponding device category: 'mobile', 'tablet', or 'desktop'. If one user ID has several devices associated with it, the device category is set to 'cross'. This is also based on sessions for the last 365 days.

Region: all the user’s sessions get associated with the region that appears first in the sessions for the last 365 days. If the data source for the model calculation is Google Analytics, then the region is the same as tracked by GA. For the custom events and CRM data, the region is specified in the transaction_region field.
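
The three properties can be derived from a user's sessions roughly as follows. This is a sketch under assumptions: the input is a non-empty, time-ordered list of one user's sessions already limited to the last 365 days, the session in which the first conversion happens still counts as 'New', and the `user_properties` helper and its field names are hypothetical.

```python
def user_properties(sessions):
    """Derive device category, region and per-session user type (illustrative).

    `sessions` holds one user's sessions from the last 365 days, oldest first.
    Each session is a dict with 'device', 'region' and 'converted' (bool).
    """
    devices = {s["device"] for s in sessions}
    device_category = devices.pop() if len(devices) == 1 else "cross"
    region = sessions[0]["region"]  # the region that appears first

    user_types = []
    converted_before = False
    for s in sessions:
        user_types.append("Returning" if converted_before else "New")
        converted_before = converted_before or s["converted"]

    return {"device_category": device_category, "region": region,
            "user_types": user_types}


sessions = [
    {"device": "mobile", "region": "Home", "converted": False},
    {"device": "desktop", "region": "Work", "converted": True},
    {"device": "desktop", "region": "Home", "converted": False},
]
print(user_properties(sessions))
# {'device_category': 'cross', 'region': 'Home', 'user_types': ['New', 'New', 'Returning']}
```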

Stage 2. Calculating the probabilities of transitions between the funnel steps

To calculate the probabilities, OWOX BI Attribution turns all the conversion funnel steps specified in the model settings plus the site visit step into the states of a Markov chain.

After that, OWOX BI calculates the transition probabilities between these states:

[Image: image1.png]

The picture shows a simplified example for clarity. In real cases, there can be significantly more transitions, up to a complete graph.
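
To make the idea of transition probabilities concrete, here is a minimal sketch that estimates them by counting transitions in observed user paths. The step names (including the 'exit' state) and the `transition_probabilities` helper are assumptions made for the example; the real model also splits the data by user properties, which this sketch leaves out.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Estimate P(next state | current state) from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


paths = [
    ["visit", "product_view", "cart", "purchase"],
    ["visit", "product_view", "exit"],
    ["visit", "exit"],
    ["visit", "product_view", "cart", "exit"],
]
probabilities = transition_probabilities(paths)
print(probabilities["visit"])  # {'product_view': 0.75, 'exit': 0.25}
print(probabilities["cart"])   # {'purchase': 0.5, 'exit': 0.5}
```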

The calculation is based on a group of user properties including user type (new or returning), device category, and region.

The probabilities can be checked on four levels:

  1. All three properties: user type, device, region
  2. Two properties: user type, device
  3. One property: device
  4. Global probabilities

First, OWOX BI checks probabilities for all three properties. If the special formula (see next paragraph) considers the data on this level reliable, then this probability is used as the basis for the attribution model.

The probability is considered reliable if the reliability interval calculated with the formula

1.645*sqrt((a_actions-b_actions)/(a*b))

has a value less than 0.1.

If the interval is more than 0.1, OWOX BI checks the next probability level, up to global. The global probability (transition probability between all the funnel steps) is used only if there is no reliable data on any of the previous levels.
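
The fallback through the four levels can be sketched as below. The article does not define the symbols in the formula, so the code makes loud assumptions: a_actions is taken as the number of times the source step was observed, b_actions as the number of those followed by the target step, and a and b in the denominator are assumed to be the same two counts. The `pick_probability` helper and the sample numbers are hypothetical.

```python
import math

Z = 1.645        # constant from the article's formula (z-score of a 90% two-sided normal interval)
THRESHOLD = 0.1  # reliability threshold from the article


def reliability_interval(a_actions, b_actions):
    """The article's formula under the assumptions stated in the lead-in."""
    return Z * math.sqrt((a_actions - b_actions) / (a_actions * b_actions))


def pick_probability(levels):
    """Return (level name, probability) for the first reliable level.

    `levels` is a list of (name, a_actions, b_actions), ordered from the most
    specific level to the global one, which is used as the last resort.
    """
    for name, a_actions, b_actions in levels[:-1]:
        if b_actions > 0 and reliability_interval(a_actions, b_actions) < THRESHOLD:
            return name, b_actions / a_actions
    name, a_actions, b_actions = levels[-1]
    return name, b_actions / a_actions


levels = [
    ("user type + device + region", 40, 4),
    ("user type + device", 400, 40),
    ("device", 4000, 400),
    ("global", 40000, 4000),
]
print(pick_probability(levels)[0])  # device
```

With these made-up counts, the first two levels give intervals of roughly 0.78 and 0.25, so the sketch falls through to the device level, where the interval is about 0.078.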

Stage 3. Collecting conversions

Within one conversion funnel, all identical events get a single total value for all of them, regardless of how many of them are in the funnel.

For example, take two similar chains: one contains 30 product page views and the other only 5. Both chains get the same value for this type of event, so if the value were split among every occurrence, each individual view would get value/30 in the first chain and value/5 in the second.

This is the wrong approach.

To avoid such situations and calculate the value correctly, OWOX BI picks only one representative event.

OWOX BI connects each conversion and each event in the funnel to the ID of the session within which the first of the events happened. As a result, during the model calculation, OWOX BI will calculate the probability for each unique event in the funnel.
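
Picking one representative per event type can be sketched as keeping the earliest occurrence of each type together with its session ID. The event dictionaries and the `representative_events` helper are hypothetical, and timestamps are assumed to be comparable values such as ISO 8601 strings.

```python
def representative_events(events):
    """Keep the earliest event of each type within one funnel (illustrative).

    `events` is a list of dicts with 'type', 'session_id' and 'time'.
    """
    first_by_type = {}
    for event in sorted(events, key=lambda e: e["time"]):
        first_by_type.setdefault(event["type"], event)
    return list(first_by_type.values())


events = [
    {"type": "product_view", "session_id": "s2", "time": "2023-05-02T10:00:00"},
    {"type": "product_view", "session_id": "s1", "time": "2023-05-01T09:00:00"},
    {"type": "cart_view", "session_id": "s2", "time": "2023-05-02T10:05:00"},
    {"type": "product_view", "session_id": "s2", "time": "2023-05-02T10:01:00"},
]
for event in representative_events(events):
    print(event["type"], event["session_id"])
# product_view s1
# cart_view s2
```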

Stage 4. Calculating the events’ value

To calculate the value, the OWOX BI attribution model distributes the score between the events according to their probabilities. The score is also given to each of the user properties.

Example:

  • Event B happened after Event A.
  • The user, while triggering these two events, has the device category defined as 'cross', their region is 'Home'.
  • This combination of device category and region wasn’t associated with this user’s ID within the last 365 days. Hence, the user’s type is 'New'.

OWOX BI calculates the value using the probability of all these parameters:

S(b) = 1 - P(a, b, 'cross', 'Home', 'New')

The transaction value is distributed between events in the same way as the scores.
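
As a rough illustration, the sketch below computes a score of the form S = 1 - P for each event and then splits a transaction value in proportion to the scores. The proportional normalization, the `distribute_value` helper, and the sample probabilities are assumptions; the article does not spell out how the scores are turned into money.

```python
def distribute_value(transaction_value, transition_probabilities):
    """Split a transaction value in proportion to S = 1 - P (assumed scheme).

    `transition_probabilities` maps each event to the probability P of the
    transition into it, already looked up for the user's properties.
    """
    scores = {event: 1.0 - p for event, p in transition_probabilities.items()}
    total = sum(scores.values())
    return {event: transaction_value * score / total
            for event, score in scores.items()}


values = distribute_value(300.0, {"visit": 0.75, "product_view": 0.5, "cart_view": 0.25})
print(values)
# {'visit': 50.0, 'product_view': 100.0, 'cart_view': 150.0}
```

In this sketch, the less likely a transition is, the higher the score of the event it leads to, so harder funnel steps receive a larger share of the value.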

If several identical events are associated with the same transaction, the transaction value is distributed using the time-decay method: the closer an event is to the transaction, the more value it receives. To distribute value this way, it is enough to know the value of the earliest of these identical events and then attribute it to the rest of them.
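
The article does not state which decay function is used. The sketch below assumes an exponential decay with a hypothetical half-life of 24 hours, purely to show how events closer to the transaction end up with a larger share; the `time_decay_split` helper is not part of OWOX BI.

```python
def time_decay_split(value, hours_before_transaction, half_life_hours=24.0):
    """Split `value` between identical events with exponential time decay.

    The weight of an event halves for every `half_life_hours` that separate it
    from the transaction (an assumed decay function).
    """
    weights = [0.5 ** (hours / half_life_hours) for hours in hours_before_transaction]
    total = sum(weights)
    return [value * weight / total for weight in weights]


# Three identical events: 48, 24 and 0 hours before the transaction.
shares = time_decay_split(70.0, [48, 24, 0])
print([round(share, 2) for share in shares])  # [10.0, 20.0, 40.0]
```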
