Matching and Counting Methodologies
Matching overview
This is a high-level summary of our matching methodology. Use it to understand and optimize match rates through platform settings.
Minimum criteria for matching:
You must have at least one common key. To maximize match rates, keys must:
- Have the same column name following Normalization. This can be achieved by either:
  - Mapping to a Global Schema Key, which ensures the same formatting is applied to both datasets.
  - Coordinating between partners to create a Custom Key and ensuring the same formatting is applied to both, e.g., lowercasing, hashing.
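To illustrate why identical formatting matters for a Custom Key, here is a minimal Python sketch. The helper names (`normalize_email`, `hash_key`) and the choice of trimming, lowercasing, and SHA-256 are assumptions made for illustration only; they are not the platform's actual normalization rules.

```python
import hashlib


def normalize_email(value: str) -> str:
    """Trim whitespace and lowercase so both partners format the key identically."""
    return value.strip().lower()


def hash_key(value: str) -> str:
    """SHA-256 hex digest of the normalized value (the hash choice is an assumption)."""
    return hashlib.sha256(normalize_email(value).encode("utf-8")).hexdigest()


# Hypothetical raw values held by two partners.
partner_1 = [" A@infosum.com", "b@infosum.com"]
partner_2 = ["a@INFOSUM.com ", "c@infosum.com"]

# Without shared formatting, the same email does not match.
raw_overlap = set(partner_1) & set(partner_2)

# With the same formatting applied on both sides, it does.
hashed_overlap = {hash_key(v) for v in partner_1} & {hash_key(v) for v in partner_2}

print(len(raw_overlap))     # 0
print(len(hashed_overlap))  # 1
```

Whatever formatting the partners agree on, the point is that both sides apply exactly the same steps in the same order.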
How we match:
- By default, the platform uses the key that produces the highest match rate for the Intersection.
- Due to the decentralized nature of our platform, query processing happens on the edge of the Bunker. An anonymized, mathematical representation of the dataset is then matched against the other datasets involved in the query.
How we count:
- We match on keys, but count on rows. This means that:
  - A row with a multi-value key is counted only once, even if several of the key values in that row are found to match.
  - If a key is found in multiple rows, it is counted only once.
How multi-value keys affect your match rates/volumes
When datasets contain multi-value keys, the order of the datasets in a query can change the count of an Intersection.
To illustrate this, let’s imagine a simple example, where you are trying to understand the overlap between users from dataset A and dataset B:
- Dataset A contains multi-value keys; specifically, the emails “a@infosum.com”, “b@infosum.com”, and “c@infosum.com” are associated with the same user (= 1 row).
- Meanwhile, dataset B contains only single-value keys; the emails “a@infosum.com”, “b@infosum.com”, and “c@infosum.com” are each associated with a separate unique user (= 3 rows).
When we perform the intersection query “A intersect B”, we assess the count of A users that are also present in B. Although 3 different email values in dataset A result in a match, all 3 are associated with the same unique user (= 1 row) in dataset A, so the query returns a count of 1.
On the other hand, when we perform the intersection query “B intersect A”, we assess the count of B users that are also present in A. 3 different email values in dataset B result in a match, and each is associated with a different unique user (= 3 rows) in dataset B, so the query returns a count of 3.
A intersect B:

| Dataset A | Dataset B |
| --- | --- |
| a@infosum.com / b@infosum.com / c@infosum.com | a@infosum.com |
| | b@infosum.com |
| | c@infosum.com |
| Total = 1 | |

B intersect A:

| Dataset B | Dataset A |
| --- | --- |
| a@infosum.com | a@infosum.com / b@infosum.com / c@infosum.com |
| b@infosum.com | |
| c@infosum.com | |
| Total = 3 | |
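The A and B example can be reproduced with a short Python sketch. It models each row as a set of email values and counts rows in the first-named dataset that share at least one value with the other. The function name `intersect_count` and the in-memory representation are assumptions for illustration; the platform itself matches anonymized representations rather than raw values.

```python
def intersect_count(destination, source):
    """Count destination rows that share at least one key value with the source."""
    source_keys = set()
    for row in source:
        source_keys.update(row)
    # Each destination row is counted once, however many of its values match.
    return sum(1 for row in destination if source_keys & set(row))


# Dataset A: one user (row) holding three email values.
dataset_a = [{"a@infosum.com", "b@infosum.com", "c@infosum.com"}]

# Dataset B: three users (rows), one email value each.
dataset_b = [{"a@infosum.com"}, {"b@infosum.com"}, {"c@infosum.com"}]

print(intersect_count(dataset_a, dataset_b))  # "A intersect B" -> 1
print(intersect_count(dataset_b, dataset_a))  # "B intersect A" -> 3
```

The same three emails match in both directions; only the number of rows they belong to differs, which is why the counts differ.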
Advanced settings
Key override
Using Key Override, you can change the Platform’s automatic choice of best match key.
You might have a pair of datasets with two or more different keys. By default, the platform uses the key that produces the highest match rate for the Intersection. Using Key Override, you can tell the Platform to use a different key for the count.
If you are using the Audience Builder, you can set a Key Override using the Key Override row at the bottom of the screen.
If you are using the Query Tool in the Platform, you can set a Key Override in the Advanced Options. When you use the Advanced Options in the Query Tool, the query has a source dataset (the one on the right of ‘INTERSECT’) and a destination dataset (the one on the left of ‘INTERSECT’), which is where the count takes place. When you use Key Override, make sure you enter the source dataset in the Source field and the destination dataset in the Destination field. If in doubt, run the query both ways to confirm the result is the one you’re after.
Also note that, when performing queries between Activation and Insight datasets, the Key Override source dataset (right of ‘INTERSECT’) must be the Insight dataset and the destination dataset (left of ‘INTERSECT’) must be the Activation dataset for any given key.
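The difference between the default key choice and a Key Override can be sketched in Python as follows. This is an illustrative model only: the function names, the `key_override` parameter, and the rule "most matched destination rows wins" are assumptions standing in for the platform's internal logic, which is not described here.

```python
def matched_rows(destination, source, key):
    """Return destination rows whose value for `key` also appears in the source."""
    source_values = {row[key] for row in source if row.get(key)}
    return [row for row in destination if row.get(key) in source_values]


def best_key(destination, source, keys):
    """Pick the key with the most matched destination rows.

    With a fixed destination size this is the same as the highest match rate
    (assumed default rule).
    """
    return max(keys, key=lambda k: len(matched_rows(destination, source, k)))


def intersect_count(destination, source, keys, key_override=None):
    """Count matched destination rows on the default key or on an override."""
    key = key_override or best_key(destination, source, keys)
    return key, len(matched_rows(destination, source, key))


# Hypothetical destination (left of INTERSECT) and source (right of INTERSECT).
destination = [
    {"email": "a@infosum.com", "phone": "111"},
    {"email": "b@infosum.com", "phone": "222"},
    {"email": "c@infosum.com", "phone": None},
]
source = [
    {"email": "a@infosum.com", "phone": "999"},
    {"email": "b@infosum.com", "phone": "222"},
    {"email": "x@infosum.com", "phone": "333"},
]

keys = ["email", "phone"]
print(intersect_count(destination, source, keys))                        # ('email', 2)
print(intersect_count(destination, source, keys, key_override="phone"))  # ('phone', 1)
```

In this toy data, email is the default because it matches more rows; overriding to phone yields a smaller count on the same pair of datasets.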
Count starting with best key
Match across multiple keys in quality order.
If you want to count on multiple keys to potentially increase your reach, get in touch with your CS representative to enable a feature called “count on best key”. It allows the platform to count on the best keys in cascading order: it starts with the strongest key (e.g., email), then moves on to the next best key (e.g., phone number), then the next (e.g., UDPRN). This way, you’ll receive a cumulative match across multiple keys.
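The cascading behaviour can be pictured with the following Python sketch. It assumes that a row matched on a stronger key is not counted again on a weaker one, so the per-key counts add up to the cumulative match. The function name `cascading_count` and the data layout are hypothetical and do not represent the platform's implementation.

```python
def cascading_count(destination, source, keys_in_order):
    """Count each destination row on the first key in the cascade that matches."""
    matched_by_key = {key: 0 for key in keys_in_order}
    remaining = list(destination)
    for key in keys_in_order:
        source_values = {row[key] for row in source if row.get(key)}
        still_unmatched = []
        for row in remaining:
            if row.get(key) in source_values:
                matched_by_key[key] += 1      # counted once, on this key
            else:
                still_unmatched.append(row)   # try the next key in the cascade
        remaining = still_unmatched
    return matched_by_key


# Hypothetical datasets with three keys of decreasing strength.
destination = [
    {"email": "a@infosum.com", "phone": "111", "udprn": "1001"},
    {"email": "b@infosum.com", "phone": "222", "udprn": "1002"},
    {"email": None, "phone": "333", "udprn": "1003"},
    {"email": None, "phone": None, "udprn": "1004"},
]
source = [
    {"email": "a@infosum.com", "phone": "222", "udprn": "9999"},
    {"email": None, "phone": "333", "udprn": "1004"},
]

result = cascading_count(destination, source, ["email", "phone", "udprn"])
print(result)                # {'email': 1, 'phone': 2, 'udprn': 1}
print(sum(result.values()))  # 4
```

Matching on email alone would return 1 row in this toy data; cascading through phone and UDPRN brings the cumulative count to 4.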
Your CS representative will share a legal boilerplate for you to approve so we can enable the feature.
Match on one key, count on another
Understand how many output IDs are available for your matched keys.
If you want to see how many output IDs are available for your matched keys, please get in touch with your CS representative to enable a feature that lets you match on one key (e.g., email) and count on another (e.g., device ID). This previews how many IDs could be pushed out of the platform from the Intersection.
This is particularly useful when the match key and the activation key are not the same. For example, your match key might be email, but your activation key is an internal ID. This will give you a view of the size of your audience for activation.
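As a rough illustration of matching on one key and counting on another, the Python sketch below matches rows on email and then counts the distinct device IDs attached to the matched rows. The function name `count_output_ids`, the `device_ids` field, and the choice to count distinct IDs are all assumptions; the platform may define the count differently.

```python
def count_output_ids(destination, source, match_key, count_key):
    """Match destination rows on `match_key`, then count distinct `count_key` values."""
    source_values = {row[match_key] for row in source if row.get(match_key)}
    matched = 0
    output_ids = set()
    for row in destination:
        if row.get(match_key) in source_values:
            matched += 1
            output_ids.update(row.get(count_key, []))
    return matched, len(output_ids)


# Hypothetical destination rows: one email, zero or more device IDs each.
destination = [
    {"email": "a@infosum.com", "device_ids": ["d1", "d2", "d3"]},
    {"email": "b@infosum.com", "device_ids": []},
    {"email": "c@infosum.com", "device_ids": ["d4"]},
]
source = [{"email": "a@infosum.com"}, {"email": "b@infosum.com"}]

print(count_output_ids(destination, source, "email", "device_ids"))  # (2, 3)
```

Here 2 rows match on email, but 3 device IDs are available for activation, which shows why the matched count and the output ID count can differ.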
Your CS representative will share a legal boilerplate for you to approve so we can enable the feature.