Matching and Counting Methodologies

Table of contents

Matching overview

How multi-value keys affect your match rates/volumes

Advanced settings

Key override

Count starting with best key

Match on one key, count on another

Matching overview

This is a high-level summary of our matching methodology. Use it to understand your match rates and to optimize them with platform settings.

Minimum criteria for matching:

Your Datasets must share at least one common key. To ensure a match and maximize match rates, the key must:

Have the same column name after Normalization:
- You can achieve this by mapping to a Global Schema Key, which ensures the same formatting is applied in both Datasets.
- Alternatively, partners can coordinate to create a Custom Key; in that case, make sure both sides apply the same formatting (e.g., lowercasing, hashing).
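For a Custom Key, what matters is that both partners apply identical steps before upload. The following is a minimal sketch, assuming the partners agreed on trimming, lowercasing and SHA-256 hashing for an email column. The function names are hypothetical, and this is not the platform's own Global Schema normalization.

```python
import hashlib


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase an email address.

    Assumption: these are the formatting steps both partners agreed on.
    """
    return raw.strip().lower()


def hash_key(value: str) -> str:
    """Return the SHA-256 hex digest of an already-normalized key value.

    Assumption: both partners agreed on SHA-256 for this Custom Key.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Both partners apply the same steps, so differently formatted inputs converge.
partner_a = hash_key(normalize_email("  Alice@Example.com "))
partner_b = hash_key(normalize_email("alice@example.com"))
print(partner_a == partner_b)  # True

# Hashing without normalizing first produces different values, so no match.
print(hash_key("Alice@Example.com") == hash_key("alice@example.com"))  # False
```

The order matters: normalize first, then hash. Once a value is hashed, formatting differences can no longer be corrected.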

How we match:

By default, the platform matches on the key that produces the highest match rate for the Intersection.
Because the platform is decentralized, query processing happens at the edge of each Bunker (called a Dataset in the platform). An anonymized, mathematical representation of the Dataset is then matched against the other Datasets involved in the query.
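The default choice of key can be illustrated on plain values. This is a minimal sketch that ignores the anonymized representation described above. It assumes that "match rate" means matched destination rows divided by all destination rows, and it uses a hypothetical row shape in which each key name maps to a single value.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical row shape: key name -> normalized value, or None if missing.
Row = Dict[str, Optional[str]]


def match_rate(destination: List[Row], source: List[Row], key: str) -> float:
    """Share of destination rows whose value for `key` also appears in source.

    Assumption: match rate = matched destination rows / all destination rows.
    """
    if not destination:
        return 0.0
    source_values = {row[key] for row in source if row.get(key) is not None}
    matched = sum(1 for row in destination if row.get(key) in source_values)
    return matched / len(destination)


def best_key(destination: List[Row], source: List[Row], common_keys: Sequence[str]) -> str:
    """Pick the common key with the highest match rate (the first key wins ties)."""
    return max(common_keys, key=lambda k: match_rate(destination, source, k))


destination = [
    {"email": "a@example.com", "phone": "111"},
    {"email": "b@example.com", "phone": "222"},
    {"email": "c@example.com", "phone": None},
]
source = [
    {"email": "a@example.com", "phone": "111"},
    {"email": "z@example.com", "phone": "222"},
    {"email": "y@example.com", "phone": "333"},
]

print(best_key(destination, source, ["email", "phone"]))  # phone
```

Here email matches 1 of 3 destination rows and phone matches 2 of 3, so phone is chosen.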

How we count:

We match on keys, but count on rows. This means that:
- a row whose multi-value key matches is counted only once, even if several of its key values match, and
- a key value found in multiple rows is counted only once.
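To make the first rule concrete, here is a minimal sketch. It models a Dataset as a list of rows, where each row is the set of email values held for one user. This is a hypothetical, simplified representation that skips anonymization. The sketch compares the number of matching key values with the number of matching rows. The second rule (a key value appearing in several rows) is not modeled here.

```python
from typing import List, Set


def count_matched_values(destination: List[Set[str]], source_values: Set[str]) -> int:
    """Number of individual key values that match (not what gets reported)."""
    return sum(len(row & source_values) for row in destination)


def count_matched_rows(destination: List[Set[str]], source_values: Set[str]) -> int:
    """Number of destination rows with at least one matching key value."""
    return sum(1 for row in destination if row & source_values)


# One user (one row) holding three email values, all of which match,
# plus a second user whose only email does not match.
destination = [{"a@infosum.com", "b@infosum.com", "c@infosum.com"}, {"d@infosum.com"}]
source_values = {"a@infosum.com", "b@infosum.com", "c@infosum.com"}

print(count_matched_values(destination, source_values))  # 3
print(count_matched_rows(destination, source_values))    # 1
```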

How multi-value keys affect your match rates/volumes

When Datasets contain multi-value keys, the order of the Datasets in a query can affect the count of an Intersection.

To illustrate this, imagine a simple example in which you want to understand the overlap between users in Dataset A and Dataset B:

Dataset A contains multi-value keys: the emails “a@infosum.com”, “b@infosum.com”, and “c@infosum.com” all belong to the same user (1 row).
Dataset B contains only single-value keys: the same three emails belong to 3 separate users (3 rows).

When we run the intersection query “A intersect B”, we count the users in A who are also present in B. Although 3 different email values in Dataset A match, all 3 belong to the same user (1 row) in Dataset A, so the query returns a count of 1.

Conversely, when we run “B intersect A”, we count the users in B who are also present in A. The same 3 email values match, but in Dataset B they belong to 3 different users (3 rows), so the query returns a count of 3.

A intersect B (rows counted in Dataset A):
Dataset A	Dataset B
a@infosum.com / b@infosum.com / c@infosum.com	a@infosum.com
	b@infosum.com
	c@infosum.com
Total = 1

B intersect A (rows counted in Dataset B):
Dataset B	Dataset A
a@infosum.com	a@infosum.com / b@infosum.com / c@infosum.com
b@infosum.com
c@infosum.com
Total = 3
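The two tables can be reproduced in a few lines. This is a minimal sketch, assuming each row is a set of plain email values and that the Dataset on the left of the intersection is the one whose rows are counted. `intersect_count` is a hypothetical helper, not a platform API.

```python
from typing import List, Set

Dataset = List[Set[str]]  # each row holds the email values for one user


def intersect_count(left: Dataset, right: Dataset) -> int:
    """Count rows of `left` (where counting happens) with any value found in `right`."""
    right_values = set().union(*right)
    return sum(1 for row in left if row & right_values)


dataset_a = [{"a@infosum.com", "b@infosum.com", "c@infosum.com"}]  # 1 user, 3 emails
dataset_b = [{"a@infosum.com"}, {"b@infosum.com"}, {"c@infosum.com"}]  # 3 users

print(intersect_count(dataset_a, dataset_b))  # 1  (A intersect B)
print(intersect_count(dataset_b, dataset_a))  # 3  (B intersect A)
```

The matched email values are identical in both directions. Only the Dataset whose rows are counted changes, and that alone moves the result from 1 to 3.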

Advanced settings

Key override

Using Key Override, you can change the Platform’s automatic choice of best match key.

You might have a pair of Datasets that share two or more keys. By default, the platform uses the key that produces the highest match rate for the Intersection. With Key Override, you can tell the platform to use a different key for the count.

If you are using the Audience Builder, you can set a Key Override using the Key Override row at the bottom of the screen.

If you are using the Query Tool in the Platform, you can set a Key Override in the Advanced Options. Note that when you use the Advanced Options in the Query Tool, the query has a source Dataset (to the right of ‘INTERSECT’) and a destination Dataset (to the left of ‘INTERSECT’), where the count takes place. When you use Key Override, make sure you enter the source Dataset in the Source field and the destination Dataset in the Destination field. You can always run the query both ways to check that the result is the one you expect.

Also note that for activation queries, the Key Override source Dataset (right of ‘INTERSECT’) must be the consulted Dataset, and the destination Dataset (left of ‘INTERSECT’) must be the Dataset from which data will be exported for any given key.
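The difference between the automatic key choice and a Key Override can be sketched as follows. This assumes single-value keys stored as plain values. `intersect` and its `key_override` parameter are hypothetical names, not Query Tool fields. The destination (left of ‘INTERSECT’) is the Dataset whose rows are counted. Because the destination is fixed within one query, choosing the key with the most matched rows is equivalent to choosing the highest match rate.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical row shape: key name -> normalized value, or None if missing.
Row = Dict[str, Optional[str]]


def matched_rows(destination: List[Row], source: List[Row], key: str) -> int:
    """Count destination rows whose `key` value also appears in source."""
    values = {row[key] for row in source if row.get(key) is not None}
    return sum(1 for row in destination if row.get(key) in values)


def intersect(
    destination: List[Row],
    source: List[Row],
    common_keys: Sequence[str],
    key_override: Optional[str] = None,
) -> Tuple[str, int]:
    """Hypothetical `destination INTERSECT source` count; returns (key used, count).

    Without an override, the key with the most matched destination rows is
    used (an assumed stand-in for the platform's automatic best-key choice).
    """
    if key_override is not None:
        if key_override not in common_keys:
            raise ValueError(f"{key_override!r} is not a common key")
        key = key_override
    else:
        key = max(common_keys, key=lambda k: matched_rows(destination, source, k))
    return key, matched_rows(destination, source, key)


destination = [
    {"email": "a@example.com", "phone": "111"},
    {"email": "b@example.com", "phone": "222"},
]
source = [
    {"email": "a@example.com", "phone": "111"},
    {"email": "q@example.com", "phone": "222"},
]

print(intersect(destination, source, ["email", "phone"]))  # ('phone', 2)
print(intersect(destination, source, ["email", "phone"], key_override="email"))  # ('email', 1)
```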

Count starting with best key

Match across multiple keys in quality order.

If you want to count on multiple keys to potentially increase your reach, contact your CS representative to enable a feature called “count on best key”. It lets the platform count on the best keys in cascading order: it starts with the strongest key (e.g., email), then moves on to the next best key (e.g., phone number), then the next (e.g., UDPRN). This way, you receive a cumulative match across multiple keys.
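The cascade can be pictured as a loop over keys in quality order. This is a minimal sketch, assuming that a row already matched on a stronger key is not matched again on a weaker one, so each destination row is counted at most once. `count_on_best_key` and the row shape are hypothetical, and the key order is supplied by the caller.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical row shape: key name -> normalized value, or None if missing.
Row = Dict[str, Optional[str]]


def count_on_best_key(
    destination: List[Row], source: List[Row], keys_in_order: Sequence[str]
) -> List[int]:
    """Cumulative matched-row count after each key in the cascade is applied.

    Assumption: each destination row is counted at most once, on the first
    (strongest) key that matches it.
    """
    matched: Set[int] = set()  # indices of destination rows already matched
    cumulative: List[int] = []
    for key in keys_in_order:
        values = {row[key] for row in source if row.get(key) is not None}
        for i, row in enumerate(destination):
            if i not in matched and row.get(key) in values:
                matched.add(i)
        cumulative.append(len(matched))
    return cumulative


destination = [
    {"email": "a@example.com", "phone": "111", "udprn": "1"},
    {"email": None, "phone": "222", "udprn": "2"},
    {"email": None, "phone": None, "udprn": "3"},
    {"email": "d@example.com", "phone": "444", "udprn": "4"},
]
source = [
    {"email": "a@example.com", "phone": "111", "udprn": "1"},
    {"email": "z@example.com", "phone": "222", "udprn": None},
    {"email": None, "phone": None, "udprn": "3"},
]

print(count_on_best_key(destination, source, ["email", "phone", "udprn"]))  # [1, 2, 3]
```

Email alone matches 1 row. Phone adds a row that has no email, and UDPRN adds one more, giving a cumulative count of 3.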

Your CS representative will share a legal boilerplate for you to approve so we can enable the feature.

Match on one key, count on another

Understand how many output IDs are available for your matched keys.

If you want to see how many output IDs are available for your matched keys, contact your CS representative to enable a feature that lets you match on one key (e.g., email) and count on another (e.g., device ID). This previews how many IDs could be pushed out of the platform from the Intersection.

This is particularly useful when the match key and the activation key are not the same. For example, your match key might be email, but your activation key is an internal ID. This will give you a view of the size of your audience for activation.
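Matching on one key and counting on another can be sketched as two steps: find the matched rows on the match key, then count the distinct activation IDs those rows hold. This is a minimal sketch, assuming plain values and that rows without an activation ID contribute nothing. `match_then_count` and the `device_id` column are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row shape: key name -> normalized value, or None if missing.
Row = Dict[str, Optional[str]]


def match_then_count(
    destination: List[Row], source: List[Row], match_key: str, count_key: str
) -> Tuple[int, int]:
    """Return (rows matched on match_key, distinct count_key values in those rows)."""
    source_values = {row[match_key] for row in source if row.get(match_key) is not None}
    matched = [row for row in destination if row.get(match_key) in source_values]
    # Assumption: only non-missing activation IDs are counted, each one once.
    output_ids = {row[count_key] for row in matched if row.get(count_key) is not None}
    return len(matched), len(output_ids)


# Destination: the Dataset that holds the activation key (here, device ID).
destination = [
    {"email": "a@example.com", "device_id": "dev-1"},
    {"email": "b@example.com", "device_id": None},
    {"email": "c@example.com", "device_id": "dev-3"},
    {"email": "e@example.com", "device_id": "dev-5"},
]
source = [{"email": "a@example.com"}, {"email": "b@example.com"}, {"email": "c@example.com"}]

print(match_then_count(destination, source, "email", "device_id"))  # (3, 2)
```

Three rows match on email, but only two of them carry a device ID. That gap between matched rows and usable output IDs is what the feature makes visible.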

Your CS representative will share a legal boilerplate for you to approve so we can enable the feature.
