Supports multiple values within a key
This feature lets clients upload a single row that holds multiple values for a single key.
How does this work?
Let’s take the scenario of a customer importing identity data. In some circumstances, a user can be identified by multiple email addresses or multiple cookies.
This is how a customer imports data without the multi-value key feature:
Internal ID | First Name | Email | Cookie | Mobile Advertising ID
1 | John | #email1 | cookie1 | maid1
1 | John | #email2 | cookie2 | maid1
2 | Dave | #email3 | cookie3 | maid2
2 | Dave | #email4 | cookie4 | maid2
3 | Jamie | #email5 | cookie5 | maid3
3 | Jamie | #email6 | cookie6 | maid3
4 | Katy | #email7 | cookie7 | maid4
Here is how the data is transformed by the Multi-value key feature:
Internal ID | First Name | Email | Cookie | Mobile Advertising ID
1 | John | #email1,#email2 | cookie1,cookie2 | maid1
2 | Dave | #email3,#email4 | cookie3,cookie4 | maid2
3 | Jamie | #email5,#email6 | cookie5,cookie6 | maid3
4 | Katy | #email7 | cookie7 | maid4
With this feature, the Platform can accept a file with multiple identifiers per individual in a single row. In the above example, the customer can import a file with 4 rows instead of 7 rows.
The primary benefit is that you need fewer rows in your file when any of your keys has multiple values. Multi-value keys produce the same aggregation output as single-value keys. Every value in a multi-value key is treated independently when used in matching.
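The transformation between the two tables above can be sketched in a few lines of Python. This is only an illustration of the idea, not InfoSum's implementation: the function name `merge_rows` and the choice of Email and Cookie as the multi-value columns are assumptions made for this example.

```python
# Rows as imported without the multi-value key feature (from the first table).
COLUMNS = ["Internal ID", "First Name", "Email", "Cookie", "Mobile Advertising ID"]
RAW_ROWS = [
    ("1", "John", "#email1", "cookie1", "maid1"),
    ("1", "John", "#email2", "cookie2", "maid1"),
    ("2", "Dave", "#email3", "cookie3", "maid2"),
    ("2", "Dave", "#email4", "cookie4", "maid2"),
    ("3", "Jamie", "#email5", "cookie5", "maid3"),
    ("3", "Jamie", "#email6", "cookie6", "maid3"),
    ("4", "Katy", "#email7", "cookie7", "maid4"),
]
# Assumption for this sketch: these are the columns that hold several values.
MULTI_VALUE_COLUMNS = {"Email", "Cookie"}


def merge_rows(raw_rows, columns, id_column, multi_value_columns):
    """Collapse rows that share id_column into one row with list-valued keys."""
    merged = {}  # dicts keep insertion order, so output order follows the input
    for raw in raw_rows:
        row = dict(zip(columns, raw))
        # Single-value columns take the value from the first row seen for this ID.
        target = merged.setdefault(
            row[id_column],
            {c: ([] if c in multi_value_columns else row[c]) for c in columns},
        )
        for column in multi_value_columns:
            if row[column] not in target[column]:  # skip duplicate values
                target[column].append(row[column])
    return list(merged.values())


merged_rows = merge_rows(RAW_ROWS, COLUMNS, "Internal ID", MULTI_VALUE_COLUMNS)
print(len(RAW_ROWS), "rows merged into", len(merged_rows))
```

Each merged row keeps its single-value columns unchanged and collects the distinct Email and Cookie values into lists, which mirrors the 7-row to 4-row reduction described above.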
Implementation
Upload:
In the first version, InfoSum only accepts multi-value data that is already in array/list form, for CSV files that are uploaded or transferred via SFTP/S3/GCP. Each key can hold a maximum of 25 identifiers, and each identifier is limited to 128 characters.
That means you need to merge your rows before uploading to the InfoSum Platform. Later versions may support merging the rows inside the Platform.
Merging is achieved using two delimiters in the file: one to split the columns and the other to split the entries in a list within a column.
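As a rough sketch of preparing such a file, the Python below writes rows with two delimiters and checks the limits stated above (25 identifiers per key, 128 characters per identifier). The specific delimiter characters (`,` for columns and `|` for list entries) and the helper names `join_values` and `to_csv` are assumptions for illustration; use whichever delimiters the Platform offers in its preview settings.

```python
import csv
import io

MAX_VALUES_PER_KEY = 25     # limit stated in this article
MAX_CHARS_PER_VALUE = 128   # limit stated in this article
COLUMN_DELIMITER = ","      # assumption: splits the columns
LIST_DELIMITER = "|"        # assumption: splits entries within a column


def join_values(values):
    """Join one key's identifiers into a single cell, enforcing the limits."""
    if len(values) > MAX_VALUES_PER_KEY:
        raise ValueError(f"{len(values)} identifiers exceeds {MAX_VALUES_PER_KEY}")
    for value in values:
        if len(value) > MAX_CHARS_PER_VALUE:
            raise ValueError(f"identifier longer than {MAX_CHARS_PER_VALUE} characters")
        if LIST_DELIMITER in value:
            raise ValueError("identifier contains the list delimiter")
    return LIST_DELIMITER.join(values)


def to_csv(records, fieldnames):
    """Serialise records; list-valued fields become delimiter-joined cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=COLUMN_DELIMITER, lineterminator="\n")
    writer.writerow(fieldnames)
    for record in records:
        writer.writerow(
            join_values(record[name]) if isinstance(record[name], list) else record[name]
            for name in fieldnames
        )
    return buffer.getvalue()


records = [
    {"Internal ID": "1", "Email": ["#email1", "#email2"]},
    {"Internal ID": "4", "Email": ["#email7"]},
]
csv_text = to_csv(records, ["Internal ID", "Email"])
print(csv_text)
```

Rejecting identifiers that contain the list delimiter keeps the file unambiguous when the column is later split back into individual values.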
In the example below, we are uploading a file containing multi-value key columns (Email & Vehicle Registration Number). The data is already in the form of an array/list type.
In preview settings, the Platform shows the delimiter used in the multi-value column. If it’s not the correct delimiter, select the right one from the dropdown list.
Click on the toggle next to the column header to mark the column as multi-value.
Repeat the same process for all your multi-value columns. You can also perform some other optional minor manipulations to the source data here; please see this article for information.
Click “Accept Preview Config” when you are ready to normalise your data before publishing it.
Normalisation:
Normalising your data works the same way for multi-value columns as for single-value columns.
Matching:
Matching happens in the same way as before. Every value in a multi-value key is treated independently when used in matching. Each identifier in a multi-value column is compared against either the value in a single-value column or each identifier in another multi-value column.
The Platform shows the matched audience total at each individual (row) level rather than at the identifier level.
Let’s take some scenarios.
Scenario 1:
We are matching two datasets that both contain a multi-value column (Email), and we are using Email as the key to match them.
Dataset A
Internal ID | First Name | Email
1 | John | #email1,#email2
2 | Dave | #email3,#email4
3 | Jamie | #email5,#email6
4 | Katy | #email7
Dataset B
Internal ID | First Name | Email
1 | John | #email1,#email2
2 | Dave | #email3,#email0
3 | Jamie | #email5,#email4
4 | Lauren | #email8
When the Platform matches Dataset A and Dataset B, it reports the matched audience total as 3 (the total number of rows matched), not the number of identifiers matched. The Platform reports the combined audience total at the individual level rather than the identifier level because an individual can be represented by multiple identifiers.
For example, John is represented by two emails (#email1,#email2), but the Platform reports this as one match because those two emails belong to one individual.
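The row-level counting in Scenario 1 can be reproduced with a small Python sketch. It only illustrates the counting rule described here and is not how the Platform is implemented; the helper name `matched_rows` is hypothetical.

```python
# Scenario 1: Internal ID -> set of Email values, for both datasets.
dataset_a = {
    "1": {"#email1", "#email2"},
    "2": {"#email3", "#email4"},
    "3": {"#email5", "#email6"},
    "4": {"#email7"},
}
dataset_b = {
    "1": {"#email1", "#email2"},
    "2": {"#email3", "#email0"},
    "3": {"#email5", "#email4"},
    "4": {"#email8"},
}


def matched_rows(left, right):
    """Return the IDs of rows in `left` sharing at least one value with `right`."""
    right_values = set().union(*right.values())
    return [row_id for row_id, values in left.items() if values & right_values]


# Every value is matched independently, but a row is counted only once.
rows_matched = matched_rows(dataset_a, dataset_b)
identifiers_matched = set().union(*dataset_a.values()) & set().union(*dataset_b.values())

print("rows matched:", len(rows_matched))
print("identifiers matched:", len(identifiers_matched))
```

Five identifiers overlap between the two datasets, yet only three rows of Dataset A contain one of them, which is why the reported total is 3.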
Scenario 2:
We are matching two datasets, and only one of them has a multi-value column. Both datasets have Email, but Dataset A holds it in a multi-value column and Dataset B holds it in a single-value column.
Dataset A
Internal ID | First Name | Email
1 | John | #email1,#email2
2 | Dave | #email3,#email4
3 | Jamie | #email5,#email6
4 | Katy | #email7
Dataset B
Internal ID | First Name | Email
1 | John | #email1
1 | John | #email2
2 | Dave | #email3
2 | Dave | #email0
3 | Jamie | #email5
3 | Jamie | #email4
4 | Lauren | #email8
When the Platform matches Dataset A and Dataset B, it still reports the matched audience total as 3, because the join uses the Email key while the count uses the Internal ID key. The Platform reports the combined audience total at the individual level rather than the identifier level because an individual can be represented by multiple identifiers.
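A minimal Python sketch of Scenario 2 follows, assuming the join and the count behave as described here: rows are joined on Email, then distinct Internal IDs are counted. The variable names are illustrative only, and this is not the Platform's actual implementation.

```python
# Dataset A: one row per individual, Email is a multi-value column.
dataset_a = [
    ("1", ["#email1", "#email2"]),
    ("2", ["#email3", "#email4"]),
    ("3", ["#email5", "#email6"]),
    ("4", ["#email7"]),
]
# Dataset B: Email is a single-value column, so individuals span several rows.
dataset_b = [
    ("1", "#email1"),
    ("1", "#email2"),
    ("2", "#email3"),
    ("2", "#email0"),
    ("3", "#email5"),
    ("3", "#email4"),
    ("4", "#email8"),
]

# Index every individual email in A back to the Internal ID of its row.
email_to_a_id = {email: a_id for a_id, emails in dataset_a for email in emails}

# Join on Email: one (A id, B id, email) tuple per matching B row.
joined = [
    (email_to_a_id[email], b_id, email)
    for b_id, email in dataset_b
    if email in email_to_a_id
]

# Count on Internal ID: each individual is counted once, however many emails hit.
matched_individuals = {a_id for a_id, _, _ in joined}

print("joined rows:", len(joined))
print("matched audience total:", len(matched_individuals))
```

The join produces five matching rows, but they collapse to three distinct Internal IDs, which is the total the Platform reports.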
Activation:
Currently, the InfoSum Platform does not support a multi-value key as an output column. You can match using multi-value columns, but you can choose only single-value columns as output. In the above example, a match can happen using Email (a multi-value column), but you can only activate a single-value column such as Internal ID or any other single-value key.