How to normalize data
Once you’ve created your recordset, the next step is to prepare your data for publishing. We call this process data normalization. During normalization, your original imported data is mapped onto our Global Schema, which ensures that all parties using the InfoSum platform have their data in the same format.
Table of contents
What happens during normalization?
Normalizing an Insights or Activation Bunker
What happens during normalization?
Normalization plays an important role in ensuring the security of your data. During normalization, direct identifiers (such as emails) are irreversibly converted to pseudonymized, salted keys. Before the data is encrypted, identifiers also go through a standardization process: leading and trailing spaces are removed and all text is lowercased. This ensures that two copies of the same ID can still be matched after encryption.
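The standardization step described above can be illustrated with a short Python sketch. This is not the platform's implementation: the hashing algorithm (SHA-256 here), the salt value, and the function names `standardize` and `pseudonymize` are assumptions chosen only to show why trimming and lowercasing make identical IDs match after conversion.

```python
import hashlib


def standardize(identifier: str) -> str:
    """Remove leading/trailing whitespace and lowercase the text."""
    return identifier.strip().lower()


def pseudonymize(identifier: str, salt: bytes) -> str:
    """Illustrative only: salted SHA-256 of the standardized identifier.

    The real algorithm and salt handling are not described in this article.
    """
    cleaned = standardize(identifier)
    return hashlib.sha256(salt + cleaned.encode("utf-8")).hexdigest()


salt = b"example-salt"  # hypothetical value, for demonstration only
a = pseudonymize("  Jane.Doe@Example.com ", salt)
b = pseudonymize("jane.doe@example.com", salt)
print(a == b)  # True: both inputs standardize to the same string
```

Without the standardization step, the two inputs above would produce different keys and could not be matched.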
For details of how imported data is mapped to the Global Schema on the InfoSum platform, including any formatting your raw data requires, see data formatting for normalization.
Using Custom Keys
When normalizing your data, the names of the keys used for joining to your collaboration partners' datasets need to be consistent across the different datasets. The purpose of the Global Schema is to standardize these names and make joining simpler. However, if you're uploading a custom key, you will need to ensure its name is exactly the same across all datasets.
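As a rough pre-upload check, you could compare custom key names across the datasets you plan to join. The sketch below is a hypothetical helper, not a platform feature; the dataset and key names are invented. It compares names exactly, so a difference in case or spacing counts as a mismatch.

```python
def mismatched_key_names(datasets: dict[str, set[str]]) -> set[str]:
    """Return key names that do not appear in every dataset (exact match)."""
    if not datasets:
        return set()
    all_names = set().union(*datasets.values())
    shared = set.intersection(*datasets.values())
    return all_names - shared


# Hypothetical datasets and key names, for illustration only.
datasets = {
    "brand": {"email", "loyalty_id"},
    "partner": {"email", "Loyalty_ID"},
}
print(sorted(mismatched_key_names(datasets)))  # ['Loyalty_ID', 'loyalty_id']
```

Here `loyalty_id` and `Loyalty_ID` are flagged because the names differ in case, so they would not be treated as the same key.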
Normalizing an Insights or Activation Bunker
You will use the same UI to normalize both types of Bunkers.
If you want to normalize data for an Activation Bunker, you will need to select at least one output column by switching on the toggle in the ‘output col’ column (the second-to-last column) during normalization. An output column is retained in its original form during normalization so that the results of an activation query can be exported. For example, the output column might contain a customer number or an email address that will be used during activation.
If you set any column as an output column, the platform will not allow you to publish categories to an Insight Bunker. Please only select output columns when publishing to an Activation Bunker.
If you don’t select any output columns, the normalized file can be published to an Insight Bunker.
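The output-column rule above can be summarized in a few lines of Python. This is a simplified sketch of the rule as stated in this article, not platform code; the `Column` class and `allowed_bunker_type` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    is_output: bool = False  # mirrors the 'output col' toggle


def allowed_bunker_type(columns: list[Column]) -> str:
    """Any output column -> Activation Bunker; none -> Insight Bunker."""
    return "Activation" if any(c.is_output for c in columns) else "Insight"


# Hypothetical column selections, for illustration only.
activation_cols = [Column("email"), Column("customer_number", is_output=True)]
insight_cols = [Column("email"), Column("age_band")]

print(allowed_bunker_type(activation_cols))  # Activation
print(allowed_bunker_type(insight_cols))     # Insight
```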
Steps to normalize data
1. Select the recordset you want to normalize
To begin the normalization process, go to your Cloud Vault and select the recordset you want to normalize.
A details panel will appear on the right-hand side of the screen. Click the Normalise button.
2. Configure your normalization settings
The platform will now ask you if you’d like to reuse an existing configuration or if you’d like to create a new one.
If you’re normalizing an updated version of a file you’ve previously imported, reusing an existing configuration is a fast way to get your data normalized. Select the saved configuration from the dropdown and give the output a name that is relevant to this normalization task.
However, if you’re bringing in new data or are unsure, click the Create new Config button and then Continue to column selection.
There are three steps to creating a brand new normalization configuration: selecting the data you want to normalize, mapping to the Global Schema and identifying keys, and saving your normalization configuration.
2.1 Select the columns you want to normalize
On the first page, you’re asked which columns you want to normalize.
Note that you can’t add columns later on, but you can choose which columns to publish when you publish your data. If you’re unsure, it is better to select more columns at this stage. If you’re certain which columns you need, select only those, as more columns mean a longer normalization task.
The platform will automatically select all columns from the recordset for you, but you can deselect the Use all columns for selection toggle and manually select the columns you want. The columns you’ve selected will be displayed in the right-hand box. Once you’re happy with the selection, click Continue to mapping.
2.2 Map your data to the Global Schema and assign keys
For more information on the normalization process, the Global Schema, and how best to format your data, please read our data formatting for normalization support article.
Using the Global Schema
On the mappings screen, the platform will automatically assign columns to the Global Schema where it recognizes a column name. For any missed or incorrect mappings, you can click the gray/blue pencil icon and correct the assignment. Some columns may require an additional mapping (e.g., postcode in the UK).
We support special address keys in the UK and the US.
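To give a feel for name-based automatic mapping, here is a minimal Python sketch. It is an assumption-laden illustration, not the platform's matching logic: the `RECOGNIZED` table, the schema field names in it, and the way column names are cleaned before lookup are all invented for this example.

```python
# Hypothetical lookup of recognized column names -> schema fields.
# These names are invented; they are not the actual Global Schema.
RECOGNIZED = {
    "email": "Email",
    "email_address": "Email",
    "postcode": "Postcode",
}


def auto_map(columns: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split columns into auto-mapped ones and ones needing manual mapping."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for col in columns:
        key = col.strip().lower().replace(" ", "_")
        if key in RECOGNIZED:
            mapped[col] = RECOGNIZED[key]
        else:
            unmapped.append(col)
    return mapped, unmapped


mapped, unmapped = auto_map(["Email Address", "postcode", "cust_ref"])
print(mapped)    # {'Email Address': 'Email', 'postcode': 'Postcode'}
print(unmapped)  # ['cust_ref']
```

Columns left in `unmapped` correspond to the cases where you would correct the assignment by hand using the pencil icon.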
Assigning PII as a Key
To set a column as either a key or a category, select the toggle as appropriate under the heading Key. Finally, set the data type of the column as either string or integer.
Note for Activation Bunkers
If you’d like to use this dataset for an Activation Bunker, you’ll need to set the output columns using the toggle under the heading Output columns. Remember that a key is used for joining, but a column must be set as an output column if it needs to be exported from the platform. If you set any column as an output column, the platform will not allow you to publish categories to an Insight Bunker. Please only select output columns when publishing to an Activation Bunker.
3. Save your Normalization config for future use
On the final screen, the platform asks you to name the configuration you’ve just created, which will allow you to skip these three steps next time and automate your data onboarding. You are also asked to declare an output name, which can be viewed on the next screen.
Once you click Create new config and Normalize, the platform will take you to the Tasks screen where you can view the progress of the normalization process.
Next step
Once you've normalized your data in the platform, the next step is to Prepare and Publish your Data.