How to normalize data

Once you’ve created your recordset, it is time to prepare your data for publishing. We call this process data normalization. During normalization, your original columns can be modified and the imported data is mapped onto our Global Schema, which ensures that all parties using the InfoSum platform have their data in the same format.

Table of contents

What happens during normalization?

Normalizing for activation use cases

Steps to normalize data

Video tutorial

1. Select the recordset you want to normalize

2. Configure your normalization settings

a. Map your data to the Global Schema and assign keys and data types

b. (optional) Modify your columns and use Address mapper (US and UK only)

c. Remove columns you do not need to normalize

d. Save your Normalization config for future use

Next step

What happens during normalization?

Normalization plays an important role in ensuring the security of your data. During normalization, direct identifiers (such as emails) are irreversibly converted to pseudonymized, salted keys. The process first lowercases each value and removes any leading or trailing spaces, then hashes the raw PII with SHA-256, and finally encrypts and salts the result. By the end of the normalization process, no translatable identifier information is stored within an InfoSum Bunker.
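The first stages described above (lowercase, trim, SHA-256) can be sketched in Python as follows. This is an illustration only: the function name `standardize_and_hash` is hypothetical, UTF-8 encoding is an assumption, and the platform's subsequent encryption and salting step is not reproduced because its details are not described in this article.

```python
import hashlib


def standardize_and_hash(raw_value: str) -> str:
    """Lowercase, trim, then SHA-256 hash a raw identifier (illustrative only)."""
    # Step 1: remove leading/trailing whitespace and lowercase the value.
    standardized = raw_value.strip().lower()
    # Step 2: hash with SHA-256 (UTF-8 encoding is an assumption).
    return hashlib.sha256(standardized.encode("utf-8")).hexdigest()


# Differently formatted copies of the same email produce the same digest.
digest_a = standardize_and_hash("  Jane.Doe@Example.com ")
digest_b = standardize_and_hash("jane.doe@example.com")
print(digest_a == digest_b)
```

Because both values standardize to the same string before hashing, they yield identical digests, which is why raw identifiers from different partners can still match.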

For details of any formatting required for your raw data, see data formatting for normalization.

We recommend bunkering identifier data in its raw format wherever possible to avoid discrepancies in standardization across partners.

Normalizing for activation use cases

If you want to normalize data for activation, you need to mark at least one key column as exportable by switching on the ‘export column’ toggle (the second-to-last column) during normalization. An export column is retained in its original form during normalization so that the results of an activation query can be exported. For example, the export column might contain a customer number or an email address that will be used during activation.

If you don’t select any export columns, you won't be able to extract any identifiers from your Bunker. You can still use the Bunker in activation use cases where another party is activating data (and their Bunker has export columns).
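As a rough way to reason about this rule before normalizing, the check below lists which key columns are also marked for export. The `ColumnSetting` class and its field names are hypothetical and do not reflect the platform's actual configuration format.

```python
from dataclasses import dataclass


@dataclass
class ColumnSetting:
    """Hypothetical model of one column's normalization settings."""
    name: str
    is_key: bool = False
    is_export: bool = False


def exportable_keys(columns: list[ColumnSetting]) -> list[str]:
    """Return the names of key columns that are also marked for export."""
    return [c.name for c in columns if c.is_key and c.is_export]


columns = [
    ColumnSetting("email", is_key=True),
    ColumnSetting("customer_number", is_key=True, is_export=True),
    ColumnSetting("age_band"),
]

ready = exportable_keys(columns)
if not ready:
    print("No export columns: identifiers cannot be extracted from this Bunker.")
else:
    print("Exportable key columns:", ready)
```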

Steps to normalize data

Video tutorial

1. Select the recordset you want to normalize

To begin the normalization process, go to your Cloud Vault and select the recordset you want to normalize.

A details panel will appear on the right-hand side of the screen. Click the Normalise button.

2. Configure your normalization settings

The platform will now ask you if you’d like to reuse an existing configuration or if you’d like to create a new one.

If you’re normalizing an updated version of a file you’ve previously imported, using an existing configuration can be a fast way to get your data normalized. Select the name of the saved configuration from the dropdown and give the output a name that is relevant to this normalization task.

If the data you're bringing is very similar to data covered by an existing config, you can click Modify Configuration and edit that configuration to match your new data.

This article explains how to edit an existing configuration and save it as a new config.

If you’re bringing in new data or are unsure, click the Create new Config button and then Continue to column selection.

There are four steps to creating a brand new normalization configuration: mapping to the Global Schema and identifying keys, making any data modifications (optional), removing columns you do not need, and saving your normalization configuration.

Normalizing using JSON

Instead of going through the dropdown UI, you can edit the JSON file for the normalization directly. We only recommend this option for advanced users. Please reach out to your InfoSum representative for more information.

If our support team is helping you with a complex normalization config, they will likely give you a JSON file that you can paste into the editor.
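Before pasting JSON into the editor, it can help to confirm that the text is syntactically valid. The sketch below only checks JSON syntax with Python's standard `json` module; it does not validate against the platform's configuration schema, and the key shown is a placeholder rather than a real configuration field.

```python
import json


def check_json_text(text: str) -> tuple[bool, str]:
    """Return (True, pretty-printed JSON) or (False, error message)."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return True, json.dumps(parsed, indent=2)


# The key below is a placeholder, not the platform's real configuration schema.
ok, result = check_json_text('{"placeholder_key": "placeholder_value"}')
print(ok)

# A missing value is reported with its position instead of raising an exception.
bad, message = check_json_text('{"placeholder_key": }')
print(bad, message)
```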

1. Map your data to the Global Schema and assign keys and data types

For more information on the normalization process, the Global Schema and how best to format your data, please read our data formatting for normalization support article.

Using the Global schema

On the mappings screen, the platform automatically assigns a column to the Global Schema where it recognizes the column name, and marks it as a key. For any missed or incorrect mappings, you can click the gray/blue pencil icon and correct the assignment. Some columns may require an additional mapping (e.g. postcode in the UK).

You can find a list of the Global Schema keys here.

Assigning PII as a Key

A key is an irreversibly encrypted identifier that is used for matching or activation. To set a column as either a key or a category, select the toggle as appropriate under the heading Key.

Please note that all columns marked as keys are automatically standardized with two modifications: whitespace is trimmed and values are lowercased.

Using Custom Keys

When normalizing your data, the names of the keys used for joining to your collaboration partners' datasets need to be consistent across the different datasets. The purpose of the Global Schema is to assist in the standardization of names to make joining simpler. However, if you're uploading a custom key, you will need to ensure the names are exactly the same across all datasets.
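One way to catch custom key names that look alike but are not exact matches is to compare the two sets of names before normalizing. This is a minimal sketch: the function name and the example key names are hypothetical, and the loose comparison rule (ignoring case and punctuation) is an assumption chosen only to surface likely typos.

```python
def mismatched_key_names(ours: set[str], theirs: set[str]) -> dict[str, set[str]]:
    """Find key names that look the same but are not an exact match."""

    def loose(name: str) -> str:
        # Ignore case, spaces, hyphens and underscores for the loose comparison only.
        return "".join(ch for ch in name.lower() if ch.isalnum())

    theirs_by_loose: dict[str, set[str]] = {}
    for name in theirs:
        theirs_by_loose.setdefault(loose(name), set()).add(name)

    near_misses: dict[str, set[str]] = {}
    for name in ours:
        candidates = theirs_by_loose.get(loose(name), set())
        # Flag names that have a lookalike on the other side but no exact match.
        if candidates and name not in candidates:
            near_misses[name] = candidates
    return near_misses


our_keys = {"loyalty_id", "email"}
partner_keys = {"Loyalty-ID", "email"}
print(mismatched_key_names(our_keys, partner_keys))
```

Here `loyalty_id` is flagged because the partner's dataset uses `Loyalty-ID`, so the two custom keys would not line up for joining until one side is renamed.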

Note for Activation Bunkers

If you’d like to use this dataset for an Activation Bunker, you’ll need to set the output columns using the toggle under the heading Export columns. Remember that a key is used for joining, but a column must be set as an export column if it needs to be exported from the platform.

Assign/confirm data types

Confirm that your data type is registered correctly. If you have mapped keys to the global schema, they will be automatically categorized as the right data type.

String: A combination of letters (optionally with numbers or symbols) that is treated as text. Most of your attributes will be strings unless they relate to time, currency or other numbers.
- Multi-value string: A column that contains multiple independent string values in each cell (e.g. two emails in one cell). The platform identifies this automatically if the cell contains multiple values.
Integer: A whole number without a decimal point.
Float (decimal): A real number that has a decimal point.

If your data includes date/time values, please use the modifications functionality to parse them (see next section).
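To make the distinction between these types concrete, the sketch below classifies individual cell values. It is not the platform's detection logic: the function name, the regular expressions and the `|` separator used to spot multiple values are all assumptions for illustration.

```python
import re

# Hypothetical patterns: optional sign, digits, and (for floats) a decimal point.
INTEGER_PATTERN = re.compile(r"[+-]?\d+")
FLOAT_PATTERN = re.compile(r"[+-]?\d*\.\d+")


def classify_value(value: str, separator: str = "|") -> str:
    """Classify one cell as integer, float, multi-value string, or string."""
    text = value.strip()
    if INTEGER_PATTERN.fullmatch(text):
        return "integer"
    if FLOAT_PATTERN.fullmatch(text):
        return "float"
    # The separator is an assumption; use whatever delimiter your data uses.
    if separator in text:
        return "multi-value string"
    return "string"


for cell in ["42", "3.14", "a@example.com|b@example.com", "London"]:
    print(cell, "->", classify_value(cell))
```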

2. (optional) Modify your columns and use Address mapper (US and UK only)

You also have the option to apply some basic column modifications to ensure that the data you publish to a Bunker is in the most useful format for your intended use case. For example, you might create a new multi-value column from several input columns, or, if your data includes date/time values, you can use this function to parse them so that the platform can recognize them.

You will need a date/time column for incremental updates (appending data) to your Bunkers. If your data doesn't have one, you can add it by selecting the 'Add datetime column' option. The date can be either a manual static date or the date of the normalization task (recommended for automation).
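The two modifications mentioned above can be pictured with a small Python sketch. It is not how the platform implements them: the function names, the column names and the `%d/%m/%Y` date format are assumptions for illustration.

```python
from datetime import datetime


def combine_columns(row: dict[str, str], sources: list[str]) -> list[str]:
    """Collect the non-empty values of several columns into one multi-value list."""
    return [row[name].strip() for name in sources if row.get(name, "").strip()]


def parse_date(text: str, fmt: str = "%d/%m/%Y") -> datetime:
    """Parse a date string with an explicit format (the format is an assumption)."""
    return datetime.strptime(text.strip(), fmt)


row = {
    "email_1": "a@example.com",
    "email_2": "",
    "email_3": "b@example.com",
    "signup_date": "31/01/2024",
}

emails = combine_columns(row, ["email_1", "email_2", "email_3"])
signup = parse_date(row["signup_date"])
print(emails, signup.isoformat())
```

Stating the format explicitly avoids ambiguity between day-first and month-first dates, which is the usual reason a date column is not recognized automatically.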
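The difference between the two options can be sketched as follows. The function and column names are hypothetical, and using UTC for the task date is an assumption; the platform's own 'Add datetime column' option should be used in practice.

```python
from datetime import datetime, timezone
from typing import Optional


def add_datetime_column(
    rows: list[dict[str, str]],
    column: str = "record_datetime",
    static_value: Optional[datetime] = None,
) -> list[dict[str, str]]:
    """Return copies of the rows with a date/time column added.

    If static_value is given it is used for every row (manual static date);
    otherwise the current time stands in for the date of the normalization task.
    """
    stamp = static_value if static_value is not None else datetime.now(timezone.utc)
    return [{**row, column: stamp.isoformat()} for row in rows]


rows = [{"customer_id": "c-1"}, {"customer_id": "c-2"}]

# Option 1: a manual static date applied to every row.
static_rows = add_datetime_column(rows, static_value=datetime(2024, 1, 31, tzinfo=timezone.utc))

# Option 2: the date of the task itself, which suits automated, repeated runs.
task_rows = add_datetime_column(rows)
print(static_rows[0], task_rows[0])
```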

Please note that all columns marked as keys are automatically standardized with two modifications: whitespace is trimmed and values are lowercased.

This article lists the modification options and how to apply them.

To use the US address mapper, please follow the instructions in this article.

To use the UK address mapper, please follow the instructions in this article.

3. Remove columns you do not need to normalize

During normalization, you might have created new columns and no longer need your original columns. Please remove any columns that aren’t needed by clicking the bin icon at the end of each line.

You can recover any columns you’ve deleted by clicking the black + button at the top and selecting ‘re-add input columns’.

4. Save your Normalization config for future use

On the final screen, the platform asks you to name the configuration you’ve just created, which lets you skip these steps next time and automate your data onboarding. You also declare an output name, which can be viewed on the next screen.

If any of your columns contains date/time values without timezone information, you will need to confirm the timezone for your data at this stage.

If any of your columns are mapped to a phone number Global Schema key, you will need to select the country code at this stage. We can only support one country code override for the whole dataset.
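To see why the timezone matters, the sketch below attaches a UTC offset to a date/time value that has none and shows the resulting UTC time. The function name is hypothetical, and a fixed offset is a simplifying assumption that ignores daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def attach_timezone(value: datetime, utc_offset_hours: float) -> datetime:
    """Attach a fixed UTC offset to a date/time that has no timezone information."""
    if value.tzinfo is not None:
        return value  # already timezone-aware; leave unchanged
    return value.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


# The same wall-clock time maps to different instants depending on the timezone.
local = attach_timezone(datetime(2024, 1, 31, 9, 30), utc_offset_hours=-5)
print(local.isoformat())
print(local.astimezone(timezone.utc).isoformat())
```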
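The effect of a single country code for the whole dataset can be illustrated with the sketch below. It does not describe the platform's actual phone handling: the function name is hypothetical, and dropping leading zeros from national numbers is an assumption that does not hold for every country.

```python
def apply_country_code(number: str, country_code: str) -> str:
    """Prefix a national-format phone number with one dataset-wide country code."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if number.strip().startswith("+"):
        # The number already carries its own country code; keep it.
        return "+" + digits
    # Assumption: leading zeros are a national trunk prefix and are dropped.
    return "+" + country_code + digits.lstrip("0")


# One country code ("44") is applied to every number that lacks its own.
for raw in ["07700 900123", "+1 202-555-0100"]:
    print(raw, "->", apply_country_code(raw, "44"))
```

Because only one override applies, numbers from other countries should already include their own country code in the raw data.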

Once you click Create new config and Normalize, the platform will take you to the Tasks screen where you can view the progress of the normalization process.

Next step

Once you've normalized your data in the platform, the next step is to Prepare and Publish your Data.
