Normalising a dataset

After you import a dataset to the InfoSum Platform, it must be normalised (and then published) before it can be referenced in queries.

We call this process data normalisation. During normalisation, your imported data is mapped onto our Global Schema, which solves a common problem: two separately imported datasets are likely to use different schemas. A range of UI-based tools is also available to tidy up any messy data.
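As a rough illustration of the idea (not the Platform's internal representation), the mapping step can be thought of as translating each dataset's own column names onto shared Global Schema categories. The column names and category names in this Python sketch are hypothetical:

```python
# Hypothetical sketch: two datasets with different schemas mapped onto
# shared Global Schema categories (all names invented for illustration).

# Each imported dataset names the same information differently.
dataset_a_row = {"e_mail": "Jane.Doe@Example.com", "post_code": "sw1a1aa"}
dataset_b_row = {"emailAddress": "jane.doe@example.com", "zip": "SW1A 1AA"}

# A per-dataset mapping translates source columns to shared categories.
mapping_a = {"e_mail": "email", "post_code": "postcode"}
mapping_b = {"emailAddress": "email", "zip": "postcode"}

def normalise(row: dict, mapping: dict) -> dict:
    """Map a source row onto shared category names."""
    return {mapping[col]: value for col, value in row.items() if col in mapping}

# Both rows now describe the same person under the same schema.
print(normalise(dataset_a_row, mapping_a))  # {'email': 'Jane.Doe@Example.com', 'postcode': 'sw1a1aa'}
print(normalise(dataset_b_row, mapping_b))  # {'email': 'jane.doe@example.com', 'postcode': 'SW1A 1AA'}
```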

Normalisation also plays an important role in securing your data. Direct identifiers are irreversibly converted to anonymised keys, so even if the Bunker that holds your dataset were somehow compromised, the keys would not reveal the identity of any individual.
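InfoSum's actual key derivation is internal to the Platform, but a minimal sketch of the one-way property looks like this (the use of SHA-256 here is an assumption for illustration only):

```python
import hashlib

def anonymised_key(identifier: str) -> str:
    """Illustrative one-way key: tidy the identifier, then hash it.

    A cryptographic hash cannot be reversed, so the key alone does not
    reveal the original identifier. (InfoSum's real key derivation is
    internal to the Platform; this only demonstrates the one-way idea.)
    """
    cleaned = identifier.strip().lower()
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()

digest = anonymised_key("Jane.Doe@Example.com")
print(digest)  # The original email cannot be recovered from this digest.
```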

You will need to perform a series of steps to normalise your data; you can complete these tasks using your Bunker's web-based UI.

If you are importing an activation dataset, you will also need to select an output column to make that column available in identity queries.

For advanced transformations, you can use InfoSum's custom scripting language, the Data Transformation Language (DTL). DTL is a powerful tool for applying complex logic to your imported dataset: you can clean up messy data and convert formats to match those defined in the Global Schema.
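DTL's own syntax is covered in InfoSum's DTL documentation; purely to illustrate the kind of clean-up and format-conversion logic such transformations express, here is a hypothetical Python sketch (the field names and source formats below are assumptions, not Global Schema definitions):

```python
from datetime import datetime

def tidy_postcode(raw: str) -> str:
    """Strip stray whitespace and standardise case, e.g. ' sw1a 1aa ' -> 'SW1A 1AA'."""
    return " ".join(raw.split()).upper()

def convert_date(raw: str) -> str:
    """Convert an assumed source format (DD/MM/YYYY) to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw.strip(), "%d/%m/%Y").strftime("%Y-%m-%d")

# A messy source row, cleaned into consistent formats.
row = {"postcode": " sw1a 1aa ", "date_of_birth": "01/02/1980"}
cleaned = {
    "postcode": tidy_postcode(row["postcode"]),
    "date_of_birth": convert_date(row["date_of_birth"]),
}
print(cleaned)  # {'postcode': 'SW1A 1AA', 'date_of_birth': '1980-02-01'}
```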