Normalising a dataset
After you import a dataset to the InfoSum Platform, it must be normalised (and then published) before it can be referenced in queries.
We call this process data normalisation. During data normalisation, your original imported data is mapped onto our Global Schema. This addresses the problem that two separately imported datasets are likely to use different schemas. A range of UI-based tools is also available to tidy up messy data.
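To illustrate why mapping onto a shared schema makes two datasets comparable, here is a minimal Python sketch. It is not InfoSum code: the column names, the category names and the `map_to_shared_schema` helper are all hypothetical and are not taken from the Global Schema.

```python
# Hypothetical mapping from source column names to shared category names.
# These names are illustrative only and are not taken from the Global Schema.
COLUMN_TO_CATEGORY = {
    "e-mail": "email",
    "Email Address": "email",
    "dob": "date_of_birth",
    "birth_date": "date_of_birth",
}


def map_to_shared_schema(row: dict) -> dict:
    """Rename known columns to shared category names and drop unmapped columns."""
    mapped = {}
    for column, value in row.items():
        category = COLUMN_TO_CATEGORY.get(column)
        if category is not None:
            mapped[category] = value
    return mapped


# Two datasets that describe the same kind of data with different schemas.
dataset_a_row = {"e-mail": "ann@example.com", "dob": "1990-04-01", "notes": "vip"}
dataset_b_row = {"Email Address": "bob@example.com", "birth_date": "1985-12-24"}

row_a = map_to_shared_schema(dataset_a_row)
row_b = map_to_shared_schema(dataset_b_row)
```

After mapping, both rows expose the same keys (`email` and `date_of_birth`), so they can be compared like for like even though the source columns were named differently.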
This process plays an important role in ensuring the security of your data. During normalisation, direct identifiers are irreversibly converted to anonymised keys. As a result, even if the Bunker that holds your dataset were somehow compromised, the identity of individuals would not be revealed. For details of how imported data is mapped to the Global Schema on the InfoSum Platform, including any formatting required of your raw data, see normalisation rules.
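The general idea of turning a direct identifier into a one-way key can be sketched as follows. This is an illustration only, assuming a keyed HMAC-SHA-256 hash; the algorithm the InfoSum Platform actually uses is not described here, and the `anonymised_key` function and the example secret are hypothetical.

```python
import hashlib
import hmac


def anonymised_key(identifier: str, secret: bytes) -> str:
    """Derive a one-way key from a direct identifier (illustrative only).

    Assumption: a keyed HMAC-SHA-256 hash stands in for whatever
    conversion the platform really applies.
    """
    # Canonicalise first so trivially different spellings yield the same key.
    canonical = identifier.strip().lower()
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret; a real system would manage this value securely.
secret = b"example-secret-not-for-production"

key_1 = anonymised_key("Ann@Example.com ", secret)
key_2 = anonymised_key("ann@example.com", secret)
```

Here `key_1` and `key_2` are identical because the input is canonicalised before hashing, and the original email address cannot be read back from the resulting hexadecimal string.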
You will need to perform a series of steps to normalise your data. You can complete these tasks using your Bunker's web-based UI:
- assign columns to categories,
- set up category mappings,
- use the transformation tools,
- test with a dry run,
- normalise and publish.
For advanced transformations, you can use InfoSum's custom scripting language, the Data Transformation Language (DTL). DTL lets you apply complex logic to your transformations: you can clean up messy data in your imported dataset and convert formats to match those defined in the Global Schema.
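The kind of clean-up and format conversion described above can be sketched in Python. This is not DTL and does not reflect DTL syntax; the `to_iso_date` helper, the list of accepted input formats and the ISO 8601 target format are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of date formats found in messy source data.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def to_iso_date(raw: str) -> Optional[str]:
    """Convert a date string in a known format to YYYY-MM-DD, or return None."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


cleaned = [to_iso_date(value) for value in [" 24/12/1985 ", "1990-04-01", "n/a"]]
```

Values that match none of the known formats become `None` rather than raising an error, so unusable entries can be reviewed separately from the converted ones.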