Normalising your data
Depending on the cleanliness of the imported data, some or all of the columns may have been assigned to a global schema category. InfoSum’s global schema defines a standard set of categories and keys, which can be used to compare datasets from diverse original sources. This addresses the obvious problem that two separate datasets are likely to use different schemas.
Using the Normalise tab in the Bunker, you can explain the meaning of any columns that weren’t processed during the import and use mapping and transformation tools to clean up any inconsistencies. Any unassigned columns can be found by scrolling right.
To assign a column, click on the Settings button next to the column name, then Assign Category. From here, you can search for a relevant category or scroll through the list, then Assign. If there isn’t a relevant category available, you can create a custom category.
Once all the columns are mapped to a category, error warnings will appear if a data point is not as expected. For example, if the category is income, the Platform will be expecting an integer, not a string. If the income data points were to contain a pound sign, a red flag would appear, along with the transformation tools. These tools let you configure a series of changes, such as “remove the £” or “change this word into that one”.
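To make the idea concrete, the two kinds of change mentioned above can be sketched in plain Python. This is only an illustration of the logic, not DTL and not how the Platform implements its transformation tools; the function names and the replacement table are hypothetical.

```python
import re

# Hypothetical replacement table: "change this word into that one".
COUNTRY_REPLACEMENTS = {"united kingdom": "GB", "uk": "GB"}


def clean_income(raw: str) -> int:
    """Remove the pound sign, thousands separators and whitespace, then parse as an integer."""
    cleaned = re.sub(r"[£,\s]", "", raw)
    return int(cleaned)  # raises ValueError if the value is still not an integer


def replace_word(value: str, replacements: dict[str, str]) -> str:
    """Swap a known word for its replacement; leave unknown values untouched."""
    return replacements.get(value.strip().lower(), value)


print(clean_income("£25,000"))
print(replace_word("UK", COUNTRY_REPLACEMENTS))
```

A value that still fails to parse after cleaning raises an error rather than passing silently, which mirrors the red flag shown for unexpected data points.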
For more complex tasks, transformation scripts can be written using InfoSum’s Data Transformation Language (DTL).
At any time, you can use a dry run to test, against a representative sample, how successfully your data has been mapped to the global schema. This can help you spot and resolve any remaining problems and improve the quality of the data. Once you’re happy with the results, select Normalise and navigate to the Bunker dashboard to publish the dataset.
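Conceptually, a dry run checks a sample of values against what the category expects and reports what fails. The Python sketch below illustrates that idea under stated assumptions: the `dry_run` function, its parameters and its return values are hypothetical and do not describe the Platform's actual behaviour or output.

```python
import random


def dry_run(values, parse, sample_size=100, seed=0):
    """Try to parse a random sample of values; return the success rate and the failures."""
    rng = random.Random(seed)  # fixed seed so repeated runs check the same sample
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = rng.sample(values, sample_size)

    failures = []
    for value in sample:
        try:
            parse(value)
        except ValueError:
            failures.append(value)

    success_rate = 1 - len(failures) / len(sample) if sample else 0.0
    return success_rate, failures


# Hypothetical income column, expected to contain integers.
rate, failed = dry_run(["25000", "£30000", "41000", "n/a"], int)
print(rate, failed)
```

The list of failing values is the useful part: it shows which transformations are still needed before normalising.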
Now that you’ve imported, normalised and published a dataset, it is available to reference in queries.