Normalizing your data
Depending on the cleanliness of the imported data, some or all of the columns may have been assigned to a Global Schema category.
InfoSum’s Global Schema defines a standard set of categories and keys, which can be used to compare datasets from diverse original sources. This addresses the obvious problem that two separate datasets are likely to use different schemas.
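As a rough illustration of why a shared schema helps, the sketch below renames the columns of two differently structured datasets to a common set of category names. It is not InfoSum's implementation: the column names, the category names ("email", "gender") and the helper to_shared_schema are all hypothetical and chosen only for this example.

```python
# Hypothetical example: two datasets that describe the same people
# but use different column names.
crm_rows = [{"e_mail": "a@example.com", "sex": "F", "internal_id": "17"}]
shop_rows = [{"EmailAddress": "a@example.com", "Gender": "female"}]

# Hypothetical mappings from each source's columns to shared category names.
# These names are illustrative and are not taken from the Global Schema.
crm_mapping = {"e_mail": "email", "sex": "gender"}
shop_mapping = {"EmailAddress": "email", "Gender": "gender"}


def to_shared_schema(rows, mapping):
    """Rename mapped columns to shared category names and drop unmapped ones."""
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in rows
    ]


crm_normalized = to_shared_schema(crm_rows, crm_mapping)
shop_normalized = to_shared_schema(shop_rows, shop_mapping)
```

After this step both datasets expose the same keys, so they can be compared column by column. The values still differ ("F" versus "female"), which is the kind of inconsistency the mapping and transformation tools are meant to resolve.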
On the Normalize tab in the Bunker, you can define the meaning of any columns that weren’t recognized during the import by assigning them to categories. You can then use the category mapping and transformation tools to clean up any inconsistencies.
To find unassigned columns, scroll right. To assign a column, click the Settings button next to the column name, then Assign Category. Select the columns you want to assign to a category and click the NEXT button. Search for a relevant category or scroll through the list, then click the SAVE button. If no relevant category is available, you can create a custom category.
Once every column is mapped to a category, a warning appears for any data point that doesn’t match what the category expects. For example, if the income data points contain a pound sign, a red flag appears. You can then use the transformation tools to change each data point, such as “remove the £” or “change this word into that one”.
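The two transformations mentioned above can be pictured with a small sketch. It only illustrates the idea and is not how the Bunker applies transformations: the function names, the rule that a valid income is a plain number, and the word replacements are assumptions made for this example.

```python
import re


def is_valid_income(value: str) -> bool:
    """Assumed rule for this sketch: an income is digits with an optional decimal part."""
    return re.fullmatch(r"\d+(\.\d+)?", value) is not None


def remove_pound_sign(value: str) -> str:
    """The 'remove the £' transformation: strip pound signs and outer whitespace."""
    return value.replace("£", "").strip()


def replace_word(value: str, replacements: dict) -> str:
    """The 'change this word into that one' transformation (case-insensitive lookup)."""
    return replacements.get(value.strip().lower(), value)


incomes = ["£25000", "31000", "£18500"]

# Values that would be flagged before any transformation is applied.
flagged = [value for value in incomes if not is_valid_income(value)]

# Apply the transformation, then check again.
cleaned_incomes = [remove_pound_sign(value) for value in incomes]
still_flagged = [value for value in cleaned_incomes if not is_valid_income(value)]

# Hypothetical word replacements for a gender column.
genders = [replace_word(value, {"female": "F", "male": "M"}) for value in ["Female", "M", "male"]]
```

Validating before and after the transformation mirrors the workflow in the text: the warning identifies the problem values, and the transformation is judged by whether the warning disappears.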
For more information, see the section on the process of normalizing data.
At any time, you can use a dry run to test, against a representative sample, how successfully your data has been mapped to the Global Schema. This can help you spot and resolve any remaining problems and improve the quality of the data. Once you’re happy with the results, select Normalize and go to the Bunker dashboard.
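Conceptually, a dry run checks a sample of rows and reports how many values pass. The sketch below shows that idea under stated assumptions: the dry_run function, its parameters, and the validation rule are hypothetical and do not describe how the Bunker performs its dry run.

```python
import random


def dry_run(rows, validators, sample_size=100, seed=0):
    """Check a random sample of rows and return the share of valid values per column."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    report = {}
    for column, is_valid in validators.items():
        valid_count = sum(1 for row in sample if is_valid(row.get(column, "")))
        report[column] = valid_count / len(sample) if sample else 0.0
    return report


# Hypothetical data: two clean income values and two that need transforming.
rows = [
    {"income": "25000"},
    {"income": "£31000"},
    {"income": "18500"},
    {"income": "n/a"},
]

# Assumed rule for this sketch: a valid income consists of digits only.
validators = {"income": lambda value: value.isdigit()}

# The sample size equals the number of rows, so every row is checked.
report = dry_run(rows, validators, sample_size=4)
```

A low share for a column suggests that more transformations are needed before normalizing the full dataset.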
You now need to publish your dataset to make it available to reference in queries.