The first step in normalising your data is to apply categories. When you categorise your data, you tell your Bunker what each column in your dataset means.
When we talk about a column, we mean a field in your original dataset.
When we talk about a category, we mean a data classification in InfoSum's Global Schema. Categories are pre-defined by InfoSum's data scientists to reflect common concepts. For example, Address is one of the available categories.
Categories have many capabilities that ordinary database columns don't. In particular, a single category can have multiple representations, presenting the same data in different ways. When matching your data with your collaborator's, the InfoSum Platform can choose among these representations, selecting the one that gives the best match.
For much more detail on this topic, see our data processing concepts.
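To make the idea of choosing between representations concrete, here is a toy sketch in Python. It is not InfoSum's matching algorithm. The two postcode representations ("full" and "outward"), the function name best_representation, and the scoring rule (count of distinct shared values) are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical representations of a UK-style postcode (illustration only,
# not InfoSum's actual representations).
REPRESENTATIONS: Dict[str, Callable[[str], str]] = {
    # Whole postcode, ignoring spaces and case.
    "full": lambda value: value.replace(" ", "").upper(),
    # Outward code only: the part before the space.
    "outward": lambda value: value.strip().split(" ")[0].upper(),
}


def best_representation(ours: Iterable[str], theirs: Iterable[str]) -> Tuple[str, int]:
    """Return the representation with the most distinct shared values, and that count."""
    ours, theirs = list(ours), list(theirs)
    best_name, best_overlap = "", -1
    for name, represent in REPRESENTATIONS.items():
        shared = {represent(v) for v in ours} & {represent(v) for v in theirs}
        if len(shared) > best_overlap:  # ties keep the earlier entry in REPRESENTATIONS
            best_name, best_overlap = name, len(shared)
    return best_name, best_overlap


print(best_representation(["SW1A 1AA", "EC1A 1BB"], ["sw1a 1aa", "EC1A 2ZZ"]))  # ('outward', 2)
```

Counting overlap alone favours coarser representations such as the outward code, which match more records but less precisely. A real matcher would need to weigh that trade-off, which this sketch deliberately ignores.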
Finding columns which aren't assigned to categories
Ideally, once you've finished configuring your Bunker, every column in your original dataset will be assigned to a category.
If your columns have common descriptive names, such as Postcode, your Bunker will categorise them automatically. Columns that have already been categorised appear first, towards the left of the Bunker window. If you're happy with the categories your Bunker has applied, you don't need to do anything more.
Scroll right to find any columns that haven't been categorised. They'll be shaded light blue.
In the example below, the Gender column is already categorised, but columns such as Employment aren't assigned to a category yet.
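The Bunker window does this for you, but the logic can be modelled in a few lines. This is a hypothetical sketch, not an InfoSum API: each original column maps to a category name, or to None if it is still uncategorised, and the category names used are assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical model: each original column maps to a category name,
# or None if it has not been categorised yet.
Assignments = Dict[str, Optional[str]]


def uncategorised_columns(assignments: Assignments) -> List[str]:
    """Columns that still need a category (the ones shaded light blue)."""
    return [column for column, category in assignments.items() if category is None]


def display_order(assignments: Assignments) -> List[str]:
    """Categorised columns first, then uncategorised ones.

    sorted() is stable, so the original order is kept within each group."""
    return sorted(assignments, key=lambda column: assignments[column] is None)


columns = {"Employment": None, "Gender": "Gender", "Postcode": "Address"}
print(uncategorised_columns(columns))  # ['Employment']
print(display_order(columns))  # ['Gender', 'Postcode', 'Employment']
```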
Assigning a column to a category
To categorise a column, click its settings icon and select Assign Category from the drop-down.
In the Categories dialog (see below), section 1 is pre-filled with the column you selected. You only need to complete section 2 by selecting an appropriate category from the list. Type a few characters to search for categories by name.
When you click Assign, your column is assigned to the category you selected and is no longer shaded blue.
If there isn't a relevant category available, you can create a custom category. A custom category is one you have defined yourself, in contrast to one defined in InfoSum's Global Schema. For example, you might use a custom category for a column named 'Blue' containing the values 1 to 10, or for an internal customer identifier.
When you define a custom category, you give it a name and specify the content type. Two custom categories in different datasets can be matched together if they have the same name. To run an identity query, you may therefore need to coordinate with the owners of the other datasets so that you all use the same name for the same category, or agree on a naming convention in advance.
If the column used for the custom category is an identifier, such as a Customer ID, you will need to select 'is key' so that it can be used later to match identities across datasets.
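The naming rule above can be illustrated with a small hypothetical model. The CustomCategory class, its content-type labels, and the shared_key_categories function are assumptions for illustration only. The sketch also assumes that names must match exactly (case-sensitive) and that 'is key' must be selected on both sides; the article does not confirm either detail.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class CustomCategory:
    """Hypothetical model of a custom category definition."""
    name: str
    content_type: str  # hypothetical labels, e.g. "string" or "integer"
    is_key: bool = False


def shared_key_categories(ours: Iterable[CustomCategory],
                          theirs: Iterable[CustomCategory]) -> Set[str]:
    """Names of custom categories marked 'is key' in both datasets.

    Assumes names must match exactly (case-sensitive)."""
    our_keys = {c.name for c in ours if c.is_key}
    their_keys = {c.name for c in theirs if c.is_key}
    return our_keys & their_keys


ours = [CustomCategory("customer_id", "string", is_key=True), CustomCategory("Blue", "integer")]
theirs = [CustomCategory("customer_id", "string", is_key=True), CustomCategory("Blue", "integer")]
print(shared_key_categories(ours, theirs))  # {'customer_id'}

# A naming mismatch means no match, which is why agreeing on names matters.
mismatch = [CustomCategory("CustomerID", "string", is_key=True)]
print(shared_key_categories(ours, mismatch))  # set()
```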
Assigning more than one column to a category
Several columns in your original data can map onto a single category. For example, you might have individual columns for the different parts of an address, such as Postcode (or zip code). All of these together would map onto a single category, Address.
To assign more than one column to a category, hold down the SHIFT key and click the settings icon for each column. Once you've clicked on all the columns, select Assign Category from the drop-down.
All the columns you selected are now listed in the Categories dialog.
If you missed any columns, you can add them at this stage too, using the Add additional columns drop-down.
Once you're happy with the list of columns, select the category to assign them to, as before.
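One way to picture a multi-column assignment is as a mapping from a category to the list of original columns it covers. The hypothetical Python sketch below groups a single row's values by category under that model. The column names Street and Town and the function group_row are assumptions; this is not how InfoSum stores assignments internally.

```python
from typing import Dict, List

# Hypothetical model: a category name mapped to the original columns assigned to it.
Assignment = Dict[str, List[str]]


def group_row(row: Dict[str, str], assignment: Assignment) -> Dict[str, Dict[str, str]]:
    """Group one row's values by category, keeping each source column's value."""
    return {
        category: {column: row[column] for column in columns if column in row}
        for category, columns in assignment.items()
    }


assignment = {"Address": ["Street", "Town", "Postcode"], "Gender": ["Gender"]}
row = {"Street": "1 High Street", "Town": "Leeds", "Postcode": "LS1 1AA", "Gender": "F"}
print(group_row(row, assignment))
```

Keeping each source column's value separate inside the category, rather than concatenating them, leaves it possible to build different representations of the same address later.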
Categories with properties
For a few categories, you need to configure properties to help your Bunker understand your original schema. For example, the Address category comes with a postcode property, which tells your Bunker which of your original data columns contains the postal code.
When you select the Address category in the Categories dialog, an additional option appears.
Simply select the appropriate column from the drop-down before you click Assign.
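A property can be thought of as a named pointer from the category to one of its columns. The following hypothetical sketch models that idea. The CategoryAssignment class and its methods are illustrative inventions, and the rule that a property may only point at a column already assigned to the category is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CategoryAssignment:
    """Hypothetical model of a category assignment with properties."""
    category: str
    columns: List[str]
    properties: Dict[str, str] = field(default_factory=dict)

    def set_property(self, name: str, column: str) -> None:
        # Assumption: a property can only point at a column assigned to this category.
        if column not in self.columns:
            raise ValueError(f"{column!r} is not assigned to {self.category}")
        self.properties[name] = column

    def property_value(self, name: str, row: Dict[str, str]) -> str:
        """Read the value of the column that a property points at."""
        return row[self.properties[name]]


address = CategoryAssignment("Address", ["Street", "Town", "Postcode"])
address.set_property("postcode", "Postcode")
row = {"Street": "1 High Street", "Town": "Leeds", "Postcode": "LS1 1AA"}
print(address.property_value("postcode", row))  # LS1 1AA
```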
Adding more columns to an existing category
If, after you've finished assigning a category, you realise you missed out a column, you can add it later.
To do this, click the settings icon for one of the columns which is already assigned to the category. From the drop-down, select Edit category.
You can now add one or more further columns in section 1 of the dialog. Click Assign to complete the process.
Removing columns from a category
If you assign a category to the wrong column, you can edit or remove the assignment.
To do so, click the settings icon for the column that is assigned to the incorrect category. From the drop-down, select Edit category, then switch to the existing category tab, as shown in the example below.
In the example, the Employment column has been assigned incorrectly, so click delete to remove it. You can then assign the Employment column to another category.
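Adding columns to an existing category and removing a wrongly assigned column are both simple edits to the category's column list. The hypothetical sketch below models them with a plain dictionary. The function names are illustrative, and dropping a category once its last column is removed is an assumption, not documented Bunker behaviour.

```python
from typing import Dict, Iterable, List

# Hypothetical model: category name -> columns assigned to it.
Assignments = Dict[str, List[str]]


def add_columns(assignments: Assignments, category: str, columns: Iterable[str]) -> None:
    """Add columns to a category in place, skipping any already assigned to it."""
    assigned = assignments.setdefault(category, [])
    for column in columns:
        if column not in assigned:
            assigned.append(column)


def remove_column(assignments: Assignments, category: str, column: str) -> None:
    """Remove a column from a category in place.

    Raises KeyError if the category has no assignment, and ValueError if the
    column is not assigned to it. Assumption: a category left with no columns
    is dropped entirely."""
    assigned = assignments[category]
    assigned.remove(column)
    if not assigned:
        del assignments[category]


assignments = {"Address": ["Street", "Postcode"], "Gender": ["Gender", "Employment"]}
add_columns(assignments, "Address", ["Town"])
remove_column(assignments, "Gender", "Employment")
print(assignments)  # {'Address': ['Street', 'Postcode', 'Town'], 'Gender': ['Gender']}
```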