Data modifications during normalization 2.0
At the normalization stage, you have the option to apply some basic column modifications to ensure that the data you publish to a Bunker is in the most useful format for your intended use case.
Please note that all columns marked as keys will be automatically standardized with two modifications: trim whitespaces and lowercase.
Table of contents
Training video
Column modification options
- Create multi-value: Joins two or more single-value columns into one multi-value column. It does not support multi-value columns as input. We recommend only doing this for columns that hold the same data (e.g. joining two email columns, or columns that contain related attributes such as hobbies)
-
Concatenate: Merges values within columns together along with any other plaintext you want e.g. example is [email + “ and first name “ + firstName]. Both input columns must be single value again. It produces a single value column as a result. You may specify a delimerer, or it will default to space.
You have two options for how to handle null/empty values in a concatenation:- Ignore null values, YES: If any of the cells used in a concatenation is empty, the modified value will return as an empty string for the empty value but it will still concatenate the rest. E.g., name_address, where name is null, will return: '_address'
- Ignore null values, NO (unchecked): If any of the cells used in a concatenation is empty, the modified value will return as an empty string/null
E.g., name_address, where name is null, will return: ''
-
Parse Date/Time Format: You need to parse date/time columns using this modification to include the format of the data, the platform will then be able to interpret the data as date/time including timestamps (if selected). If your data doesn’t have timezone information you will have to specify this at the time of saving the normalization config.
You can specify your own date format by clicking on the pencil icon next to the format, please keep in mind standard date formats that can aid in collaboration. We support any combination of year-month-date and hour-min-sec as long as the format specified matches your data and you respect capitalization rules (MONTH and HOUR are the only two parameters capitalized).
- Integer bucket representations: Make integers and floats more useful to generate insights by grouping them into buckets. You can select the bucket size.
- Find and replace: Change the values in the cells to more descriptive or more useful values. We always recommend having human-readable values in the data to generate better insights.
-
Hash your data: all parties involved in the collaboration must import matching identifiers in the same format (raw or hashed with the same SHA2 hashing algorithm). We always recommend uploading raw identifiers where possible to avoid any pre-formatting issues and you must import raw identifiers if you are matching to the Global Schema (excl email which can be imported hashed). However, if one party can only import pre-hashed matching identifiers - the other parties must hash their data to the same standard and set the identifier as a custom key. There are three options:
- Hash before importing: You can hash your data before importing it to InfoSum, you won’t need to do anything else at this stage.
- Conditional hash modifier: If you are unclear if your data is hashed or not, but needs to be hashed, then please use this modifier which will check if your data is raw before hashing it. It won’t hashed already hashed data to SHA2.
- SHA2 modifier: If you are sure your data is raw you can simply apply this modifier to hash your data to the agreed SHA2 standard.
- You can also manually trim whitespaces and lowercase any column. This is automatically applied to all data that is marked as keys (Global schema or custom)
How to modify columns
There are two ways to start your modifications:
- Click on the big purple + icon at the top of the normalization table
- Click on the + icon on the right-hand side to select Add a modification (and then select which columns)
This opens a new modal window where you can select available modification features.
The Modifier Function element provides the available options, and you can update the New Column by giving it a new recognizable name. This can also be amended afterwards in the table.
Next Steps
Please continue the normalization process