Normalization rules
Normalization rules describe the steps applied to all imported data when it is mapped to the Global Schema on InfoSum Platform.
When you import your raw data to the Platform, we require certain data to be in a particular format so that it can be correctly validated and mapped to Global Schema categories. Categories can then form keys or attributes.
Our Bunker normalization technology has been purposely designed so that users can feel comfortable bunkering raw data, including emails. Email normalization begins by converting the raw data to SHA256 before it is further encrypted, salted and mathematically sketched. We do accept SHA256-encoded emails from users who do not have access to raw data, but match rates are optimal when users bunker raw emails. By the end of the normalization process, no translatable identifier information is stored within an InfoSum Bunker. We recommend bunkering identifier data in raw format wherever possible.
All keys are normalized and hashed. However, some keys (such as email, date of birth, UK and US addresses, and phone) are subjected to additional steps before they are converted to hashed values.
The following sections describe:
- required formatting customers need to use when inputting certain raw data.
- how InfoSum validates this data.
- additional steps InfoSum applies to normalize this data.
Email Address
What email address format does InfoSum require? | How does InfoSum validate emails? | How does InfoSum normalize emails? |
Can be provided as raw data or in SHA256 hexadecimal format. Must be in a single column. If you use SHA256 format, ensure every email address is in lowercase with leading/trailing white space removed before you convert it to SHA256. | If the email is in raw format: checks that it matches RFC 2822. If the email is in SHA256 format: checks that the hash has the correct length. | If the email is not already in SHA256 format, it is converted to lowercase and leading/trailing white space is removed before hashing. |
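If you hash emails yourself before import, the preparation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not InfoSum's implementation: the function names are hypothetical, UTF-8 encoding is an assumption, and the hex-character check goes slightly beyond the length check described above.

```python
import hashlib


def normalize_email(raw_email: str) -> str:
    """Trim leading/trailing white space and convert to lowercase."""
    return raw_email.strip().lower()


def sha256_hex(value: str) -> str:
    """Return the SHA256 digest as 64 hexadecimal characters.

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def looks_like_sha256_hex(value: str) -> bool:
    """Check the length (64 characters) and that every character is hex."""
    return len(value) == 64 and all(c in "0123456789abcdefABCDEF" for c in value)


def prepare_email(raw_email: str) -> str:
    """Normalize first, then hash, so equivalent spellings hash identically."""
    return sha256_hex(normalize_email(raw_email))
```

Normalizing before hashing matters because SHA256 is case- and space-sensitive: `prepare_email("  Alice@Example.COM ")` and `prepare_email("alice@example.com")` return the same value only because both inputs are trimmed and lowercased first.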
Name
What name format does InfoSum require? | How does InfoSum validate names? | How does InfoSum normalize names? |
Always provided in raw format. Upload either a single column or two columns. Requires first and second name. Note: We do not support middle names or suffixes. However, you can import these as custom categories if you want to use them outside of our Global Schema. | Always lowercase. First name characters up to the first space are used. Blank space is removed. | Single-column entries in your spreadsheet are mapped to a single category in the Global Schema. Multi-column entries are also mapped to a single category, but that category allows additional mapping by assigning a sub-type (called a property) to each column. See example below. Leading/trailing white space is removed from each column and the values are converted to lowercase. |
You can map the name columns in your spreadsheet to the “Forename” and “Surname” properties of the “Name” category in the Global Schema, as shown:
The “Name” category cannot be used as a key in its own right and must be used in combination with a second category to form a key. For example, to use “Name” in a key it must be paired with another category such as “Date of Birth” or “UDPRN” as shown:
Before you normalize your data, you can configure category properties to help your Bunker understand your original schema. For more details, see assigning columns to categories.
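As a rough illustration of the name rules in this section, the sketch below applies them to a two-column input in Python. The function names are hypothetical, and it assumes that "blank space is removed" refers to leading/trailing white space only; the Platform's actual handling may differ.

```python
def normalize_forename(raw: str) -> str:
    """Lowercase, trim, and keep only the characters up to the first space."""
    parts = raw.strip().lower().split(None, 1)
    return parts[0] if parts else ""


def normalize_surname(raw: str) -> str:
    """Lowercase and trim leading/trailing white space.

    Assumption: internal spaces in a surname are left untouched.
    """
    return raw.strip().lower()
```

With these rules, a forename entered as "Mary Ann" and one entered as "mary" normalize to the same value, which is why middle names are not carried into the "Name" category.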
DOB (Date of Birth)
What DOB format does InfoSum require? | How does InfoSum validate DOB? | How does InfoSum normalize DOB? |
Always provided in raw format. Must be in three columns, each with a separate input value for "yyyy", "mm", and "dd". If your DOB is in a single column (for example, YYYY-MM-DD or DD-MM-YYYY), you can use transformations to convert it into three columns. | Year is a valid integer and not empty. Month is empty or an integer. Day is empty or an integer. | No extra hashing is done. DOB is represented as a 64-bit integer value. |
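If you want to check your DOB columns before import, the validation rules above can be approximated in Python as shown here. This is a sketch with hypothetical function names: `split_iso_dob` is a local stand-in for a transformation and handles only the YYYY-MM-DD layout, and `is_valid_dob` checks exactly what is listed above (no range checks on month or day). It assumes integers are written as plain ASCII digits.

```python
from typing import Tuple


def split_iso_dob(value: str) -> Tuple[str, str, str]:
    """Split a 'YYYY-MM-DD' string into (yyyy, mm, dd) column values."""
    year, month, day = value.strip().split("-")
    return year, month, day


def _is_integer(text: str) -> bool:
    """True for a non-empty string of ASCII digits."""
    return text.isascii() and text.isdigit()


def is_valid_dob(year: str, month: str, day: str) -> bool:
    """Year must be a non-empty integer; month and day may be empty."""
    if not _is_integer(year.strip()):
        return False
    return all(
        part.strip() == "" or _is_integer(part.strip())
        for part in (month, day)
    )
```

For example, `is_valid_dob(*split_iso_dob("1991-04-03"))` passes, a row with only a year also passes, and a row with an empty year fails.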
You can map the DOB columns in your spreadsheet to the “Day”, “Month” and “Year” properties of the “Date of Birth” category in the Global Schema using the steps below.
Create three DOB columns in your spreadsheet, each with a separate input value for "dd", "mm", and "yyyy", for example:
Day of Birth | Month of Birth | Year of Birth |
3 | 4 | 1991 |
23 | 11 | 1979 |
1 | 9 | 1926 |
13 | 5 | 2000 |
Import your data without making any changes to DOB columns. The DOB columns appear as not assigned at normalization, as shown below.
Next, assign the DOB columns to a category as shown. For the steps to do this, see assign columns to categories.
Click Next and select the Date of Birth category from the drop-down list. Next, select the category properties.
Click Save to map your Date of Birth columns to the Global Schema.
When you normalize the data, the imported DOB data is converted to the Global Schema.
UK Address Mapper
What UK address format does InfoSum require? | How does InfoSum validate UK addresses? | How does InfoSum normalize UK addresses? |
Always provided in raw format. | Verifies that the address is valid and complete. | The address generates a unique UDPRN value. UDPRN is the only key for UK addresses. |
For the steps to map your UK addresses, see UK address mapping.
The street address mapping you can access is based on your InfoSum plan. This is because not all InfoSum Platform features come as standard - please contact sales@infosum.com for details.
Phone
What phone number format does InfoSum require? | How does InfoSum validate phone numbers? | How does InfoSum normalize phone numbers? |
Always provided in raw format. Must be a valid phone number. | Checks that the phone number is in valid E.164 format. | If a phone number has no international code at the beginning, or no region was selected during mapping, the chosen default international code is added to the number. |
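To illustrate the phone rules above, here is a minimal Python sketch of an E.164 shape check and of adding a default international code. The function names and the `default_code` parameter are hypothetical. The pattern checks only the general E.164 shape (a plus sign followed by up to 15 digits, the first of which is not zero), and the removal of spaces, hyphens, dots and parentheses is an assumption. National trunk prefixes, such as a leading 0, are not handled.

```python
import re

# General E.164 shape: "+", then 2 to 15 digits with no leading zero.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the whole string has the E.164 shape."""
    return E164_PATTERN.fullmatch(number) is not None


def apply_default_code(raw: str, default_code: str) -> str:
    """Add default_code (for example "+1") when no international code is present.

    Assumption: common separators are stripped before the check.
    """
    compact = re.sub(r"[\s\-().]", "", raw)
    if compact.startswith("+"):
        return compact
    return default_code + compact
```

For example, `apply_default_code("(202) 555-0143", "+1")` yields a value that passes `is_e164`, whereas the unprefixed digits alone do not.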
US Address Mapper
What US address format does InfoSum require? | How does InfoSum validate US addresses? | How does InfoSum normalize US addresses? |
Always provided in raw format. Requires values in either: street name, city and state fields. Supported Zip formats: Zip9 is accepted in two formats: 01740-1329 or 017401329. Street name is required along with Zip code. The Platform cannot map or normalize a Zip code on its own without a street. Both street name and Zip code are required to validate an address. City and state are not mandatory values. If an address has a Zip5, InfoSum Platform creates a Zip9 key. | Verifies that the address is valid and complete. | Address lines 1 and 2, town and state are hashed to generate three keys: USA address, Zip9 and Zip5. |
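The two accepted Zip9 layouts and the street-plus-Zip requirement described above can be checked before import with a short Python sketch like the one below. The function names are hypothetical, and returning the Zip9 without a hyphen is an assumption made for illustration, not a statement about how the Platform stores it.

```python
import re
from typing import Optional, Tuple

# Five digits, an optional hyphen, then four digits.
ZIP9_PATTERN = re.compile(r"([0-9]{5})-?([0-9]{4})")


def parse_zip9(value: str) -> Optional[Tuple[str, str]]:
    """Return (zip5, zip9) for '01740-1329' or '017401329', else None."""
    match = ZIP9_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    zip5, plus4 = match.groups()
    return zip5, zip5 + plus4


def has_required_fields(street: str, zip_code: str) -> bool:
    """Both street name and Zip code must be present; city and state are optional."""
    return bool(street.strip()) and bool(zip_code.strip())
```

Both `parse_zip9("01740-1329")` and `parse_zip9("017401329")` return the same pair, so either layout can be used in your source data.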
For the steps to map your US addresses, see US address mapping.
The street address mapping you can access is based on your InfoSum plan. This is because not all InfoSum Platform features come as standard - please contact sales@infosum.com for details.