Normalisation rules

Normalisation rules describe the steps applied to all imported data when it is mapped to the Global Schema on the InfoSum Platform.

When you import your raw data to the Platform, certain data must be in a particular format so that it can be validated and mapped to the correct key or category in the Global Schema.

All keys are normalised and hashed. However, some keys (such as email, date of birth, UK and US addresses, and phone) are subjected to additional steps before they are converted to hashed values.

The following sections describe:

  • the formatting you must use when providing certain raw data.
  • how InfoSum validates this data.
  • the additional steps InfoSum applies to normalise this data.

Email Address

What email address format does InfoSum require?

  • Must be in a single column.
  • Can be provided either as raw data or in SHA256 hexadecimal format.
  • If you use SHA256 format, ensure every email address is in lowercase, with leading and trailing white space removed, before you convert it to SHA256.

How does InfoSum validate emails?

  • If the email is in raw format: checks that it matches RFC 2822. The check is case insensitive.
  • If the email is in SHA256 format: checks that the hash has the correct length.

How does InfoSum normalise emails?

  • If the email is provided as raw data rather than in SHA256 format, it is converted to lowercase and leading and trailing white space is removed before hashing.
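If you hash email addresses yourself before import, the preparation described above can be sketched as follows. This is a minimal illustration, not InfoSum's implementation: the function names are hypothetical, UTF-8 encoding is an assumption, and the hexadecimal-alphabet check goes slightly beyond the length check the text mentions.

```python
import hashlib


def normalise_email(raw: str) -> str:
    """Remove leading/trailing white space and convert to lowercase."""
    return raw.strip().lower()


def sha256_hex(value: str) -> str:
    """Return the SHA256 digest as a 64-character hexadecimal string.

    Assumption: the value is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def looks_like_sha256_hex(value: str) -> bool:
    """Check length (64) and alphabet only.

    This cannot tell whether the address was normalised before hashing.
    """
    return len(value) == 64 and all(c in "0123456789abcdefABCDEF" for c in value)


hashed = sha256_hex(normalise_email("  Jane.Doe@Example.com "))
```

Because hashing is sensitive to every character, two parties only produce the same hash for the same address if both apply the same lowercase and trim steps first.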

Name

What name format does InfoSum require?

  • Upload either a single column or two columns as input.
  • Always provided in raw format.
  • Requires first and second name.

Note: We do not support middle names or suffixes. However, you can import these as custom categories if you want to use them outside of our Global Schema.

How does InfoSum validate names?

  • Always lowercase.
  • First name characters up to the first space are used.
  • Blank space is removed.

How does InfoSum normalise names?

  • Single-column entries in your spreadsheet are mapped to a single category in the Global Schema. Multi-column entries are also mapped to a single category, but that category allows additional mapping by assigning a sub-type (called a property) to each column in the multi-column category. See the example below.
  • Leading and trailing white space is removed from each column, and the values are converted to lowercase.

Category properties example

For example, you can assign the properties “Forename” and “Surname” to the category “Name” in the Global Schema.

“Name” cannot be used as a key on its own; it can only be used in combination with another key to make a unique key. For example, to use “Name” as a key, pair it with another key such as “Date of Birth” or “UDPRN” (Address).

Before you normalise your data in a bunker, you can assign properties to the columns of a multi-column category.
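The name rules in this section (lowercase, trimmed white space, first name cut at the first space) can be illustrated with a short sketch for the two-column case. The function name is hypothetical, and reading "blank space is removed" as dropping white space inside the surname is an assumption rather than documented behaviour.

```python
from typing import Tuple


def normalise_name(forename: str, surname: str) -> Tuple[str, str]:
    """Illustrative two-column name normalisation."""
    # Trim leading/trailing white space and convert to lowercase.
    first = forename.strip().lower()
    last = surname.strip().lower()

    # Keep only the first-name characters up to the first space,
    # which discards anything that looks like a middle name.
    first = first.split(" ", 1)[0]

    # Assumption: remaining white space inside the surname is removed.
    last = "".join(last.split())

    return first, last
```

For example, `normalise_name("  Mary Jane ", " O Neil ")` returns `("mary", "oneil")` under these assumptions.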

 

D.O.B. (Date of Birth) 

What DOB format does InfoSum require?

  • Always provided in raw format.
  • Must be in three columns, each with a separate input value for "yyyy", "mm", and "dd".
  • If your DOB is in a single column (for example, YYYY-MM-DD or DD-MM-YYYY), you can use transformations to convert it into three columns.

How does InfoSum validate DOB?

  • Year is a valid integer and not empty.
  • Month is empty or an integer.
  • Day is empty or an integer.

How does InfoSum normalise DOB?

  • No extra hashing is done.
  • Represents DOB as a 64-bit integer value.
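To check a file before import, the three-column layout and the validation rules above can be approximated as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the split helper handles only the YYYY-MM-DD layout, and it is a local pre-processing step rather than the Platform's own transformation feature. The 64-bit integer encoding is not specified in this document, so it is not reproduced here.

```python
from typing import Tuple


def split_iso_dob(value: str) -> Tuple[str, str, str]:
    """Split a 'YYYY-MM-DD' string into (yyyy, mm, dd) column values."""
    yyyy, mm, dd = value.strip().split("-")
    return yyyy, mm, dd


def _is_int(text: str) -> bool:
    """Return True if the text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def is_valid_dob(yyyy: str, mm: str = "", dd: str = "") -> bool:
    """Year must be a non-empty integer; month and day may be empty or integers."""
    year_ok = yyyy.strip() != "" and _is_int(yyyy)
    month_ok = mm.strip() == "" or _is_int(mm)
    day_ok = dd.strip() == "" or _is_int(dd)
    return year_ok and month_ok and day_ok
```

The sketch deliberately does no range checking (for example, month 13), because the rules listed above only require integer values.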

UK Address Mapper

What UK address format does InfoSum require?

  • Always provided in raw format.

How does InfoSum validate UK addresses?

  • Verifies that the address is valid and complete.

How does InfoSum normalise UK addresses?

  • The address generates a unique UDPRN value. UDPRN is the only key for UK addresses.

Phone

What phone number format does InfoSum require?

  • Always provided in raw format.
  • Must be a valid phone number.

How does InfoSum validate phone numbers?

  • Checks that the phone number is in valid E.164 format.

How does InfoSum normalise phone numbers?

  • If a phone number has no international dialling code at the beginning, or no region was selected during mapping, the chosen default international dialling code is attached.
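The E.164 check and the default-code behaviour described above can be sketched as follows. This is an illustration only: the function names and the default code "44" are hypothetical, and dropping a single leading trunk "0" is a simplifying assumption that suits UK-style numbers but not every country. A production system would more likely rely on a dedicated phone-number library.

```python
import re

# E.164: a "+" followed by up to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the whole string is in E.164 form."""
    return E164_PATTERN.fullmatch(number) is not None


def attach_default_code(raw: str, default_code: str = "44") -> str:
    """Attach a default international code when the number has none."""
    # Drop common separators: spaces, hyphens, brackets and dots.
    digits = re.sub(r"[\s\-().]", "", raw)
    if digits.startswith("+"):
        return digits  # already carries an international code
    # Assumption: a single leading trunk "0" is dropped before prefixing.
    if digits.startswith("0"):
        digits = digits[1:]
    return "+" + default_code + digits
```

With these assumptions, `attach_default_code("07911 123456")` yields `"+447911123456"`, which then passes `is_e164`.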

US Address Mapper

What US address format does InfoSum require?

  • Always provided in raw format.
  • Requires values in either the street name, city and state fields, or the street name and zip code fields.

How does InfoSum validate US addresses?

  • Verifies that the address is valid and complete.

How does InfoSum normalise US addresses?

  • Address line 1, address line 2, town and state are hashed to generate three keys: USA address, Zip9 and Zip5.
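The field requirement for US addresses (street name with city and state, or street name with zip code) can be pre-checked with a small helper like the one below. The function and parameter names are hypothetical, and the check only confirms that the required fields are filled in; it does not verify that the address itself is valid or complete.

```python
def has_required_us_fields(
    street: str = "", city: str = "", state: str = "", zip_code: str = ""
) -> bool:
    """Return True if one of the two accepted field combinations is present."""

    def filled(value: str) -> bool:
        return value.strip() != ""

    # Street name is required in both combinations.
    if not filled(street):
        return False
    # Either city and state together, or a zip code on its own.
    return (filled(city) and filled(state)) or filled(zip_code)
```

Rows that fail this check can be flagged for correction before import.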