Key fill rates

The key fill rate measures how successfully the individuals represented in your dataset have been converted to keys: it is the percentage of rows for which a usable key could be created.

Key fill rates are one of the most important metrics for judging data quality and how successfully two datasets can be used together in a query. Your key fill rate will be less than 100% if your original data is ambiguous, inconsistent or incomplete.

For example, your original dataset may contain customers' email addresses. Because an email address is a direct identifier, it can be used to make a key. But suppose that 25% of your customers did not give an email address, and a further 5% typed something obviously invalid. The fill rate for the email key will then be 70% (100% - 25% - 5%).

This means that only 70% of the rows in your dataset can be accessed by a query using that key. The remaining 30% are inaccessible to a query using that key, as though they had never been present in the dataset at all.
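To make the arithmetic concrete, here is a minimal Python sketch that computes the fill rate for an email key. The validity check (`EMAIL_PATTERN`) and the helper names `make_email_key` and `key_fill_rate` are hypothetical illustrations; InfoSum Platform's actual normalization and validation rules are not described here.

```python
import re

# Hypothetical, deliberately simple validity check. The platform's real
# normalization and validation rules are not described in this article.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def make_email_key(raw):
    """Return a normalized email key, or None if no usable key can be made."""
    if raw is None:
        return None
    email = raw.strip().lower()
    if not EMAIL_PATTERN.match(email):
        return None
    return email


def key_fill_rate(values):
    """Fraction of rows for which a key could be made (0.0 for an empty dataset)."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if make_email_key(v) is not None)
    return filled / len(values)


# 20 customers: 14 valid emails, 5 missing (25%), 1 obviously invalid (5%).
rows = [f"customer{i}@example.com" for i in range(14)] + [None] * 5 + ["not-an-email"]
print(f"{key_fill_rate(rows):.0%}")  # 70%
```

The six rows for which `make_email_key` returns `None` are the ones a query on this key cannot reach.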

Each dataset may have up to five keys, and each key has its own fill rate. You can check key fill rates on the Datasets tab in InfoSum Platform. Click one of the rows to bring up its details, then hover your mouse over a key to see its fill rate.
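Because each key is built from different columns, the same rows can give very different fill rates per key. The sketch below is a hypothetical illustration: the `email_key` and `phone_key` rules and the `fill_rates` helper are assumptions made for this example, not the platform's own logic. Only the five-key limit comes from this article.

```python
from typing import Callable, Dict, List, Optional

MAX_KEYS = 5  # a dataset may have up to five keys

KeyMaker = Callable[[dict], Optional[str]]


def email_key(row: dict) -> Optional[str]:
    # Hypothetical rule: any non-blank value containing "@" is usable.
    email = row.get("email", "").strip().lower()
    return email if "@" in email else None


def phone_key(row: dict) -> Optional[str]:
    # Hypothetical rule: keep digits only and require at least 10 of them.
    digits = "".join(ch for ch in row.get("phone", "") if ch.isdigit())
    return digits if len(digits) >= 10 else None


def fill_rates(rows: List[dict], key_makers: Dict[str, KeyMaker]) -> Dict[str, float]:
    """Compute a separate fill rate for each key over the same rows."""
    if len(key_makers) > MAX_KEYS:
        raise ValueError(f"a dataset may have at most {MAX_KEYS} keys")
    rates = {}
    for name, make_key in key_makers.items():
        filled = sum(1 for row in rows if make_key(row) is not None)
        rates[name] = filled / len(rows) if rows else 0.0
    return rates


rows = [
    {"email": "a@example.com", "phone": "020 7946 0000"},
    {"email": "", "phone": "020 7946 0001"},
    {"email": "c@example.com", "phone": ""},
    {"email": "d@example.com", "phone": "n/a"},
]
print(fill_rates(rows, {"email": email_key, "phone": phone_key}))
# {'email': 0.75, 'phone': 0.5}
```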

When you switch to the Connections List tab, you can also view information on the keys held in each pair of datasets, alongside their fill rates.

Here you can also see which keys appear in both datasets, the intersection, and the number of times a key is duplicated (for example, the same email appearing in two rows). Hover your mouse over a key to see its intersection and fill rate.
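As a rough illustration of the statistics shown for a pair of datasets, the following Python sketch compares one key across two datasets. The `compare_key` helper is hypothetical, and it assumes that the intersection means the number of distinct key values present in both datasets and that a duplicate is a key value appearing in more than one row. The platform may define and count these differently.

```python
from collections import Counter
from typing import Optional, Sequence


def compare_key(keys_a: Sequence[Optional[str]], keys_b: Sequence[Optional[str]]) -> dict:
    """Summarize one key across two datasets; None marks a row with no key."""
    counts_a = Counter(k for k in keys_a if k is not None)
    counts_b = Counter(k for k in keys_b if k is not None)
    return {
        # Share of rows in each dataset that have a key.
        "fill_rate_a": sum(counts_a.values()) / len(keys_a) if keys_a else 0.0,
        "fill_rate_b": sum(counts_b.values()) / len(keys_b) if keys_b else 0.0,
        # Distinct key values present in both datasets (assumed definition).
        "intersection": len(counts_a.keys() & counts_b.keys()),
        # Key values that appear in more than one row, with their row counts.
        "duplicates_a": {k: n for k, n in counts_a.items() if n > 1},
        "duplicates_b": {k: n for k, n in counts_b.items() if n > 1},
    }


dataset_a = ["a@example.com", "b@example.com", "b@example.com", None, "c@example.com"]
dataset_b = ["b@example.com", "c@example.com", "d@example.com", None]
stats = compare_key(dataset_a, dataset_b)
print(stats["intersection"])  # 2
```

Rows without a key (`None`) lower the fill rate and can never contribute to the intersection, which is why a low fill rate in either dataset limits what the pair can be used for together.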