Using tutorial mode

Tutorial mode gives you access to several test datasets, so you can explore the functionality of the Platform and run pre-built queries without importing any of your own data. The scenarios reflect real-life situations that our users commonly encounter and give context on how to use InfoSum. Tutorial mode also provides a quick tour of the Platform.

Before starting, please make sure you are logged into your InfoSum Platform account.

To enable tutorial mode, navigate to the Dashboard and click on the Add demo permissions button. The tutorial datasets will then appear under the Pending Permissions header. Use the Action button to accept each permission, or use the bulk selection functionality in the Permissions Received tab.

When you switch to the Datasets tab, you will now see the following datasets:

  • A brand called ACME
  • A series of publishers: Auto and Sports, Lifestyle Weekly and News Today
  • A bank
  • An identity graph
  • A third-party data source

From here, you can learn about each dataset and its characteristics without being able to view or access the underlying data. This tab displays general information on the datasets, such as the owner, type, name and status. Each dataset that you own or have permission to use is viewable in this tab.

[Image: Datasets tab, general information]

When you create and own a dataset, you can access the Bunker allocated to that dataset. Each Bunker is a separate and secure location that data is imported into and never leaves. As the tutorial datasets are shared with you through permissions rather than owned by you, you cannot access their Bunkers.

To begin exploring the datasets, click on one of the rows and an additional settings box will appear. The box will open on the Details tab, where you can view and edit an expanded version of the table view.

To see which identifiers are held in the dataset, switch to the Keys tab. Keys are the anonymised information InfoSum uses to identify records in separate datasets which relate to the same individual. A key might be a single piece of data, or a combination of data items which together identify a person.

To use datasets together in a query, they must have at least one key in common. Two datasets with the same PII will generate the same key, so the two datasets can use their keys to identify records which relate to the same person.

[Image: Datasets, Keys tab]
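
The idea that the same PII produces the same key in two datasets can be illustrated with a minimal Python sketch. It assumes a key is a SHA-256 hash of a normalised email address; this scheme, the `make_key` helper and the sample addresses are hypothetical and are not a description of how InfoSum generates keys.

```python
import hashlib

def make_key(email: str) -> str:
    """Normalise an email address and hash it (illustrative scheme only)."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

# Two hypothetical datasets that hold the same person with different formatting.
brand_emails = ["Ann@Example.com", "bob@example.com"]
publisher_emails = ["ann@example.com ", "carol@example.com"]

brand_keys = {make_key(e) for e in brand_emails}
publisher_keys = {make_key(e) for e in publisher_emails}

# Records relating to the same person yield the same key in both datasets,
# so the datasets can be matched without comparing the raw addresses.
shared_keys = brand_keys & publisher_keys
print(len(shared_keys))
```

Only the hashed values are compared in this sketch, which is why two datasets need at least one key type in common before they can be matched.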

To see which attributes are held in the dataset, switch to the Category stats tab. Categories are the names given to attributes in our Global Schema, such as age or marital status. This tab shows the data types in the dataset and indicates the types of information that could be compared and analysed.

[Image: Datasets, Category stats tab]

Next, switch to the Connections tab to start understanding the connectivity between the datasets. The overlap of each pair of datasets that you own or have permission to use is automatically calculated and displayed here.

In the background, the Platform uses the keys in each dataset to estimate how many individuals appear in both datasets of each pair, and displays the size of that intersection.

[Image: Datasets, Connections tab]
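
As a rough illustration of a pairwise overlap, the Python sketch below counts the keys shared by every pair of datasets. The key sets and the `pairwise_overlaps` helper are hypothetical, and the calculation is exact, whereas the Platform reports an estimate.

```python
from itertools import combinations

# Hypothetical key sets; short strings stand in for anonymised keys.
datasets = {
    "ACME": {"k1", "k2", "k3", "k4"},
    "AutoSports": {"k2", "k3", "k5"},
    "NewsToday": {"k3", "k6"},
}

def pairwise_overlaps(key_sets):
    """Return the number of shared keys for every pair of datasets."""
    return {
        (a, b): len(key_sets[a] & key_sets[b])
        for a, b in combinations(sorted(key_sets), 2)
    }

overlaps = pairwise_overlaps(datasets)
for pair, size in overlaps.items():
    print(pair, size)
```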

You can explore this further by selecting a row. Another settings box will appear, opening on the Key intersections tab. This tab shows the overlap for each key and gives an idea of how successfully the two datasets can be used together in a query. In the example below, you can see that the Email key provides a larger overlap than the Mobile Phone Number key.

[Image: Connections, Key intersections tab]

When you hover your mouse over a nominated key, you can see information on the quality of that key. Key quality is one of the most important metrics in judging data quality. When you run a query, the Platform automatically selects the optimal key to match on but, if you want to, you can select a specific key or combination of keys.

[Image: Key intersections tab, key fill rate]

Similarly, when you switch to the Category Fill Rates tab, you can also view the matching categories in each pair of datasets and information on the quality of the data.

[Image: Connections, Category Fill Rates tab]
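
The article does not define fill rate. The Python sketch below assumes it means the share of records in which a given key or category is populated. The `fill_rate` helper and the sample records are hypothetical.

```python
def fill_rate(records, field):
    """Fraction of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

# Hypothetical records: the email is populated more often than the mobile number.
records = [
    {"email": "a@example.com", "mobile": "07700900001"},
    {"email": "b@example.com", "mobile": ""},
    {"email": "c@example.com", "mobile": None},
    {"email": "", "mobile": "07700900004"},
]

email_rate = fill_rate(records, "email")
mobile_rate = fill_rate(records, "mobile")
print(email_rate, mobile_rate)
```

Under this assumption, a key with a higher fill rate gives more records a chance to match, which is consistent with one key producing a larger overlap than another.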

We'll now walk through a range of pre-built queries, based on popular scenarios.

From the main menu on the left side, select the Analyse section and, within the submenu, select the Query Tool. The Query Tool is designed to help you build, test and run queries using our Insight Query Language, IQL. Loosely based on SQL, IQL enables you to join, filter and enrich any number of connected datasets. Please see this article for an introduction to IQL.

You will see the tutorial datasets listed as shown in the image below. Using the arrows next to the dataset names, you can see the categories held within each dataset. The copy button next to the category name is a shortcut to copy the IQL syntax into the Query Tool console.

[Image: Query Tool]

Unified customer view

A common use case of the Platform is analysing internal data from disparate sources. To understand an audience size, you can use the COUNT function to return the number of distinct rows. By distinct rows, we mean the number of unique keys and so the number of known individuals which meet the query criteria. To do so, submit the query below: 

SELECT
COUNT()

FROM AutoSports

This AutoSports dataset was created by merging AutoSportsR, AutoSportsD and AutoSportsS before importing it to the Platform. Use the query below to see if matching within the Platform increases the number of known individuals.

SELECT
COUNT()

FROM AutoSportsR UNION AutoSportsD UNION AutoSportsS

The UNION operator specifies that you are interested in the combination of the datasets. See this article on audience definition for a full explanation of this clause and the available operators.
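
To see why a count over a union reflects distinct individuals rather than total rows, consider the minimal Python sketch below. The key sets are hypothetical and simply stand in for the three source datasets.

```python
# Hypothetical key sets for the three source datasets.
auto_sports_r = {"k1", "k2", "k3"}
auto_sports_d = {"k2", "k3", "k4"}
auto_sports_s = {"k4", "k5"}

# The union contains each key once, however many datasets it appears in.
combined = auto_sports_r | auto_sports_d | auto_sports_s
distinct_individuals = len(combined)

# Adding up the rows of each dataset would count shared individuals repeatedly.
total_rows = len(auto_sports_r) + len(auto_sports_d) + len(auto_sports_s)

print(distinct_individuals, total_rows)
```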

To segment this result, you can apply any number of filters. Applying a filter means setting attribute-based rules so that only certain individuals are included in the result, for example a particular demographic or those who have displayed a particular trait. Building on the same COUNT query, add the WHERE line shown at the end of the query below:

SELECT
COUNT()

FROM AutoSportsR UNION AutoSportsD UNION AutoSportsS

WHERE AutoSportsD."in market for car"/"in market for car"='Yes'

If you want to filter on an attribute held in an additional dataset, you can use an enrichment dataset, which we'll discuss later on. See this article for more information on applying filters.
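
The effect of an attribute-based filter on the audience can be pictured with a short Python sketch. The records, keys and values are hypothetical; only the "in market for car" category name comes from the query above.

```python
# Hypothetical records; only AutoSportsD holds the "in market for car" category.
records_r = [{"key": "k1"}, {"key": "k2"}]
records_d = [
    {"key": "k2", "in market for car": "Yes"},
    {"key": "k3", "in market for car": "No"},
    {"key": "k4", "in market for car": "Yes"},
]
records_s = [{"key": "k4"}, {"key": "k5"}]

# Audience: every distinct key across the three datasets.
audience = {r["key"] for r in records_r + records_d + records_s}

# Filter: keep only individuals whose attribute value is 'Yes'.
in_market = {r["key"] for r in records_d if r["in market for car"] == "Yes"}
filtered_audience = audience & in_market

print(len(audience), len(filtered_audience))
```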

Alternatively, you can explore this unified view further by using the TopN function. When you submit the query below, it will return a chart showing the five most frequent age groups in the audience. 

SELECT 
TopN(Age/"Advert Range", 5)

FROM AutoSportsR UNION AutoSportsD UNION AutoSportsS

As you can see, you are querying the datasets and gaining statistics without being able to see the data at row level or learn about an individual.
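
To illustrate the kind of result a top-five ranking returns, here is a minimal Python sketch that ranks hypothetical age-range values by frequency. The `top_n` helper and the sample values are assumptions made for illustration and say nothing about how the Platform computes TopN.

```python
from collections import Counter

# Hypothetical age-range values, one per individual in the audience.
age_ranges = [
    "25-34", "35-44", "25-34", "18-24", "45-54",
    "25-34", "35-44", "55-64", "65+", "18-24",
]

def top_n(values, n):
    """Return the n most frequent values with their counts."""
    return Counter(values).most_common(n)

top_five = top_n(age_ranges, 5)
for age_range, count in top_five:
    print(age_range, count)
```

Only aggregated counts appear in the output; no individual record is exposed.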

 

Data matching

When you submit a query, the query is parsed and split into multiple fragments. These fragments are then executed on the separate datasets, with a mathematical representation of the results moving between them. This, alongside the permissions system, enables multiple parties to work together while keeping their customer data private and secure. 

For example, a publisher and advertiser could work together to match their known customers. If ACME wanted to target their lapsed customers on AutoSports, they could use the Platform to tailor an advertising campaign without sharing data. To do this, you would write an aggregation query like this: 

SELECT 
AGGREGATE(
"lapsed customer" {'Yes', 'No'}
)

FROM ACME INTERSECT AutoSports

To maximise the number of matches between the keys, an additional dataset can be referenced in the LINK WITH clause. Try adding the LINK WITH clause as shown below to see how it affects the statistics. See this article on linking datasets for more information.

SELECT 
AGGREGATE(
"lapsed customer" {'Yes', 'No'}
)

FROM ACME INTERSECT AutoSports

LINK WITH IdentityX
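
One way to picture why a linking dataset can increase matches is the Python sketch below. It assumes ACME holds only emails, AutoSports holds mostly phone numbers, and an identity graph connects an email to a phone number. All records and the `aggregate_lapsed` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records for the two datasets being intersected.
acme = [
    {"email": "ann@example.com", "lapsed customer": "Yes"},
    {"email": "bob@example.com", "lapsed customer": "No"},
    {"email": "cat@example.com", "lapsed customer": "Yes"},
]
auto_sports = [
    {"email": "ann@example.com", "phone": None},
    {"email": None, "phone": "07700900002"},
    {"email": None, "phone": "07700900009"},
]

# Hypothetical identity graph linking an email to a phone number.
identity_graph = {"bob@example.com": "07700900002"}

def aggregate_lapsed(link=None):
    """Count "lapsed customer" values for ACME records also found in AutoSports."""
    link = link or {}
    emails = {r["email"] for r in auto_sports if r["email"]}
    phones = {r["phone"] for r in auto_sports if r["phone"]}
    counts = Counter()
    for record in acme:
        direct = record["email"] in emails            # same key type in both
        linked = link.get(record["email"]) in phones  # bridged by the graph
        if direct or linked:
            counts[record["lapsed customer"]] += 1
    return dict(counts)

without_link = aggregate_lapsed()
with_link = aggregate_lapsed(identity_graph)
print(without_link, with_link)
```

In this sketch, the second ACME customer is matched only once the identity graph supplies a phone number for their email, so the aggregated counts grow.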

 

Data co-operative

The Platform can also be used to facilitate a data alliance, where companies can collaborate with data to gain combined insights while keeping their customer data private. 

The query below brings together the insights held by three publishers to generate aggregated statistics on the combined readership. The aggregation uses the "sports lover" category held in the AutoSports dataset. The WHERE clause then filters the results using attribute-based criteria, in this case including an individual only if they are also a mum.

SELECT
AGGREGATE(
"sports lover" {'Yes', 'No'}
)

FROM AutoSports UNION LifeWeek UNION NewsToday

WHERE mum/mum='Yes'

This category could be held in an additional dataset, such as a third-party data provider or a bank, for enrichment purposes. For example, none of the publishers hold information on whether the individuals in the audience are frequent flyers, so an enrichment dataset can be referenced.

SELECT
AGGREGATE(
"frequent flyer" {'Yes', 'No'}
)

FROM AutoSports UNION LifeWeek UNION NewsToday

ENRICH WITH KnowArc

When a category referenced in the query, here "frequent flyer", is held in a dataset that is not part of the audience defined in the FROM clause, the ENRICH WITH clause tells the Platform which dataset is being referenced. See this article on enrichment datasets for more detail.
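
Enrichment can be pictured as looking up an attribute for the audience in a dataset that does not itself define the audience. In the Python sketch below, the keys, values and `aggregate_enriched` helper are hypothetical. It skips individuals with no enrichment value, which is an assumption rather than documented Platform behaviour.

```python
from collections import Counter

# Hypothetical audience: the distinct keys across the three publishers.
audience = {"k1", "k2", "k3", "k4"}

# Hypothetical enrichment dataset mapping keys to a "frequent flyer" value.
# "k9" is not in the audience, so it never contributes to the result.
know_arc = {"k1": "Yes", "k2": "No", "k4": "Yes", "k9": "Yes"}

def aggregate_enriched(audience_keys, enrichment):
    """Count enrichment values for audience members the enrichment set covers."""
    counts = Counter()
    for key in audience_keys:
        if key in enrichment:
            counts[enrichment[key]] += 1
    return dict(counts)

frequent_flyers = aggregate_enriched(audience, know_arc)
print(frequent_flyers)
```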

 

Activation queries

The queries discussed so far have all been insight queries, which return statistical results based on aggregated data. If you want to, you can use an activation query to generate a list of identifiers that relate to specific individuals.

To do so, you will need to own, or be given permission to reference, an activation dataset. You may need to contact InfoSum to request access to this feature and tutorial dataset.

When you have access to activation queries, the Query Tool looks a little different. To submit an activation query, the first step is to select the query type.

The syntax for activation queries is slightly different. Each activation query can reference only one activation dataset. For illustrative purposes, the query below would produce a list of emails, which could then be pushed to an external destination.

[Image: Query Tool, activation query]

To build on this example, you can expand the query to also reference any number of insight datasets. These insight datasets can be used to filter, enrich and link the data, provided you have permission to use the insight datasets in an activation query.

For example, the query below would output a list of emails that exist within the intersection of the three datasets (Activation, ACME and NewsToday) and for which the "existing customer" attribute is 'Yes'.

[Image: Query Tool, activation query with insight datasets]
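
The logic of the example described above can be sketched in Python. This illustrates the selection only and is not the IQL syntax. The keys, emails and `activation_list` helper are hypothetical, and the sketch assumes the "existing customer" category is held in the ACME dataset.

```python
# Hypothetical activation dataset: key -> email that may be output.
activation = {
    "k1": "ann@example.com",
    "k2": "bob@example.com",
    "k3": "cat@example.com",
}

# Hypothetical insight datasets used to intersect and filter.
acme = {
    "k1": {"existing customer": "Yes"},
    "k2": {"existing customer": "No"},
    "k3": {"existing customer": "Yes"},
}
news_today = {"k1", "k2"}

def activation_list():
    """Emails present in all three datasets whose attribute value is 'Yes'."""
    matched = activation.keys() & acme.keys() & news_today
    return sorted(
        activation[key]
        for key in matched
        if acme[key]["existing customer"] == "Yes"
    )

emails = activation_list()
print(emails)
```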

For more information, please see this article on writing activation queries.

Next steps

Now that you've seen how queries work in the Platform, try importing a dataset.