Using the Demo Datasets
If you’d like to test the platform before moving to the next step, InfoSum offers demo datasets at no cost. Not all features are available, but these datasets will allow you to collaborate, segment audiences, and perform queries.
How to gain access to the Demo Datasets
To access the demo datasets, click the 'Demo' button at the top right of the Datasets page, then click 'Add Permissions' in the pop-up widget.
A confirmation will then appear at the top right of your screen.
The datasets will then appear in Permissions Received under Collaborate.
Use the Action button to accept each permission, or use the bulk selection functionality in the Permissions Received tab.
Where to find the Demo Datasets
When you switch to the Datasets tab under Data, you will now see the following datasets:
- A brand called ACME
- A series of publishers - Auto and Sports, Lifestyle Weekly and News Today
- A bank
- An identity graph (IdentityX)
- A third-party, attribute-rich data source (KnowArc)
From here, you can learn about each dataset and its characteristics without being able to view or access the underlying data. This tab displays general information on the datasets, such as the owner, type, name and status. Each dataset that you own or have permission to use is viewable in this tab.
Note: When you create and own a dataset, you will be able to access the Bunker allocated to that dataset (please check with your InfoSum representative whether you have been allocated a Bunker). Each Bunker is a separate, secure location into which data is imported and from which it never leaves. As the tutorial datasets are made available to you through permissions, you cannot access their Bunkers.
To begin exploring the datasets, click on one of the rows and an additional settings box will appear. The box will open on the Details tab, where you can view and edit an expanded version of the table view.
To see which identifiers are held in the dataset, switch to the Keys tab. Keys are the anonymized information InfoSum uses to identify records in separate datasets which relate to the same individual. A key might be a single piece of data, or a combination of data items which together identify a person.
To use datasets together in a query, they must have at least one key in common. Two datasets with the same PII will generate the same key, so the two datasets can use their keys to identify records which relate to the same person. Hover your mouse over a key to see its fill rate.
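The idea that identical PII produces identical keys can be illustrated with a minimal Python sketch. This is not InfoSum's actual key-generation method, which this article does not describe; the `make_key` function, the normalization steps and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


def make_key(email: str) -> str:
    """Normalise an email address and hash it (hypothetical scheme)."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Two made-up datasets holding the same person with different formatting.
brand_emails = ["Ann@Example.com", "bob@example.com"]
publisher_emails = [" ann@example.com", "cara@example.com"]

brand_keys = {make_key(e) for e in brand_emails}
publisher_keys = {make_key(e) for e in publisher_emails}

# Records relating to the same person share a key, so they can be
# matched without comparing the raw email addresses.
matched = brand_keys & publisher_keys
print(len(matched))
```

Only Ann appears in both datasets, so a single key is shared between them.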
To see which attributes are held in the dataset, switch to the Category stats tab. Category is the name given to an attribute in our Global Schema, such as age or marital status. This tab provides an understanding of the data types in the dataset and informs the types of information that could be compared and analyzed. Hover your mouse over a category to see its fill rate.
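As a rough illustration of fill rate, the Python sketch below assumes it means the share of records that hold a populated value for a given key or category. That definition, the `fill_rate` helper and the sample records are assumptions, not taken from the Platform.

```python
def fill_rate(records, field):
    """Fraction of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


# Made-up records: some values are missing or empty.
records = [
    {"email": "k1", "age": "25-34"},
    {"email": "k2", "age": None},
    {"email": None, "age": "35-44"},
    {"email": "k4", "age": ""},
]

email_fill = fill_rate(records, "email")  # 3 of 4 records populated
age_fill = fill_rate(records, "age")      # 2 of 4 records populated
print(email_fill, age_fill)
```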
Sample use cases with pre-built queries
We'll now walk through a range of pre-built queries, based on popular scenarios.
From the main menu on the left side, switch to the Query Tool. The Query Tool is designed to help you build, test and run queries using our Insight Query Language, IQL. Loosely based on SQL, IQL enables you to join, filter and enrich any number of connected datasets. Please see this article for an introduction to IQL.
You will see the tutorial datasets listed as shown in the image below. Using the arrows next to the dataset names, you can see the categories held within each dataset. The copy button next to the category name is a shortcut to copy the IQL syntax into the Query Tool console. For more details, see using the Query Tool.
Unified customer view
A common use case of the Platform is analyzing internal data from disparate sources. To understand the size of an audience, you can use the COUNT function to return the number of distinct rows. By distinct rows, we mean the number of unique keys, and therefore the number of known individuals who meet the query criteria. To do so, submit the query below:
SELECT
COUNT()
FROM AutoSports
This AutoSports dataset was created by merging AutoSportsR, AutoSportsD and AutoSportsS before importing it to the Platform. Use the query below to see if matching within the Platform increases the number of known individuals.
SELECT
COUNT()
FROM AutoSportsR UNION AutoSportsD UNION AutoSportsS
Note: The UNION operator specifies that you are interested in the combination of the datasets. See this article on audience definition for a full explanation of this clause and the available operators.
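Conceptually, counting a UNION resembles counting the distinct members of combined sets of keys. The Python sketch below uses made-up key values and is only a mental model, not a description of how the Platform executes the query.

```python
# Made-up keys held by each of the three source datasets.
auto_sports_r = {"k1", "k2", "k3"}
auto_sports_d = {"k2", "k3", "k4"}
auto_sports_s = {"k4", "k5"}

# Set union keeps each key once, however many datasets hold it.
combined = auto_sports_r | auto_sports_d | auto_sports_s

total_rows = len(auto_sports_r) + len(auto_sports_d) + len(auto_sports_s)
distinct_individuals = len(combined)
print(total_rows, distinct_individuals)
```

In this example the three datasets hold eight rows in total, but only five distinct individuals.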
To segment this result, you can apply any number of filters. Applying a filter means setting attribute-based rules so that only certain individuals are included in the result, for example a particular demographic or those who have displayed a particular trait. Building on the same COUNT query, add the WHERE line shown at the end of the query below:
SELECT
COUNT()
FROM AutoSportsR UNION AutoSportsD UNION AutoSportsS
WHERE AutoSportsD."in market for car"/"in market for car"='Yes'
If you want to filter on an attribute held in an additional dataset, you can use an enrichment dataset, which we'll discuss later on. See this article for more information on applying filters.
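The effect of a filter on a count can be pictured with the short Python sketch below. The `count_where` helper and the sample audience are hypothetical, and the sketch assumes that individuals with no known value for the category are excluded.

```python
# Made-up audience: key -> known attributes for that individual.
audience = {
    "k1": {"in market for car": "Yes"},
    "k2": {"in market for car": "No"},
    "k3": {},  # attribute not known for this individual
    "k4": {"in market for car": "Yes"},
}


def count_where(audience, category, value):
    """Count individuals whose `category` equals `value`."""
    return sum(1 for attrs in audience.values() if attrs.get(category) == value)


in_market = count_where(audience, "in market for car", "Yes")
print(in_market)
```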
Alternatively, you can explore this unified view further by using the TopN function. When you submit the query below, it will return a chart showing the five most frequent age groups in the audience.
SELECT
TopN(Age/"Advert Range", 5)
FROM AutoSportsR UNION AutoSportsD UNION AutoSportsS
As you can see, you are querying the datasets and gaining statistics without being able to see the data at a row-level or learn about an individual.
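A "top N" calculation of this kind amounts to a frequency count followed by taking the N most common values. The Python sketch below shows the idea on made-up age bands; the `top_n` helper is hypothetical and ignores unknown values, which is an assumption rather than documented Platform behavior.

```python
from collections import Counter


def top_n(values, n):
    """Return the n most frequent non-missing values with their counts."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(n)


# Made-up age bands, one per individual (None means unknown).
ages = ["25-34", "35-44", "25-34", None, "18-24", "25-34", "35-44"]

top_two = top_n(ages, 2)
print(top_two)
```

Only the aggregated counts are returned, never the individual rows behind them.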
Data matching
When you submit a query, the query is parsed and split into multiple fragments. These fragments are then executed on the separate datasets, with a mathematical representation of the results moving between them. This, alongside the permissions system, enables multiple parties to work together while keeping their customer data private and secure.
For example, a publisher and advertiser could work together to match their known customers. If ACME wanted to target their lapsed customers on AutoSports, they could use the Platform to tailor an advertising campaign without sharing data. To do this, you would write an aggregation query like this:
SELECT
AGGREGATE(
"lapsed customer" {'Yes', 'No'}
)
FROM ACME INTERSECT AutoSports
To maximize the number of matches between the keys, an additional dataset can be referenced in the LINK WITH clause. Try adding the LINK WITH clause as shown below to see how it affects the statistics. See this article on linking datasets for more information.
SELECT
AGGREGATE(
"lapsed customer" {'Yes', 'No'}
)
FROM ACME INTERSECT AutoSports
LINK WITH IdentityX
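One way to picture why a linking dataset can raise the match count is sketched in Python below. It assumes, purely for illustration, that the identity graph connects different key types (here an email and a mobile number) belonging to the same person. The record layouts and both helper functions are hypothetical and do not describe how the Platform performs linking.

```python
# Made-up records: each holds one or more keys by type.
acme = [{"email": "e1"}, {"email": "e2"}, {"email": "e3", "mobile": "m3"}]
autosports = [{"mobile": "m1"}, {"mobile": "m3"}, {"email": "e2"}]
identity_graph = [{"email": "e1", "mobile": "m1"}]


def direct_matches(left, right):
    """Count left records sharing at least one key with any right record."""
    right_keys = {(k, v) for rec in right for k, v in rec.items()}
    return sum(
        1 for rec in left if any((k, v) in right_keys for k, v in rec.items())
    )


def linked_matches(left, right, graph):
    """As direct_matches, after extending left records with graph keys."""
    extended = []
    for rec in left:
        merged = dict(rec)
        for link in graph:
            # If the graph entry shares a key with this record,
            # borrow the entry's other keys.
            if any(link.get(k) == v for k, v in rec.items()):
                merged.update(link)
        extended.append(merged)
    return direct_matches(extended, right)


without_link = direct_matches(acme, autosports)
with_link = linked_matches(acme, autosports, identity_graph)
print(without_link, with_link)
```

Here the customer known to ACME only by email `e1` matches AutoSports only once the graph supplies the mobile number `m1`, so the match count rises from two to three.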
Data co-operative
The Platform can also be used to facilitate a data alliance, where companies can collaborate with data to gain combined insights while keeping their customer data private.
The query below brings together the insights held by three publishers to generate aggregated statistics on the combined readership. In this case, the aggregation uses the "sports lover" category held in the AutoSports dataset, and the WHERE clause filters the results using attribute-based criteria so that an individual is only included if they are also a mum.
SELECT
AGGREGATE(
"sports lover" {'Yes', 'No'}
)
FROM AutoSports UNION LifeWeek UNION NewsToday
WHERE mum/mum='Yes'
A category used in a query can also be held in an additional dataset, such as a third-party data provider or a bank, for enrichment purposes. For example, none of the publishers hold information on whether the individuals in the audience are frequent flyers, so an enrichment dataset can be referenced.
SELECT
AGGREGATE(
"frequent flyer" {'Yes', 'No'}
)
FROM AutoSports UNION LifeWeek UNION NewsToday
ENRICH WITH KnowArc
As the dataset that holds this category is not part of the audience defined in the FROM clause, the ENRICH WITH clause tells the Platform which dataset is being referenced. The same applies when a WHERE clause filters on a category held outside the audience. See this article on enrichment datasets for more detail.
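The role of an enrichment dataset can be sketched in Python as follows. The audience is still defined by the publishers' keys, while the attribute values come from a separate source. The key values and the `aggregate_enriched` helper are hypothetical, and the sketch assumes individuals with no value in the enrichment dataset are left out of the counts.

```python
from collections import Counter

# Made-up audience: the union of the publishers' keys.
audience = {"k1", "k2", "k3", "k4"}

# Made-up enrichment dataset: key -> "frequent flyer" value.
# k9 is not in the audience; k3 has no value in the enrichment dataset.
enrichment = {"k1": "Yes", "k2": "No", "k4": "Yes", "k9": "Yes"}


def aggregate_enriched(audience, enrichment, values):
    """Count audience members per attribute value from the enrichment data."""
    counts = Counter(enrichment[k] for k in audience if k in enrichment)
    return {v: counts.get(v, 0) for v in values}


result = aggregate_enriched(audience, enrichment, ["Yes", "No"])
print(result)
```

The enrichment dataset contributes attribute values but does not add anyone to the audience, which is why `k9` is not counted.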
Activation queries
The queries discussed so far have all been insight queries, which return statistical results based on aggregated data. You can also use an activation query to generate a list of identifiers that relate to specific individuals.
To do so, you will need to own, or be given permission to reference, an activation dataset. You may need to contact InfoSum to request access to this feature and tutorial dataset.
When you have access to activation queries, the Query Tool looks a little different. To submit an activation query, the first step is to select the query type.
The syntax for activation queries is slightly different. Each activation query can reference only one activation dataset. For illustrative purposes, the query below would produce a list of emails, which could then be pushed to an external destination.
To build on this example, you can expand the query to also reference any number of insight datasets. These insight datasets can be used to filter, enrich and link the data, provided you have permission to use the insight datasets in an activation query.
For example, the query below would output a list of emails that exist within the intersection of the three datasets (UID, AutoSports and LifeWeek) and match the "tech lover" attribute.
For more information, please see the articles on writing activation queries and using the Query Tool.