Select query type
The first part of any query is using the SELECT clause to specify the type.
This defines whether the objective of the query is to produce aggregated insights or a list of identifiers:
- Insight queries are your way to analyze connected data without the risk of identifying individuals. This type of query enables privacy-safe analysis and collaboration across separated datasets and only produces aggregated insights.
- Activation queries are your way to generate a list of identifiers which can be pushed elsewhere as an audience, and are disabled by default. Because the results identify individuals, it is important to consider legal and regulatory requirements whenever you use an activation query.
In this article, we’ll see how to use the SELECT clause in both insight and activation queries.
Both the SELECT and FROM clauses are mandatory, so the examples shown will include both clauses. The following sections describes the FROM component and its operators in more detail.
Insight query
There are three aggregate functions available - COUNT, TopN and Aggregate.
COUNT is a quick and easy way to learn the number of unique records in a dataset or combination of datasets. To do so, simply write the following:
SELECT COUNT()
FROM A Union B
This is similar to the overlap in the Connections tab and can be expanded to reference any number of datasets and add further clauses, such as using the WHERE clause to apply filters. For example, the query below counts the number of unique individuals in three datasets who meet the income and location criteria.
SELECT COUNT()
FROM A Union B Union C
WHERE Income > 20000 AND Location = "London"
COUNT returns a number based on the number of matching keys, which tells you the size of that audience. TopN and Aggregate are slightly different as they return an aggregation of categorical data.
TopN is ideal for aggregating categories with high cardinality, where you are interested in learning about the frequently appearing values in a category, such as the top 5 job titles or the top 10 postcodes.
SELECT TopN(Category."Representation", Number of values)
FROM A Union B
So, if you wanted to know the top 10 postcodes, you would write something like the following:
Aggregate, on the other hand, requires you to specify how the statistics are reported on - either as bins or representations. For example:
SELECT
Aggregate(
Category {bin1, bin2, bin3},
Category {'string1', 'string2'}
)
FROM dataset
So, a query would look like this:
SELECT
Aggregate(
Age {10, 15, 20, 25, 30},
Gender {'Female', 'Male'}
)
FROM A
Note: As you can see, the names of categoric bins are surrounded by single quote marks. In IQL (like in SQL), single quote marks identify a string literal.
The way you define the bins for numeric and categoric data is different. In the age example above, the specification of (10, 15, 20, 25, 30) will output the bins ([10, 14], [15, 19], [20, 24], [25, 29]), whereas 'Female' and 'Male' will be outputted as two categorical bins.
Alternatively, you can infer the bins based on representations. For example, rather than listing out the five year bins, you can simply specify a representation such as "5 Year Bins". To do so, write something like the following:
SELECT
Aggregate(
Age/”5 Year Bins”,
)
FROM A
Note: As you can see above, a / is using to separate the category from the representation. The names of the representations are also surrounded by double quote marks to identify a representation, rather than a bin.
Activation query
To write an activation query, simply reference the activation dataset and the output column:
SELECT
dataset.outputcolumn
FROM dataset
You can also reference insight datasets to filter, enrich and link the results.
SELECT
A.Email
FROM A Exclude B
WHERE Income > 20000 AND Location = "London"
Next steps
The next step of any IQL query is to use the FROM clause. This enables you to define an audience by referencing published datasets and specifying the relationships, for example by using operators to define the union, intersection or exclusion of multiple datasets.