IQL functions
InfoSum’s Insight Query Language (IQL) provides functions to analyze connected data.
This reference article describes the IQL functions you can use to query datasets using the Query Tool. Three IQL functions are available: COUNT, AGGREGATE and TOPN.
IQL and SQL functions: Although IQL functions are loosely based on SQL functions, we do not support any other SQL-type functions other than COUNT, AGGREGATE and TOPN. Due to differential privacy requirements, we are unable to offer SUM and AVG functions in IQL. IQL only returns counts on keys, which means it is not possible to perform SUM or AVG type functions on attributes. All counts from COUNT, AGGREGATE and TOPN are differentially private and we have proven that it is not possible to use our current IQL functionality to re-identify individuals.
For more information on IQL, see the IQL reference documentation.
All the examples in this article use the tutorial datasets available on InfoSum Platform.
COUNT
Description
COUNT returns the number of records in a dataset or combination of datasets.
COUNT returns a number based on the number of matching keys, which tells you the size of that audience.
Syntax
SELECT
COUNT()
FROM dataset
Examples
To return the number of unique records in a dataset:
SELECT
COUNT()
FROM ACME
Output:
To return the number of unique records in a combination of datasets:
SELECT
COUNT()
FROM ACME INTERSECT LifeWeek
Output:
To segment this result, you can apply any number of filters. When we say apply filters, we mean to set attribute-based rules to only include a type of individual in the result. For example, the query below counts the number of unique individuals in three datasets who meet the income and location criteria.
SELECT
COUNT()
FROM ACME INTERSECT LifeWeek INTERSECT AutoSportsR
WHERE AutoSportsR."Personal Income"/Income > 20000
AND ACME."existing customer"/"existing customer" = 'Yes'
Output:
To return the number of unique records between two values in a dataset:
SELECT COUNT()
FROM ACME
WHERE ACME.Age BETWEEN 20 AND 50
Output:
To return the number of unique records for multiple values in a dataset:
SELECT COUNT()
FROM ACME
WHERE ACME.Gender/"Binary and Other" IN ('Female','Other')
Output:
AGGREGATE
Description
AGGREGATE performs a calculation on individual records and returns a summarized representation of those values. AGGREGATE requires you to specify how the statistics are reported on - either as bins or representations.
Syntax
SELECT
Aggregate(
Category {bin1, bin2, bin3},
Category Category/Representation
)
FROM dataset
Note: An AGGREGATE query must always contain either a bin or a representation or both, otherwise an error message will be returned. Category is a specific type of information contained in a dataset.
Using Bins
This section describes how you can use bins with AGGREGATE. A bin is a group of values that are counted together when using an insight query to produce statistical reports. For example:
SELECT
Aggregate(
Category {bin1, bin2, bin3},
Category {'string1', 'string2'}
)
FROM dataset
Where:
- Each bin or string is separated with a comma and a space (e.g. bin1, bin2). Additionally, each string is enclosed with single quotes.
- One or more string or bin can be specified, but you cannot mix them.
A typical query would look like this:
SELECT
Aggregate(
ACME.Age {10, 15, 20, 25, 30},
ACME.Gender {'Female', 'Male'}
)
FROM ACME
Output:
The way you define the bins for numeric and categoric data is different (see the table below). In the age example above, the specification of (10, 15, 20, 25, 30) will output the bins ([10, 14], [15, 19], [20, 24], [25, 29]), whereas 'Female' and 'Male' will be outputted as two categorical bins, as shown.
Using Representations
You can use representations in two ways:
- To create rows named after values in representations.
- To create bins based on representations.
Creating rows named after values in representations
For example:
SELECT
Aggregate(
AutoSports."in market for car"/"in market for car",
AutoSports."Marital Status"/"Marital Status"
)
FROM AutoSports
Output:
Creating bins based on representations
The query format is the same as for using bins, for example:
SELECT
AGGREGATE(
ACME.Age/"HMRC Age Range"
)
FROM ACME
Output:
Using bins and representations in the same query
The example below shows how you can use bins and representations in the same AGGREGATE query.
SELECT
Aggregate(
ACME.Age {20,30,40,50},
AutoSports."sports lover"
)
FROM ACME INTERSECT AutoSports
Output:
The table below describes each type of bin you can use in an Aggregate query.
Bin Type |
Description |
Format |
Example |
Categoric Bins |
Returns string values for a category. |
Category The names of categoric bins are surrounded by single quote marks. In IQL (like in SQL), single quote marks identify a string literal. |
Gender
|
Numeric Bins |
Returns continuous values that are numeric. |
Category A numeric bin can be any number between a minimum and maximum (or possibly any number at all). |
Age Outputs the bins: 10-15, 15-20, 20-25, |
Bins based on representations |
Returns continuous values grouped into bins contained in the category. Bin representations are only available for some categories. |
Category/Representation A slash (/) is used to separate the category from the representation. Categories and representations that contain a space must be surrounded by double quote marks, for example: Gender/"Extended Gender" "Personal Income"/Income mum/mum |
Age/”5 Year Bins” Outputs 5 year bins without having to list them out as in the previous example. |
Representations - grouped into ranges |
Each of the bins are based on representations. Returns continuous values grouped into ranges. |
Category/Representation A slash (/) is used to separate the category from the representation. Categories and representations that contain a space must be surrounded by double quote marks, for example: Gender/"Extended Gender" "Personal Income"/Income mum/mum |
Age/"Employment Range" Employment Range representation of the Age category groups values into 0-18, 19-25, 26-35, and so on. Each of these ranges counts as a bin. |
Representations - discrete values from a predefined list |
Can only return a value from a predefined list - each of the possible values counts as a bin. |
Category/Representation A slash (/) is used to separate the category from the representation. Categories and representations that contain a space must be surrounded by double quote marks, for example: Gender/"Extended Gender" "Personal Income"/Income mum/mum |
Address/Region The Region representation of the Address category returns strings like London or Scotland, and each of those strings counts as a bin. |
How privacy controls can affect query results
Privacy controls add a deliberate degree of error to statistical results. This small margin of error prevents scenarios where individuals could be identified by a series of carefully-crafted queries.
Because privacy controls prevent queries reporting on very small numbers of individuals, the following privacy controls can affect your query results:
- Rounding defines the number that bin counts will be rounded down by.
- Redaction defines the number of results required for a bin to be included in the returned results. This means that no results are returned if a query reports on too few individuals.
The rounding and redaction thresholds for your Bunker are set to 100 by default, so any bins that contain less than 100 records are shown as zero. However, the total records count will not be affected by this and will show all records for a query, including any records in zero counted bins. The following example contains bins with less than 200 items for a Bunker with a redaction threshold of 200:
Query:
SELECT
Aggregate(
AutoSportsS.Age/"Population Range",
AutoSportsS.Industry/Industry
)
FROM AutoSportsS
Output:
In the above screenshot, the Bunker redaction threshold set to 200 means that any industries with a count of less than 200 show as zero even when records exist for a bin. For example, the AutoSportsS dataset contains 189 records for Business, Consulting and Management for individuals aged 75 to 80, but the count shows zero.
TOPN
Description
TOPN runs an AGGREGATE query on the entire dataset and returns the top number of aggregated values, sorted from highest to lowest.
TOPN is ideal for aggregating categories with high cardinality, where you are interested in learning about the frequently appearing values in a category, such as the top 5 job titles or the top 10 postcodes.
Syntax
TOPN(Category, ROWNUM)
Where:
ROWNUM is the number of rows to output by highest fill rate.
Category is a specific type of information contained in a dataset.
Examples
To return a table showing the five most frequent age groups in the audience:
SELECT
TopN(Age/"Advert Range", 5)
FROM AutoSportsS
Output: