A distributed query on the InfoSum Platform, like a query in a conventional database, is a way of filtering, combining and analysing data. A single distributed query can refer to multiple datasets, and the Platform automatically joins them wherever they share a common key. You can query the datasets you own as well as those you have permission to query.
There are two types of queries:
- insight queries, which generate anonymised statistical reports
- identity queries, which can return lists of individuals.
Distributed queries can be planned, optimised, explained and executed in the Query Tool, a web-based interface within the Platform, or programmatically through the API. To help you write a query, the Query Tool includes an explain function: before submitting the query, you can view the execution plan generated by the Platform and examine the dataset connectivity and quality metrics.
When a query is executed, it is parsed and split into multiple fragments. These fragments are then executed on the distributed datasets. A mathematical representation of the results moves between the datasets, and a series of tests is run to return the final results. Distributing the query in this way enables you to gain insights without moving your data.
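The idea of splitting a query into per-dataset fragments can be illustrated with a toy Python sketch. This is not InfoSum's implementation or API: the `Dataset` class, the `hash_key` helper and the `run_distributed_count` function are hypothetical names, and a plain SHA-256 hash of the key is used only as a stand-in for the "mathematical representation" described above, whose actual form is not specified here. The sketch shows each fragment being evaluated where the data lives, with only derived key representations, and no attribute values, leaving a dataset.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


def hash_key(key: str) -> str:
    # Assumption: a SHA-256 digest stands in for the Platform's
    # "mathematical representation"; the real mechanism may differ.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


@dataclass
class Dataset:
    """Hypothetical stand-in for one dataset held by one owner."""
    name: str
    rows: List[Dict[str, object]]  # each row has a "key" plus attribute columns

    def run_fragment(self, predicate: Callable[[Dict[str, object]], bool]) -> Set[str]:
        # The fragment is evaluated locally. Only hashed keys are returned,
        # so attribute values never leave the dataset.
        return {hash_key(str(row["key"])) for row in self.rows if predicate(row)}


Fragment = Tuple[Dataset, Callable[[Dict[str, object]], bool]]


def run_distributed_count(fragments: List[Fragment]) -> int:
    """Count the individuals that satisfy every fragment."""
    matched: Optional[Set[str]] = None
    for dataset, predicate in fragments:
        local = dataset.run_fragment(predicate)
        # Joining on the common key becomes a set intersection of hashed keys.
        matched = local if matched is None else matched & local
    return len(matched) if matched is not None else 0


crm = Dataset("crm", [
    {"key": "a@example.com", "age": 34},
    {"key": "b@example.com", "age": 51},
    {"key": "c@example.com", "age": 29},
])
purchases = Dataset("purchases", [
    {"key": "a@example.com", "category": "sports"},
    {"key": "c@example.com", "category": "books"},
    {"key": "d@example.com", "category": "sports"},
])

# One logical query, "people under 40 who bought sports goods",
# expressed as one fragment per dataset.
count = run_distributed_count([
    (crm, lambda row: row["age"] < 40),
    (purchases, lambda row: row["category"] == "sports"),
])
print(count)
```

In this sketch each owner's predicate is applied only to that owner's rows, and the caller receives an aggregate count rather than any row-level data, which mirrors the insight-query behaviour described in this section in a simplified form.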
By creating a permission, you can enable another user to query your dataset and gain insights in combination with their own data, without being able to view or access your data. The other user sees only the query fragment relating to their own dataset, so cannot learn about the contents of yours. Attribute data is never transmitted between datasets.
All queries must be written in InfoSum’s proprietary query language, the Insight Query Language (IQL). IQL is loosely based on SQL, so if you’re used to writing SQL you will find the syntax familiar, but it also enables the unique capabilities of the Platform. Using IQL, you can define an audience and what you intend to learn, and use operators to build relationships between datasets. You can also filter the results on categorical data, including data held in additional datasets, and reference glue datasets to match up entries where there isn’t a common key.