Multi-value attributes
Supports a category that contains a list of values
This feature allows you to collapse multiple columns of data into a single column by allowing multiple values or attributes per column.
How does this work?
Let’s take a scenario of a customer importing user hobbies data.
This is how a customer imports data without the Multi-value attribute feature:
ID |
|
Age |
Badminton |
Camping |
Watching Movies |
Karting |
1 |
abc@abc.com |
20 |
Yes |
No |
Yes |
Yes |
2 |
test@test.com |
30 |
No |
Yes |
Yes |
No |
3 |
123@test.com |
17 |
Yes |
Yes |
No |
Yes |
4 |
bcd@123.com |
24 |
Yes |
No |
Yes |
No |
Here is how the data is transformed by the Multi-value attribute feature:
User ID |
|
Age |
Gender |
Hobbies |
1 |
abc@abc.com |
20 |
M |
Badminton|Watching Movies|Karting |
2 |
test@test.com |
30 |
F |
Camping|Watching Movies |
3 |
123@test.com |
17 |
F |
Badminton|Camping|Karting |
4 |
bcd@123.com |
24 |
M |
Badminton|Watching Movies |
With this feature, the Platform can accept a file with multiple values per attribute in a single column. In the above example, the customer can import a file with 5 columns instead of 8.
The primary benefit is that you can have more columns in your file if any of your attributes have multiple values. The multi-value attribute produces the same aggregation output as a single value. Every value in a multi-value attribute is treated independently when used for aggregation or filtering. For more details, see querying multi-value categories.
Implementation
Upload
In the first version, InfoSum will only accept the data already in the form of an array/list type for CSV files uploaded or transferred via SFTP/S3/GCP. Click here for details of dataset limitations.
Customers should have their data in array/list type format for MultiValue columns prior to upload into InfoSum platform. In later versions, we can support that ability via InfoSum platform.
In the first version, InfoSum supports only two delimiters. One to split the columns and the other to split the entries in a list within a column.
In the below example, we are uploading a file containing a multi-value attribute column (List of Hobbies). The data is already in the form of an array/list type.
In preview settings, the Platform will show the delimiter used in the multi-value column. If it’s not the correct delimiter, select the right delimiter in the dropdown list.
Click on the toggle next to the column header and enable a multi-value column.
Repeat the same process for all your multi-value columns. You can perform some other optional minor manipulations to the source data here, please see this article for information.
Click “Accept Preview Config” when you are ready to normalize your data, before publishing it.
Normalization
There is no difference in normalizing your data for Multi-value columns or single value columns.
Analysis
Aggregation and Filtering happen in the same way as before. Every value in a multi-value attribute is treated independently when used in aggregation or filtering.
Activation
Does Not Apply.
Querying multi-value categories
When you query a multi-value category, the total records returned may be different to the combined values of the individual categories. This is because the Total Records and individual categories for multi-value categories are counted in different ways when using the AGGREGATE and TOPN functions:
- The Total Records count is the total number of unique records for a query. Therefore each record in a dataset can only be counted once.
- Multi-value categories count each time a value appears in a row. If a row has multiple values, then each of those values will be added to the count for a category and the individual will appear in multiple categories.
For example, if an individual’s hobbies are badminton, astronomy and chess, they will appear in three categories, but will only be counted once for the Total Records. Use the query:
SELECT
AGGREGATE (
"Primary Hobby"/"List of Hobbies" {'Badminton', 'Astronomy', 'Chess'})
FROM A
To get the output:
Note: This difference occurs for multi-value categories only as single value categories count values once only. This does not affect the COUNT function because it counts the entire dataset, unlike TOPN and AGGREGATE, which count on categories.
Using AGGREGATE with multi-value categories
Because the AGGREGATE function returns all records for a multi-value category, it is a good idea to restrict the output to those categories you are interested in to ensure output is easily readable. For example:
To return all records for a multi-value category:
SELECT
AGGREGATE
("Primary Hobby"/"List of Hobbies")
FROM A
To restrict the output of a multi-value category, add just those categories you want to see:
SELECT
AGGREGATE
("Primary Hobby"/"List of Hobbies" {'Badminton', 'Astronomy', 'Chess'})
FROM A
Using TOPN filters with multi-value categories
Because filters return all values for a row and not a single value, filtering multi-value categories will return all values in a row as well as those you requested. For example, restricting the multi-value category output using TOPN can result in values being returned that you did not request. The query to filter on badminton or astronomy or chess:
SELECT
TOPN
("Primary Hobby"/"List of Hobbies", 10)
FROM A
WHERE "Primary Hobby"/"List of Hobbies" = 'Badminton'
OR "Primary Hobby"/"List of Hobbies" = 'Astronomy'
OR "Primary Hobby"/"List of Hobbies" = 'Chess'
Returns additional topics:
Whereas filtering on single-value categories returns the selected values only. For example, the query:
SELECT
TOPN(Gender/"Extended Gender", 10)
FROM A
WHERE Gender/"Extended Gender" = 'Female'
Returns only the selected category for Female: