Multi-Value Attributes

Supports a category that contains a list of values

This feature will give clients the ability to collapse multiple columns of data into a single column by allowing multiple values or attributes per column

How does this work?

Let’s take a scenario of a customer importing user hobbies data. 

This is how a customer imports data without the Multi-value attribute feature: 

ID

Email

Age

Badminton

Camping

Watching Movies

Karting

1

abc@abc.com

20

Yes

No

Yes

Yes

2

test@test.com

30

No

Yes

Yes

No

3

123@test.com

17

Yes

Yes

No

Yes

4

bcd@123.com

24

Yes

No

Yes

No

Here is how the data is transformed by the Multi-value attribute feature:

User ID

Email

Age

Gender

Hobbies

1

abc@abc.com

20

M

Badminton|Watching Movies|Karting

2

test@test.com

30

F

Camping|Watching Movies

3

123@test.com

17

F

Badminton|Camping|Karting

4

bcd@123.com

24

M

Badminton|Watching Movies

With this feature, the Platform can accept a file with multiple values per attribute in a single column. In the above example, the customer can import a file with 5 columns instead of 8. 

The primary benefit is that you can have more columns in your file if any of your attributes have multiple values. The multi-value attribute produces the same aggregation output as a single value. Every value in a multi-value attribute is treated independently when used for aggregation or filtering. For more details, see querying multi-value categories.

Implementation

Upload:

In the first version, InfoSum will only accept the data already in the form of an array/list type for CSV files uploaded or transferred via SFTP/S3/GCP.

The maximum number of values per category can be 25, 100 respectively for standard, large bunkers. We have a limit of 63 characters on the name of a field, and 2700 characters on the value of a field.

Customers should have their data in array/list type format for MultiValue columns prior to upload into the InfoSum platform. In later versions, we can support that ability via the InfoSum platform

In the first version, InfoSum supports only two delimiters. One to split the columns and the other to split the entries in a list within a column. 

In the below example, we are uploading a file containing a multi-value attribute column (List of Hobbies). The data is already in the form of an array/list type.

In preview settings, the Platform will show the delimiter used in the multi-value column. If it’s not the correct delimiter, select the right delimiter in the dropdown list.

Click on the toggle next to the column header and enable a multi-value column.

 

Repeat the same process for all your multi-value columns. You can perform some other optional minor manipulations to the source data here, please see this article for information. 

Click “Accept Preview Config” when you are ready to normalise your data, before publishing it. 

Normalisation:

There is no difference in normalising your data for Multi-value columns or single value columns.

Analysis:

Aggregation and Filtering happen in the same way as before. Every value in a multi-value attribute is treated independently when used in aggregation or filtering.

Activation:

Does Not Apply. 

Querying multi-value categories

When you query a multi-value category, the total records returned may be different to the combined values of the individual categories. This is because the Total Records and individual categories for multi-value categories are counted in different ways when using the AGGREGATE and TOPN functions:

  • The Total Records count is the total number of unique records for a query. Therefore each record in a dataset can only be counted once.
  • Multi-value categories count each time a value appears in a row. If a row has multiple values, then each of those values will be added to the count for a category and the individual will appear in multiple categories.

For example, if an individual’s hobbies are badminton, astronomy and chess, they will appear in three categories, but will only be counted once for the Total Records. Use the query:

SELECT 
  AGGREGATE (
    "Primary Hobby"/"List of Hobbies" {'Badminton', 'Astronomy', 'Chess'})

FROM A

To get the output:

This difference occurs for multi-value categories only as single value categories count values once only. This does not affect the COUNT function because it counts the entire dataset, unlike TOPN and AGGREGATE, which count on categories.

Using AGGREGATE with multi-value categories

Because the AGGREGATE function returns all records for a multi-value category, it is a good idea to restrict the output to those categories you are interested in to ensure output is easily readable. For example:

To return all records for a multi-value category: 

SELECT 
  AGGREGATE
    ("Primary Hobby"/"List of Hobbies") 

FROM A

To restrict the output of a multi-value category, add just those categories you want to see:

SELECT 
  AGGREGATE
    ("Primary Hobby"/"List of Hobbies" {'Badminton', 'Astronomy', 'Chess'})

FROM A

Using TOPN filters with multi-value categories

Because Filters return all values for a row and not a single value, filtering multi-value categories will return all values in a row as well as those you requested. For example, restricting the multi-value category output using TOPN can result in values being returned that you did not request. The query to filter on badminton or astronomy or chess:

SELECT 
  TOPN
    ("Primary Hobby"/"List of Hobbies", 10)

FROM A

WHERE "Primary Hobby"/"List of Hobbies" = 'Badminton'
 OR "Primary Hobby"/"List of Hobbies" = 'Astronomy'
 OR "Primary Hobby"/"List of Hobbies" = 'Chess'

Returns additional topics:

 

Whereas filtering on single-value categories returns the selected values only. For example, the query:

SELECT 
  TOPN(Gender/"Extended Gender", 10)

FROM A 

WHERE Gender/"Extended Gender" = 'Female'

Returns only the selected category for Female: