Bunkers and Datasets
You can use the Platform UI to create new Datasets and to update a Dataset's metadata and expiry settings.
What are Bunkers/Datasets?
A Bunker is the secure storage allocated to a single Dataset.
You will be able to publish several datasets or data slices into different Bunkers and to refresh the data within each Bunker. Bunkers are referred to as Datasets on the platform.
Each Dataset is hosted on its own private virtual server. Nobody except you can access the encrypted data in your Dataset (InfoSum's engineering team may help with technical issues, but only with your prior written consent). You can give other users permission to query your Dataset in the platform, but they will only be able to retrieve aggregated statistical results, never the original data.
Your collaborator(s) will also publish their data into their own private Datasets. Although your collaborators give you permission to reference their data in anonymized, aggregate form, you have no access to the data in their Datasets - and they have no access to yours.
Bunker usability is controlled within each Collaboration, allowing you to set flexible and custom permissions, down to the key, attribute, and use case level for each of your projects.
Onboarding data to a Bunker
The first step is to understand your use case and what data you will need in your Bunker. You will then need to import your data to a Cloud Vault (a data management environment), normalize the data in a way that suits that use case, and ensure you have all the necessary types of data before publishing it as a Dataset for collaboration. A Dataset must be in the same geographic region as the Cloud Vault where its normalized files are located. Data cannot be transferred across geographic regions.
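The region rule can be illustrated with a minimal Python sketch. This is not platform code: the function name and the region labels ("eu", "us") are hypothetical and only show the kind of check you might run on your own records before publishing.

```python
def same_region(vault_region: str, dataset_region: str) -> bool:
    """Compare two region labels, ignoring case and surrounding whitespace."""
    return vault_region.strip().lower() == dataset_region.strip().lower()


def check_publish(vault_region: str, dataset_region: str) -> None:
    """Raise an error if the Dataset is not in the Cloud Vault's region."""
    if not same_region(vault_region, dataset_region):
        raise ValueError(
            f"Cloud Vault is in '{vault_region}' but the Dataset is in "
            f"'{dataset_region}'; data cannot be transferred across regions"
        )


# Hypothetical region labels, for illustration only.
check_publish("eu", "EU")          # passes silently
print(same_region("eu", "us"))     # False
```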
You can find an overview of data onboarding in this article.
Data needs for collaboration
To collaborate with a partner, you must both have the same Key (ID) in your respective Datasets, or use an ID Bridge partner if no compatible Keys exist. You can find more information on the data format requirements here.
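As a rough illustration of the shared-Key requirement, the sketch below compares two lists of Key names and reports which ones both sides hold. The key names and the function are hypothetical; in practice the platform handles matching, and this only shows the idea.

```python
def shared_keys(own_keys, partner_keys):
    """Return the Key names present in both Datasets, sorted for stable output."""
    return sorted(set(own_keys) & set(partner_keys))


# Hypothetical Key names, for illustration only.
own = ["email", "device_id", "cookie"]
partner = ["email", "internal_id"]

common = shared_keys(own, partner)
if common:
    print("Can match on:", ", ".join(common))
else:
    print("No compatible Keys; an ID Bridge partner would be needed")
```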
There are three types of data hosted in a Dataset that can be identified during the normalization process:
- IDs and PII must be marked as Keys to be used for collaboration
  - Map common IDs to the Global Schema to ensure standardization of data
- Attributes represent the information about your customers (non-IDs) and can be used for analysis. Data defaults to Attribute if it is not marked as a Key.
- Activation IDs must be marked as Export Columns for activation to third parties. They must be present for activation use cases.
  - If your collaboration and activation keys are different (e.g. you are matching on email but exporting an internal ID), you will need to include both keys in the Dataset
  - You will not be able to export data out of your Dataset if there are no Export Columns selected at this stage
In the example below:
- Cookies, Email and Device IDs are marked as Keys
- Email and Device ID are marked as Export Columns
- Everything else is an Attribute
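The same example can be written as a small Python sketch that records which role each column plays. The `Column` class, the helper functions, and the attribute column names (`age_band`, `postcode_area`) are hypothetical and not part of the platform. The sketch also assumes that a column marked as neither Key nor Export Column counts as an Attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    is_key: bool = False     # can be used for matching in a collaboration
    is_export: bool = False  # can be used for activation to third parties


def summarize(columns):
    """Group column names by role; unmarked columns are treated as Attributes."""
    return {
        "keys": [c.name for c in columns if c.is_key],
        "export_columns": [c.name for c in columns if c.is_export],
        "attributes": [c.name for c in columns
                       if not c.is_key and not c.is_export],
    }


def can_activate(columns):
    """Activation needs at least one Export Column."""
    return any(c.is_export for c in columns)


columns = [
    Column("cookie", is_key=True),
    Column("email", is_key=True, is_export=True),
    Column("device_id", is_key=True, is_export=True),
    Column("age_band"),        # hypothetical attribute
    Column("postcode_area"),   # hypothetical attribute
]

print(summarize(columns))
print("Activation possible:", can_activate(columns))
```

If no column were marked as an Export Column, `can_activate` would return `False`, mirroring the rule that data cannot be exported without one.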
Viewing and managing your Datasets
On the Datasets screen, you can see which Datasets are available for your Cloud Vault. For each Dataset, you can see what data has been published and when it expires, and you can manage data onboarding automations.
On the right-hand side panel, you can see metadata about your Dataset and the data published to it:
- Information: general information about the Dataset and the last publishing task
- History: a history of each publish task and the recordset associated with it
The last three tabs relate to the data currently published to the Dataset:
- Keys: available identifiers
- Attributes: available information about the identifiers
- Export Columns: which identifiers can be used to run an activation out of the platform
This panel will also show you additional information about each datapoint, such as fill rates, duplicates, null values, and whether the columns are single-value or multi-value.
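To give a feel for what these figures mean, here is a minimal Python sketch that computes similar statistics for one column. The definitions are assumptions made for illustration (fill rate as the share of non-null cells, duplicates as repeat occurrences of a value, multi-value as any cell holding a list) and may differ from how the platform calculates them.

```python
from collections import Counter


def column_stats(values):
    """Compute illustrative statistics for one column of values."""
    total = len(values)
    non_null = [v for v in values if v is not None]

    # A column is treated as multi-value if any cell holds a list or tuple.
    multi_value = any(isinstance(v, (list, tuple)) for v in non_null)

    # Flatten multi-value cells so every individual value is counted once.
    flat = []
    for v in non_null:
        if isinstance(v, (list, tuple)):
            flat.extend(v)
        else:
            flat.append(v)

    counts = Counter(flat)
    # Each occurrence beyond the first counts as one duplicate.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)

    return {
        "fill_rate": len(non_null) / total if total else 0.0,
        "null_values": total - len(non_null),
        "duplicates": duplicates,
        "multi_value": multi_value,
    }


# Hypothetical sample data.
emails = ["a@example.com", None, "b@example.com", "a@example.com"]
print(column_stats(emails))
```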
Creating a new Dataset
You can also create a new Dataset if none of your existing ones are suitable.
You'll be asked to supply four details:
- Cloud Vault: the one you were in is selected by default and cannot be changed on this screen. To select a different Cloud Vault, change your selection on the Datasets screen.
- Name: a brief title for your Dataset, which you'll use to identify it in the Platform UI. Something like ‘Active customer accounts’ is ideal. If you add your Dataset to a collaboration, your collaborators will also be able to see this name.
- Description (optional): a human-readable explanation of what the Dataset is for. Again, if you add your Dataset to a collaboration, your collaborators will also be able to see this description.
- Expiry: how long you want to keep this Dataset. You can select from the list of options or enter your own expiration date.
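If you keep a record of the Datasets you plan to create, the four details can be modeled as in the Python sketch below. The class, its field names, and the validation rules (a non-empty name, an expiry date in the future) are assumptions for illustration and do not describe the platform's own checks.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class NewDatasetDetails:
    cloud_vault: str                   # fixed by the current selection
    name: str                          # visible to collaborators
    expiry: date                       # when the Dataset should expire
    description: Optional[str] = None  # optional, visible to collaborators

    def problems(self, today: date) -> List[str]:
        """Return a list of issues; an empty list means the details look fine."""
        found = []
        if not self.cloud_vault.strip():
            found.append("A Cloud Vault must be selected")
        if not self.name.strip():
            found.append("Name is required")
        if self.expiry <= today:
            # Assumed rule: an expiry date should lie in the future.
            found.append("Expiry must be after today")
        return found


today = date(2024, 1, 1)  # fixed date so the example is reproducible
details = NewDatasetDetails(
    cloud_vault="Example Vault",          # hypothetical vault name
    name="Active customer accounts",
    expiry=today + timedelta(days=90),    # a custom expiration date
)
print(details.problems(today))  # []
```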
Updating Dataset Details
Note: Only users with the "update dataset metadata" right can change dataset details.
To change the details of a Dataset, click on the three dots next to the action buttons on the Datasets page and click edit.
You can change all details except the Cloud Vault it’s linked to.
Software updates
When InfoSum releases software updates, these aren’t applied by default to existing Datasets, meaning that users can have data published across Datasets that are on different versions of the platform. This shouldn’t cause any compatibility issues when querying between Datasets using different versions. When you refresh your published assets or publish new data, the Datasets will be automatically updated.
On very rare occasions, it might be necessary for InfoSum to make a mandatory upgrade containing key security or stability changes. This will require all clients to republish their data into upgraded Datasets as soon as possible. Republishing will automatically replace any Dataset with the up-to-date version.
If this is required, you will receive an email from InfoSum’s support team or your customer success representative requesting that you republish your data at the first opportunity.