Managing Datasets (Bunkers/Beacons)
You can use the Platform UI to create new Datasets or to update a Dataset's metadata and expiry.
What are Bunkers/Datasets?
A Bunker is the secure storage allocated to a single Dataset.
You will be able to publish several datasets or data slices into different Bunkers and to refresh the data within each Bunker. Bunkers are referred to as Datasets on the platform.
Each Dataset is hosted on its own private virtual server. Nobody except you can access the encrypted data in your Dataset (InfoSum's engineering team may assist with technical issues, subject to your prior written consent). You can give other users permission to query your Dataset in the platform, but they will only be able to retrieve aggregated statistical results, never the original data.
Your collaborator(s) will also publish their data into their own private Datasets. Although your collaborators give you permission to reference their data in anonymized, aggregate form, you have no access to the data in their Datasets - and they have no access to yours.
Creating a Dataset
Creating a Dataset is straightforward because it is linked to your existing Cloud Vault environment: you select the Cloud Vault and give the Dataset a name. Data is imported into your Dataset via your Cloud Vault, so you will need to follow the instructions on that page to import data into your Dataset. Datasets are in the same geographic region as the Cloud Vault where normalised files are located. Data cannot transfer across geographic regions.
You can create a Dataset as part of the data publishing flow. You will need a Dataset before starting a publishing task.
To start, do one of the following:
- Go to the Datasets page and click 'new Dataset' on the right-hand side.
- In the Dashboard, click 'create a New Dataset'.
- Use the plus sign shortcut in the top right, which appears on every page throughout the Platform.
You'll be asked to supply four details:
- The Cloud Vault: The one you were in will be selected by default and cannot be changed on this screen. To select a different Cloud Vault, change your selection in the Datasets screen.
- The Name: A brief title for your Dataset, which you'll use to identify it in the Platform UI. Something like ‘Active customer accounts’ is ideal. If you add your Dataset to a collaboration, your collaborators will also be able to see this name.
- The Description (optional): A human-readable explanation of what the Dataset is for. Again, if you add your Dataset to a collaboration, your collaborators will also be able to see this description.
- The Expiry: You can select how long you want to keep this Dataset for. You can select from the list of options or input your own expiration date.
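As a rough illustration, the four details above can be modelled as a small record with a completeness check. This is only a sketch: the class name `NewDatasetForm`, its field names, and the rule that the expiry must be a future date are assumptions made for this example and are not part of any InfoSum API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class NewDatasetForm:
    """Hypothetical model of the four details requested when creating a Dataset."""

    cloud_vault: str                   # preselected in the UI; not editable on that screen
    name: str                          # brief title, visible to collaborators
    expiry: date                       # a preset option or a custom date
    description: Optional[str] = None  # optional, visible to collaborators

    def validate(self, today: date) -> List[str]:
        """Return a list of problems; an empty list means the form looks complete."""
        problems = []
        if not self.cloud_vault.strip():
            problems.append("A Cloud Vault must be selected.")
        if not self.name.strip():
            problems.append("A name is required.")
        # Assumption for this sketch: an expiry date must lie in the future.
        if self.expiry <= today:
            problems.append("The expiry date must be in the future.")
        return problems


today = date(2024, 1, 1)
form = NewDatasetForm(
    cloud_vault="example-vault",  # placeholder value
    name="Active customer accounts",
    expiry=today + timedelta(days=90),
)
print(form.validate(today))
```

The description is left out in the usage above because it is optional; the name and description are the two fields that collaborators would also see.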
Updating Dataset Details
Note: Only users with the "update dataset metadata" right can change Dataset details.
To change details for a Dataset:
Log in to your InfoSum Platform and select the Datasets tab under Data management.
Next, select the Dataset you want to update and click Edit. You can then edit the name and expiry date of your Dataset.
Dataset Software Updates
When InfoSum releases software updates, these aren’t applied by default to existing Datasets, so you may have data published across Datasets that are running different versions of the platform. This shouldn’t cause any compatibility issues when querying between Datasets on different versions. When you refresh your published assets or publish new data, the Datasets are updated automatically.
Occasionally, InfoSum may need to release a mandatory upgrade containing key security or stability changes. This will require all clients to republish their data into upgraded Datasets as soon as possible. Republishing automatically replaces any Dataset with the up-to-date version.
If this is required, you will receive an email from InfoSum’s support team or your customer success representative requesting that you republish your data at the first opportunity.