Preparing & publishing data to a Bunker
Once you’ve normalized your data in the platform, it is time to prepare it for publishing to a Bunker. The purpose of the prepare step is to create all of the indexes that enable the InfoSum platform to run fast queries. This is a critical step in ensuring that InfoSum never moves any personally identifiable data during collaborations.
You will have to:
- Select the normalized data you wish to prepare
- Select (or create) the Bunker you wish to publish the data in
- Choose which IDs and Attributes to publish
- Set the rounding and redaction thresholds
- Publish your dataset to the Bunker
Select the normalized file you wish to prepare
To prepare data for publishing, use the left-hand navigation menu to go to the File Management > Publishing page. Please ensure that you’re in the Cloud Vault that contains your normalized data. Select the data and click “Prepare a dataset” at the top of the details panel (on the right-hand side of the page).
Choose the Bunker you wish to publish the data to
On this screen you’re able to choose which Bunker you’d like to publish the data to. Note that the Bunker you choose here is always the same for both the Prepare and Publish steps.
There are four statuses for Bunkers on this screen: “Ready”, “Prepared”, “Published” and “Incompatible”. A normalized file can be prepared and published to any Bunker that doesn’t have a status of “Incompatible”. Be aware that preparing and publishing to a Bunker with a status of “Prepared” or “Published” will overwrite the data in that Bunker. A status of “Incompatible” means that the normalized file has no output columns and therefore cannot be published to an activation Bunker.
How to create a new Bunker
If you have no Bunkers available or don’t want to overwrite data that is already in a Bunker, you can click “Create dataset” to create a new Bunker. A Bunker must be in the same geographic region as the Cloud Vault where the normalized files are located, because data cannot be transferred across geographic regions.
Please follow the instructions on this page to create a new Bunker.
Select the keys, categories, and output columns to publish
At this stage, you will be asked which columns you wish to include in the final published version of your Bunker. Two tabs are available at this step. For insights Bunkers, the tabs are named “Keys” and “Categories”; for activation Bunkers, they are named “Keys” and “Output columns”. Select the keys, categories and output columns you wish to prepare and publish, noting the fill rate of each (the percentage of rows that contain a value for that column).
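To make the fill rate concrete, here is a minimal sketch of how such a percentage could be computed for a single column. The function name `fill_rate` and the rule that empty or whitespace-only strings count as missing are assumptions made for illustration; the platform's own calculation may differ.

```python
from typing import Optional, Sequence


def fill_rate(values: Sequence[Optional[str]]) -> float:
    """Return the percentage of rows that hold a value for one column.

    Assumption: None and empty/whitespace-only strings count as missing.
    """
    if not values:
        return 0.0
    filled = sum(1 for v in values if v is not None and v.strip() != "")
    return 100.0 * filled / len(values)


# Hypothetical column with four rows, two of which are populated.
emails = ["a@example.com", None, "b@example.com", ""]
print(fill_rate(emails))  # 50.0
```

A column with a low fill rate adds little to a collaboration, so it can be worth checking these figures before deciding what to publish.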
Set the rounding and redaction thresholds
A pop-up will appear in the browser which asks you to confirm that you’re happy with the rounding and redaction thresholds for this Bunker. If you’d like to change them, click edit and make the changes before clicking “Prepare a dataset” to continue the process.
- Rounding defines the number that every result is rounded down to the nearest multiple of. For example, if the threshold is set to 100, a result of 2,563,975 rows would be reported as 2,563,900.
- Redaction defines the minimum size of a group so, if the threshold is set to 100, then a result of 87 rows wouldn't be reported on.
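The arithmetic behind the two thresholds can be sketched as follows. This is an illustration only, not InfoSum's implementation: the function names are hypothetical, and the sketch assumes that redaction is checked against the raw count before rounding is applied and that a group exactly equal to the threshold is still reported.

```python
from typing import Optional


def round_down(count: int, rounding: int) -> int:
    """Round count down to the nearest multiple of the rounding threshold."""
    if rounding <= 0:
        raise ValueError("rounding threshold must be positive")
    return (count // rounding) * rounding


def reported_count(count: int, rounding: int = 100, redaction: int = 100) -> Optional[int]:
    """Return the value that would be reported, or None if the group is redacted.

    Assumption: the redaction check uses the raw count and runs before rounding.
    """
    if count < redaction:
        return None  # group is smaller than the minimum size, so nothing is reported
    return round_down(count, rounding)


print(reported_count(2_563_975))  # 2563900
print(reported_count(87))         # None
```

With both thresholds at 100, the two examples above reproduce the figures in the list: 2,563,975 is reported as 2,563,900, and a group of 87 rows is not reported at all.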
The Bunker will now be prepared for publishing
The indexes will now be created and the Bunker will be prepared. The details panel displays further information about the prepared Bunker that is ready for publishing.
Publish the Bunker
Once the prepare stage has completed, the button text in the details panel will change to “Publish”. Click this button to publish the Bunker so that it is ready for collaborating. Once published, the button will turn green to indicate that the publish was successful, and the name of the publish shows the Public ID of the Bunker that was published to. The Bunker will now also appear in the Data > Datasets screen.
Important note
A Bunker will stay in a prepared state for only 36 hours before it is terminated. Ensure that you publish the Bunker before the 36 hours expire.
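If you track prepared Bunkers outside the platform, the publish deadline is simple to work out. The sketch below is a hypothetical helper, not part of any InfoSum API, and it assumes the 36-hour window starts at the moment the prepare step completes.

```python
from datetime import datetime, timedelta, timezone

# The 36-hour prepared-state lifetime stated in the note above.
PREPARED_LIFETIME = timedelta(hours=36)


def publish_deadline(prepared_at: datetime) -> datetime:
    """Latest time to publish, assuming the window starts when prepare completes."""
    return prepared_at + PREPARED_LIFETIME


def time_remaining(prepared_at: datetime, now: datetime) -> timedelta:
    """Time left before the prepared Bunker is terminated (never negative)."""
    return max(publish_deadline(prepared_at) - now, timedelta(0))


prepared = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(publish_deadline(prepared))  # 2024-01-02 21:00:00+00:00
```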