Preparing & publishing data to a Bunker
Once you’ve normalized your data in the platform, it is time to prepare it for publishing to a Bunker. The purpose of the prepare step is to create all of the indexes that enable the InfoSum platform to run fast queries. This is a critical step in ensuring that InfoSum never moves any personally identifiable data during collaborations.
You will have to:
- Select the normalized data you wish to prepare
- Select (or create) the Bunker you wish to publish the data in
- Choose which IDs and Attributes to publish
- Set the rounding and redaction thresholds
- Publish your dataset to the Bunker
Select the normalized file you wish to prepare
To prepare data for publishing, use the left-hand navigation menu to go to the File Management > Publishing page. Ensure that you’re in the Cloud Vault that contains your normalized data. Select the data and click “Prepare a dataset” at the top of the details panel (on the right-hand side of the page).
Choose the Bunker you wish to publish the data to
On this screen you’re able to choose which Bunker you’d like to publish the data to. Note that the Bunker you choose here is always the same for both the Prepare and Publish steps.
There are four statuses for Bunkers on this screen: “Ready”, “Prepared”, “Published” and “Incompatible”. A normalized file can be prepared and published to any Bunker that doesn’t have a status of “Incompatible”. An “Incompatible” status means the normalized file has no export columns and therefore cannot be published to an activation Bunker.
Be aware that preparing and publishing to a Bunker with a status of “Prepared” or “Published” will overwrite the data in that Bunker.
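The status rules above can be summarised in a short sketch. It is illustrative only: the function name `check_target_bunker` is hypothetical and is not part of any InfoSum API; only the four status names are taken from this page.

```python
from typing import Tuple

# Status names as listed on this page; everything else here is hypothetical.
PUBLISHABLE_STATUSES = {"Ready", "Prepared", "Published"}
OVERWRITING_STATUSES = {"Prepared", "Published"}


def check_target_bunker(status: str) -> Tuple[bool, bool]:
    """Return (can_publish, will_overwrite) for a Bunker status."""
    if status not in PUBLISHABLE_STATUSES | {"Incompatible"}:
        raise ValueError(f"Unknown Bunker status: {status!r}")
    can_publish = status in PUBLISHABLE_STATUSES
    will_overwrite = status in OVERWRITING_STATUSES
    return can_publish, will_overwrite


for status in ("Ready", "Prepared", "Published", "Incompatible"):
    can_publish, will_overwrite = check_target_bunker(status)
    print(f"{status}: publishable={can_publish}, overwrites={will_overwrite}")
```

Only a “Ready” Bunker can be published to without replacing existing data.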
How to create a new Bunker
If you have no Bunkers available or don’t want to overwrite data that is already in a Bunker, you can click “Create dataset” to create a new Bunker. Bunkers must be in the same geographic region as the Cloud Vault where the normalized files are located, because data cannot be transferred across geographic regions.
Please follow the instructions on this page to create a new Bunker.
Select the keys, categories, and output columns to publish
At this stage, you will be asked which columns to include in the final published version of your Bunker. Two tabs are available at this step: for insights Bunkers they are named “Keys” and “Categories”, and for activation Bunkers they are named “Keys” and “Output columns”. Select the keys, categories or output columns you wish to prepare and publish, noting the fill rate of each. The following figures are shown for each column:
- Rows shows the number of rows in the dataset that contain this key or attribute
- Values shows the number of total values across all the rows. This number might be higher than the number of rows if there are multi-value columns
- The fill rate shows the percentage of rows that contain a value for that column. If the fill rate is zero, the column will be highlighted in red
The fill rate can be under 100% even when the number of values is higher than the total number of rows. For example, if you had multi-value keys or attributes, you could have:
- 6k total rows in your dataset
- 4k rows that contain a certain attribute
- 8k values (it's a multi-value attribute that contains two values per row)
- Your fill rate will be 66% (rows with at least one value/total rows)
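The arithmetic in this example can be checked with a small sketch. The function name `fill_rate` is hypothetical and only restates the formula given above (rows with at least one value divided by total rows); it is not a platform API.

```python
def fill_rate(rows_with_value: int, total_rows: int) -> float:
    """Percentage of rows that contain at least one value for a column."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return 100.0 * rows_with_value / total_rows


total_rows = 6_000           # rows in the dataset
rows_with_attribute = 4_000  # rows that contain the attribute
values = 8_000               # two values per populated row

# The number of values does not enter the calculation, which is why the
# fill rate stays below 100% even though values > total_rows.
rate = fill_rate(rows_with_attribute, total_rows)
print(f"{rate:.1f}%")  # about 66.7, quoted as 66% in the example above
```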
Set the rounding and redaction thresholds
Finally, confirm that you’re happy with the rounding and redaction thresholds for this Bunker. If you’d like to change them, click edit and make the changes before clicking “Run Prepare” to continue the process.
- Rounding defines the number that every result is rounded down to a multiple of. For example, if the threshold is set to 100, a result of 2,563,975 rows would be reported as 2,563,900.
- Redaction defines the minimum size of a group. For example, if the threshold is set to 100, a result of 87 rows wouldn't be reported on.
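As a rough illustration of how the two thresholds affect a reported result, here is a minimal sketch. The function `apply_thresholds` is hypothetical, and two details are assumptions rather than documented behaviour: that redaction is checked before rounding, and that a group exactly equal to the redaction threshold is still reported.

```python
from typing import Optional


def apply_thresholds(count: int, rounding: int = 100,
                     redaction: int = 100) -> Optional[int]:
    """Return the count as it would be reported, or None if redacted."""
    if rounding <= 0:
        raise ValueError("rounding must be a positive integer")
    # Assumption: groups smaller than the redaction threshold are dropped.
    if count < redaction:
        return None
    # Round down to the nearest multiple of the rounding threshold.
    return (count // rounding) * rounding


print(apply_thresholds(2_563_975))  # 2563900
print(apply_thresholds(87))         # None (below the redaction threshold)
```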
The Bunker will now be prepared for publishing
The indexes are now created and the Bunker is prepared. The details panel displays more information about the prepared Bunker, which is ready for publishing.
Publish the Bunker
Once the prepare stage has completed, the button text in the details panel will change to “Publish”. Click this button to publish the Bunker so that it is ready for collaborating. Once published, the button will turn green to indicate that the publish was successful, and the name of the publish shows the Public ID of the Bunker that was published to. The Bunker will now also appear in the Data > Datasets screen.
Important note
A Bunker will stay in a prepared state for only 36 hours before it is terminated. Ensure that you publish the Bunker before the 36 hours expire.
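If you want to track this deadline yourself, the calculation can be sketched as follows. The function names are hypothetical, and the sketch assumes the 36 hours are counted from the moment the prepare step completes; the platform does not necessarily expose this timestamp in this form.

```python
from datetime import datetime, timedelta, timezone

# The 36-hour limit comes from the note above.
PREPARED_LIFETIME = timedelta(hours=36)


def publish_deadline(prepared_at: datetime) -> datetime:
    """Latest time by which a prepared Bunker must be published."""
    return prepared_at + PREPARED_LIFETIME


def time_remaining(prepared_at: datetime, now: datetime) -> timedelta:
    """Time left before the prepared Bunker is terminated (never negative)."""
    return max(publish_deadline(prepared_at) - now, timedelta(0))


prepared_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # example timestamp
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)

print(publish_deadline(prepared_at).isoformat())  # 2024-01-02T21:00:00+00:00
print(time_remaining(prepared_at, now))           # 12:00:00
```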