Importing a dataset
When you create your dataset, the Platform creates a corresponding Bunker.
A dataset is similar to a table in a conventional database: a single group of records that you import, process and query as a single unit. The first step in any InfoSum Platform project is to publish one or more datasets. Once you've published a dataset, you can reference it in queries and, if you choose to, allow other users to reference it as well.
There are three main steps in this process.
- First, use the Platform to create your dataset.
- Then, use your private Bunker to import and normalize your data.
- Once you're happy, publish your dataset. Only once it's published is it available to reference in queries.
Create a dataset
When you create a dataset, you tell the Platform that your data exists, and provide a few key details about how it will be used. You don't actually upload or import any data at this stage - that comes in the next step.
You can create a dataset using the New Dataset button on the Datasets tab, or using the plus sign shortcut in the top right, which appears on every page throughout the Platform. Either way, you can then select the type of dataset you want:
- An insight dataset can include identifiers and attribute data, and can only be used to create anonymized statistics.
- An activation dataset can only include identifiers and can be used to output a list. The only categories allowed in activation datasets are those used to generate keys.
You'll be asked to supply three details.
- The Private ID is the name you'll use to reference this dataset in queries. A short name, like customerDB, is perfect. Because you'll use it in code, this ID must contain only alphanumeric characters with no spaces.
- The Public Name is a brief title for your dataset, which you'll use to identify it in the Platform UI. Something like Active customer accounts is ideal. If you give other users permission to reference the dataset, they'll also be able to see this public name (but not the private ID).
- The optional Public Description is simply a human-readable explanation of what the dataset is for. Again, if you give other users permission to reference the dataset, they'll be able to read this description.
You can change any of these details later - though of course, if you change the Private ID, you'll also need to change any code you've written which referenced the old ID.
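If you want to check a candidate Private ID before typing it into the Platform, the rule above (alphanumeric characters only, no spaces) can be expressed as a simple pattern check. This is a minimal sketch: the helper name `is_valid_private_id` is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits only. The article doesn't say whether characters such as underscores are accepted, so this sketch rejects them.

```python
import re

# Assumption: "alphanumeric" means ASCII letters A-Z, a-z and digits 0-9 only.
# Underscores, hyphens and spaces are all rejected by this pattern.
_PRIVATE_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_private_id(candidate: str) -> bool:
    """Hypothetical helper: True if candidate is non-empty and purely alphanumeric."""
    return _PRIVATE_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for name in ["customerDB", "customer DB", "customer_db", ""]:
        print(f"{name!r}: {is_valid_private_id(name)}")
```

Using `fullmatch` rather than `match` ensures the whole string is checked, so an ID like `customerDB 2` fails because of the trailing space rather than passing on its valid prefix.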
Once you've filled in these details and clicked Create, your dataset is created immediately. You'll see it listed on the Datasets tab.
The status icon tells you that you still need to import data for your new dataset.
If you like, you can now give another user permission to reference your new dataset, though they won't actually be able to reference it until you've imported and published your data. See the permissions section for an explanation of this process.
Import your dataset
When you created your dataset in the previous step, the Platform created a corresponding Bunker. Now, you can connect to that Bunker and import your data.
To connect to your Bunker:
- Switch to the Datasets tab in the Platform, if you're not there already.
- Find the row representing the dataset you've just created (refer to the Public Name if you're not sure).
- Select the Access button. You'll be taken to the Bunker interface.
Your Bunker will have an individual randomly-generated name, which you'll see in your browser's address bar. Any time you see that name, you can be confident that you're connected to your own secure Bunker.
The Bunker automatically opens on the Dashboard tab. Select Import a dataset, then choose from the range of available sources, such as uploading a CSV file or using a connector.
In this example, we'll import a CSV file. After selecting Connect, you'll be taken to a page where you can upload the file from your computer. You'll then be taken to the Preview page. Here you can optionally perform some minor manipulations on the source data; see this article for more information.
Select the appropriate config option. The final step is to check that the columns in your dataset have been correctly mapped to our Global Schema. This mapping makes it possible to represent the data mathematically later on and to compare multiple datasets directly.
Where a match is correct (for example, an e-mail column mapped to Email), accept the import wizard's settings. Otherwise, deselect the column so that its mapping can be corrected during the normalization stage.
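Before uploading, it can help to list which of your CSV headers are likely to map cleanly to a schema category and which will need a closer look in the wizard. The sketch below is purely illustrative and does not reproduce the Platform's matching logic: the lookup table `HYPOTHETICAL_CATEGORY_MAP` and the helpers `normalize_header` and `suggest_mappings` are hypothetical, and the only category name taken from this article is Email. It assumes the CSV's first row is a header row.

```python
import csv
import io
import re
from typing import Dict, Optional

# Hypothetical lookup: normalized header -> Global Schema category name.
# Only "Email" comes from this article; extend it with the categories
# your import wizard actually shows.
HYPOTHETICAL_CATEGORY_MAP: Dict[str, str] = {
    "email": "Email",
}


def normalize_header(header: str) -> str:
    """Lower-case a header and drop every character except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def suggest_mappings(csv_text: str) -> Dict[str, Optional[str]]:
    """Map each header in the CSV's first row to a category, or None if unmatched."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)  # assumes the first row holds column headers
    return {h: HYPOTHETICAL_CATEGORY_MAP.get(normalize_header(h)) for h in headers}


if __name__ == "__main__":
    sample = "e-mail,Customer ID\nann@example.com,C001\n"
    for header, category in suggest_mappings(sample).items():
        print(f"{header!r} -> {category or 'no match: review in the wizard'}")
```

A `None` result doesn't mean the column is unusable; a column like Customer ID might instead become a custom category, as described next.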
Optionally, you can select any columns to be defined as custom categories. If the column used for the custom category is an identifier, such as a Customer ID, you will need to select 'KEY' for it to match keys across datasets. You can use the Filter field to refine the custom categories that appear and tick the Column heading to select or deselect all custom categories that are visible.
Next, you'll use your Bunker's web interface to normalize your data before publishing it.
Please see the data normalization section for guides on how to cleanse, transform and standardize your dataset. Once this stage is complete, you will need to publish your dataset to make it available for queries.