Importing a dataset
As we explain in our guide to the components of InfoSum Platform, a dataset is similar to a table in a conventional database. It's a single group of records which you import, process and query as a single unit.
The first step in any InfoSum Platform project is to publish one or more datasets. Once you've published a dataset, you can reference it in queries - and if you choose to, you can allow other users to reference it as well.
There are three main steps in this process.
- First, use Dataset Manager to create your dataset.
- Then, use your private Bunker to import and normalise your data.
- Once you're happy, publish your dataset. Only once it's published is it available to reference in queries.
Create a dataset
When you create a dataset, you tell Dataset Manager that your data exists, and provide a few key details about how it will be used. You don't actually upload or import any data at this stage - that comes in the next step.
You can create a dataset using the New Dataset button, which appears on every page throughout Dataset Manager.
You'll be asked to supply three details.
- The Private ID is the name you'll use to reference this dataset from the Platform API. A short name, like customerDB, is perfect. Because you'll use it in code, this ID must contain only alphanumeric characters with no spaces.
- The Public Name is a brief title for your dataset, which you'll use to identify it in the Dataset Manager UI. Something like Active customer accounts is ideal. If you give other users permission to reference the dataset, they'll also be able to see this public name (but not the private ID).
- The optional Public Description is simply a human-readable explanation of what the dataset is for. Again, if you give other users permission to reference the dataset, they'll be able to read this description.
You can change any of these details later - though of course, if you change the Private ID, you'll also need to change any code you've written which referenced the old ID.
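If you generate Private IDs in scripts, it can help to check them against the rule above before you fill in the form. The following is a minimal sketch, not part of the InfoSum Platform API: the function name is_valid_private_id is hypothetical, and it assumes that "alphanumeric" means the ASCII letters A-Z, a-z and the digits 0-9.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# The pattern rejects spaces, punctuation, underscores and the empty string.
_PRIVATE_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_private_id(candidate: str) -> bool:
    """Return True if candidate contains only ASCII letters and digits."""
    return _PRIVATE_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for name in ["customerDB", "customer DB", "customer_db", ""]:
        print(repr(name), is_valid_private_id(name))
```

An explicit character class is used here rather than str.isalnum(), because isalnum() also accepts non-ASCII letters and digits, which may or may not be permitted.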
Once you've filled in these details and clicked Create, your dataset is created immediately. You'll see it listed on the Datasets tab.
The status icon tells you that you still need to import data for your new dataset. We'll see how to do that next.
If you like, you can now give another user permission to reference your new dataset - though of course, they won't actually be able to do that until you have imported some data. See the permissions section for an explanation of this process.
Import a dataset
When you created your dataset in the previous step, Dataset Manager created a corresponding Bunker. Now, you can connect to that Bunker and import your data.
To connect to your Bunker:
- switch to the Datasets tab in Dataset Manager, if you're not there already,
- find the row representing the dataset you've just created (refer to the Public Name if you're not sure),
- drop down the Action button and select Access Bunker.
Your Bunker will have an individual randomly-generated name, which you'll see in your browser's address bar. Any time you see that name, you can be confident that you're connected to your own secure Bunker.
Your Bunker automatically opens on the Dashboard tab. Next, select Import a dataset and choose one of the available sources, such as uploading a CSV file or using a pull connector.
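If you plan to upload a CSV file, a quick local check can catch structural problems before you import. The sketch below is an illustration only, under stated assumptions: the function name check_csv is hypothetical, the file is assumed to have a header row, and the checks shown (empty or duplicate column names, inconsistent field counts) are generic CSV hygiene rather than documented InfoSum requirements.

```python
import csv
import io


def check_csv(text: str) -> list[str]:
    """Return a list of structural problems found in CSV text.

    An empty list means none of the checks below found a problem.
    Assumes the first record is a header row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")

    # Records are numbered from 1, with the header as record 1.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems


if __name__ == "__main__":
    sample = "email,postcode\na@example.com,AB1 2CD\nb@example.com\n"
    for problem in check_csv(sample):
        print(problem)
```

To check a file on disk, read it with open(path, newline="", encoding="utf-8") and pass the contents to check_csv; the path and encoding are yours to choose.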
Once you have finished importing, you will use your Bunker's web interface to normalise your data. This process maps your data onto InfoSum's standardised Global Schema, so that it can be matched to the corresponding normalised data from your collaborator.
See the data mapping and transformations section for full guides on how to cleanse, transform and standardise your dataset. Once this stage is complete, you will need to publish your dataset to make it available for queries.