Importing a dataset
A dataset is similar to a table in a conventional database. It's a single group of records which you import, process and query as a single unit. The first step in any InfoSum Platform project is to publish one or more datasets. Once you've published a dataset, you can reference it in queries - and if you choose to, you can allow other users to reference it as well.
There are three main steps in this process.
- First, use the Platform to create your dataset.
- Then, use your private bunker to import and normalise your data.
- Once you're happy, publish your dataset. Only once it's published is it available to reference in queries.
Create a dataset
When you create a dataset, you tell the Platform that your data exists, and provide a few key details about how it will be used. You don't actually upload or import any data at this stage - that comes in the next step.
You can create a dataset using the New Dataset button on the Datasets tab, or use the plus sign shortcut in the top right which appears on every page throughout the Platform.
You'll be asked to supply three details.
- The Private ID is the name you'll use to reference this dataset in queries. A short name, like customerDB, is perfect. Because you'll use it in code, this ID must contain only alphanumeric characters with no spaces.
- The Public Name is a brief title for your dataset, which you'll use to identify it in the Platform UI. Something like Active customer accounts is ideal. If you give other users permission to reference the dataset, they'll also be able to see this public name (but not the private ID).
- The optional Public Description is simply a human-readable explanation of what the dataset is for. Again, if you give other users permission to reference the dataset, they'll be able to read this description.
You can change any of these details later - though of course, if you change the Private ID, you'll also need to change any code you've written which referenced the old ID.
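If you generate Private IDs programmatically, it can be worth checking them against the rule above before creating the dataset. A minimal sketch in Python - the pattern and helper name are our own, not part of the Platform:

```python
import re

# Mirrors the Private ID rule described above:
# alphanumeric characters only, no spaces.
PRIVATE_ID_PATTERN = re.compile(r"^[A-Za-z0-9]+$")

def is_valid_private_id(private_id: str) -> bool:
    """Return True if the ID contains only alphanumeric characters."""
    return bool(PRIVATE_ID_PATTERN.match(private_id))

print(is_valid_private_id("customerDB"))   # True - a short alphanumeric name
print(is_valid_private_id("customer DB"))  # False - spaces are rejected
```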
Once you've filled in these details and clicked Create, your dataset is created immediately. You'll see it listed on the Datasets tab.
The status icon tells you that you still need to import data for your new dataset.
If you like, you can now give another user permission to reference your new dataset - though of course, they won't actually be able to do that until you have imported some data. See the permissions section for an explanation of this process.
Import a dataset
When you created your dataset in the previous step, the Platform created a corresponding bunker. Now, you can connect to that bunker and import your data.
To connect to your bunker:
- Switch to the Datasets tab in the Platform, if you're not there already.
- Find the row representing the dataset you've just created (refer to the Public Name if you're not sure).
- Select the Access button. You'll be taken to the bunker interface.
Your bunker has a unique, randomly generated name, which you'll see in your browser's address bar. Any time you see that name, you can be confident that you're connected to your own secure bunker.
The bunker automatically opens on the Dashboard tab. Next, select Import a dataset and select from the range of sources, such as uploading a CSV file or using a connector.
For this example, we'll import a CSV file. After selecting Connect, you'll be taken to a page where you can upload the file from your computer. You'll then be taken to the Preview page, where you can perform some optional minor manipulations on the source data; see this article for more information.
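Before uploading, it can help to run a quick local sanity check on the CSV - confirming it parses and inspecting the header and first few rows. A sketch using Python's standard csv module (the helper and file names are illustrative, not part of the Platform):

```python
import csv

def preview_csv(path: str, limit: int = 5):
    """Parse a CSV file and return its header plus up to `limit` data rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = []
        for i, row in enumerate(reader):
            if i >= limit:
                break
            rows.append(row)
    return header, rows

# Example usage with a small sample file:
with open("sample.csv", "w", newline="") as f:
    csv.writer(f).writerows([
        ["email", "country"],
        ["alice@example.com", "GB"],
        ["bob@example.com", "US"],
    ])

header, rows = preview_csv("sample.csv")
print(header)     # ['email', 'country']
print(len(rows))  # 2
```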
Once you've accepted the preview configuration, the final step is to check that the dataset has been correctly mapped against our Global Schema. This mapping is what enables the mathematical representation of your data and direct comparisons between multiple datasets.
You will now use your bunker's web interface to normalise your data before publishing it.
Please see the data mapping and transformations section for guides on how to cleanse, transform and standardise your dataset. Once this stage is complete, you will need to publish your dataset to make it available for queries.
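To give a feel for the kind of standardisation that stage covers, here is an illustrative sketch of a common email cleansing step - trimming whitespace and lowercasing. This is our own example, not the bunker's actual transformation logic:

```python
def normalise_email(value: str) -> str:
    """Trim surrounding whitespace and lowercase - a common email standardisation."""
    return value.strip().lower()

# Applying the transformation across a batch of records:
raw = ["  Alice@Example.COM ", "bob@example.com"]
cleaned = [normalise_email(v) for v in raw]
print(cleaned)  # ['alice@example.com', 'bob@example.com']
```

Consistent normalisation like this matters because matching between datasets relies on identical values producing identical representations.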