Data connector for Google Cloud Storage

The connector for Google Cloud Storage enables you to directly import a Google Cloud Storage dataset to InfoSum Platform.

It can be used to import delimiter-separated value files to InfoSum Platform. You can split the rows of your data across multiple files with the same structure and merge them into a single dataset.

Before starting, make sure you have the following to hand:

  • Service account credential file
  • Bucket

You will need to do the following:

  • Obtain a service account credential file: From the Google Management Interface, generate the key file for the service account. This is a JSON file for a service account that allows the InfoSum Bunker to access your Google Cloud Storage bucket. You do not need to amend this file in any way.
  • Set file permissions: Give the service account read access to the data files in the Google Cloud Storage bucket that you intend to download to the InfoSum Bunker. If your files sit in a subfolder, you may need to give explicit permission to read files from that subfolder.
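Before uploading the key file, it can help to confirm locally that it looks like a service account key. The Python sketch below assumes the usual layout of a Google service account key: a JSON object whose type is "service_account" and which includes project_id, private_key and client_email fields. The function name check_key_file and the list of fields are illustrative and are not part of InfoSum Platform. The check only inspects the file; it does not test whether the account can read your bucket.

```python
import json

# Fields that a Google service account key file normally contains.
# This list is an assumption for illustration, not an InfoSum requirement.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file (empty if none)."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)

    # Report every expected field that is absent from the JSON object.
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key
    ]

    # A key for a user account or another credential type will not work here.
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

An empty list means the file has the expected shape. The file itself is left unchanged, in line with the note above that it does not need to be amended.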

To configure a connection, log in to the Platform if you haven't already done so and either create a dataset or access the Bunker of an existing dataset. Once you're in the Bunker, select Import a dataset or use the Import tab, and locate the Google Cloud Storage connector.

Click Connect and import the service account credential file.

Click on Connect and you will be taken to the Connect stage. When you specify the Bucket, a table will appear on the right-hand side showing the available file names and sizes.

Specify the file name(s) within the bucket in the Object field, separating each file name with a comma, and click Download. 

Working with multiple files

  • You can specify any number of files. There is no limit on how many you can download.
  • Filenames must be separated with a comma.
  • All files must have the same structure.
  • Clicking a filename in the table replaces whatever is currently in the Object field. For this reason, we recommend listing multiple files in a text editor and then copying and pasting the list into the Object field.
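As an alternative to a text editor, the rules above can be sketched in Python. This example assumes you have local copies of the files and that each file starts with a header row; neither is required by the connector. The helper names build_object_field and headers_match are hypothetical. The first helper joins file names with commas and no spaces, and the second compares the first row of each file as a rough check that the files share the same structure.

```python
import csv


def build_object_field(filenames):
    """Join file names with commas, ready to paste into the Object field."""
    # Drop blank entries and stray whitespace around each name.
    cleaned = [name.strip() for name in filenames if name.strip()]
    return ",".join(cleaned)


def headers_match(paths, delimiter=","):
    """Return True if every local file has the same first (header) row."""
    headers = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # next(..., None) yields None for an empty file instead of failing.
            headers.append(next(csv.reader(handle, delimiter=delimiter), None))
    return all(header == headers[0] for header in headers)
```

For example, build_object_field(["part1.csv", "part2.csv"]) produces the string part1.csv,part2.csv, which can be pasted into the Object field in one go.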

In the Field Delimiter field, select the delimiter used to separate values in the file(s), then click Connect.
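If you are unsure which delimiter a file uses, you can inspect a local copy before choosing. The Python sketch below uses the standard library's csv.Sniffer to guess the delimiter from a sample of text. The candidate set (comma, semicolon, tab and pipe) is an assumption for illustration and may not match the options offered in the Field Delimiter field; detect_delimiter is a hypothetical helper name.

```python
import csv

# Candidate delimiters to consider; adjust to match your own files.
CANDIDATES = ",;\t|"


def detect_delimiter(sample_text):
    """Guess the delimiter from a sample of a file's first lines.

    Raises csv.Error if no candidate delimiter can be determined.
    """
    dialect = csv.Sniffer().sniff(sample_text, delimiters=CANDIDATES)
    return dialect.delimiter
```

A typical use is to read the first few kilobytes of a file and pass that text to detect_delimiter. The result is a guess based on how consistently each candidate appears per line, so it is worth confirming against the preview shown after you click Connect.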

A subset of the data will then appear as a preview. You can perform some minor manipulations at this point, such as selecting which columns to import, enabling multi-value columns, renaming columns and excluding rows.

When you're happy with the preview, accept the settings and select an import configuration. You'll then be taken to the Import Wizard, which shows how our Platform has understood your dataset and mapped its columns into our Global Schema.

If this looks correct, accept the Wizard Settings. Otherwise, untick the boxes for any incorrectly mapped columns so that they can be mapped correctly during the later normalisation phase.