Data connector for Google Cloud Storage
The connector for Google Cloud Storage enables you to directly import a Google Cloud Storage dataset to InfoSum Platform.
It can be used to import delimiter-separated value files into InfoSum Platform. You can split the rows of your data across multiple files with the same structure and merge them into a single dataset.
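If you are preparing the files yourself, the following is a minimal sketch of how one delimited file could be split into several files that share the same header row. It is not part of the connector; the function names, sample columns and chunk size are illustrative assumptions, and it works on text in memory rather than on files in a bucket.

```python
import csv
import io


def split_csv_text(text, rows_per_file):
    """Split CSV text into chunks that each repeat the original header row."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_file):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)  # every chunk gets the same header
        writer.writerows(body[start:start + rows_per_file])
        chunks.append(buffer.getvalue())
    return chunks


def same_structure(chunks):
    """Return True when every chunk starts with an identical header row."""
    headers = {tuple(next(csv.reader(io.StringIO(chunk)))) for chunk in chunks}
    return len(headers) <= 1


# Hypothetical sample data, for illustration only.
source = "email,country\na@example.com,GB\nb@example.com,US\nc@example.com,FR\n"
parts = split_csv_text(source, rows_per_file=2)
```

Running `same_structure(parts)` before uploading is a cheap way to confirm that every file carries the same columns in the same order.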
Note: InfoSum does not support Google Cloud Storage (GCS) parallel composite uploads.
Before starting, you will need the following to hand:
- Service account credential file
- Bucket
You will need to do the following:
- Obtain a service account credential file: From the Google Management Interface, generate the key file for the service account. This is a JSON file for a service user that allows InfoSum Bunker to access your Google Cloud Storage bucket. You do not need to amend this file in any way.
- Set file permissions: Give read access to the data files in the Google Cloud Storage bucket you intend to download to InfoSum Bunker. If your files sit in a subfolder, you may need to give explicit permission to read files from that subfolder.
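Before uploading the credential file, you may want to confirm that it is intact. The sketch below is an optional local check, not something the Platform requires. It assumes the file follows the usual layout of a Google service account key, which includes the `type`, `project_id`, `private_key` and `client_email` fields; the function name is hypothetical. It only reads the file and never modifies it.

```python
import json

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(text):
    """Return a list of problems found in service account key JSON text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)
    ]
    # A key for another credential type will not work as a service account key.
    if data.get("type") and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

An empty list means none of these basic problems were found; it does not prove that the service account has read access to the bucket.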
Click here for the steps to configure your Google Cloud Platform.
To configure a connection, log in to the Platform if you haven't already done so and either create a dataset or access the Bunker of an existing dataset. Once you're in the Bunker, select Import a dataset or use the Import tab, and locate the Google Cloud Storage connector as shown below.
Click Connect and import the service account credential file.
In the Bunker UI, you may need to enter the GPG key, which you can find here.
GPG Key:
You can ignore this field if you are not uploading an encrypted file.
Your Bunker will generate a public/private key pair. You can use the GPG public key provided here to encrypt your file. The encrypted file will be decrypted using the Bunker’s private key when you upload the file.
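As an illustration, encrypting a file with the Bunker's public key could look like the sketch below. It assumes that GnuPG is installed locally and that you have saved the public key from the Bunker UI to a local file. The file names, the recipient identifier and the function names are hypothetical; substitute the values for your own key and data.

```python
import subprocess


def build_import_command(public_key_path):
    """Build the gpg command that imports the Bunker public key."""
    return ["gpg", "--batch", "--import", public_key_path]


def build_encrypt_command(recipient, source_path, output_path):
    """Build the gpg command that encrypts source_path for one recipient."""
    return [
        "gpg", "--batch", "--yes",
        "--trust-model", "always",  # avoid the interactive trust prompt
        "--recipient", recipient,
        "--output", output_path,
        "--encrypt", source_path,
    ]


def encrypt_for_bunker(public_key_path, recipient, source_path, output_path):
    """Import the key, then encrypt; raises CalledProcessError if gpg fails."""
    subprocess.run(build_import_command(public_key_path), check=True)
    subprocess.run(
        build_encrypt_command(recipient, source_path, output_path), check=True
    )
```

For example, `encrypt_for_bunker("bunker_public.asc", "KEY_ID", "customers.csv", "customers.csv.gpg")` would write the encrypted copy next to the original, after which you upload the `.gpg` file to your bucket.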
Click on Connect and you will be taken to the Connect stage. When you specify the Bucket, a table will appear on the right-hand side showing the available file names and sizes.
If you are experiencing slower than expected import/export speeds and you're using a VPN or firewall that can block data upload or download, please refer to Add IP addresses to an Allowlist.
In the Object field, specify the file name(s) within the bucket, separating each file name with a comma.
Working with multiple files
- You can specify any number of files; there is no limit on how many you can download.
- Filenames must be separated with a comma.
- All files must have the same structure.
- Clicking a filename in the table overwrites the current contents of the Object field. For this reason, we recommend listing multiple files in a text editor, then copying and pasting the list into the Object field.
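If you have many files, you could build the value for the Object field with a short script instead of typing it by hand. This is a minimal sketch under a few assumptions: the file names are hypothetical, names are joined with a comma and no surrounding spaces, and a name that itself contains a comma is rejected because it could not be distinguished from a separator.

```python
def build_object_field(filenames):
    """Join file names into one comma-separated string for the Object field."""
    cleaned = []
    for name in filenames:
        name = name.strip()
        if not name:
            continue  # skip blank entries
        if "," in name:
            raise ValueError(f"file name contains a comma: {name!r}")
        if name not in cleaned:
            cleaned.append(name)  # drop duplicates, keep the original order
    return ",".join(cleaned)


# Hypothetical file names, for illustration only.
object_field = build_object_field(
    ["customers_part1.csv", " customers_part2.csv", "", "customers_part1.csv"]
)
```

You can then paste the resulting string into the Object field in one step.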
If you are uploading an encrypted file, enable the "This file is gpg encrypted" option. When you click Download, the Bunker will decrypt the file using the Bunker private key.
Next, click Download.
In the Field Delimiter field, select the delimiter used to separate values in the file(s), then click Connect.
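If you are unsure which delimiter a file uses, you can inspect the first few lines locally before selecting it. The sketch below is a deliberately naive check, not the Platform's own detection: it picks the candidate character that appears the same number of times on every sampled line. The candidate list and function name are assumptions, and quoted values that contain a delimiter are not handled.

```python
CANDIDATES = (",", "\t", "|", ";")


def guess_delimiter(sample, candidates=CANDIDATES):
    """Return the candidate that occurs equally often on every non-empty line."""
    lines = [line for line in sample.splitlines() if line]
    best, best_count = None, 0
    for candidate in candidates:
        counts = {line.count(candidate) for line in lines}
        if len(counts) != 1:
            continue  # no lines, or the count differs between lines
        count = counts.pop()
        if count > best_count:
            best, best_count = candidate, count
    return best  # None when no candidate fits
```

A result of `None` means no candidate was consistent across the sample, in which case it is worth opening the file in a text editor to check it by eye.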
A subset of the data will then appear as a preview. You can perform some minor manipulations at this point, such as selecting which columns to import, enabling multi-value columns, renaming columns and excluding rows.
When you're happy with the preview, accept the settings and select an import configuration. You'll then be taken to the Import Wizard, which shows how the Platform has understood your dataset and mapped its columns to our Global Schema.
If this looks correct, accept the Wizard Settings. Otherwise, untick the boxes so that the columns can be correctly mapped during the later normalization phase.