Data connector for Amazon S3
The data connector for S3 enables you to directly import a dataset from an Amazon S3 bucket.
It can be used to import delimiter-separated value files, such as data exported from Amazon Redshift, into the InfoSum Platform.
Before starting, make sure you have the following information to hand:
- Access key ID
- Access key secret
- Bucket
- File name
To configure a connection, log in to the Platform if you haven't already and either create a dataset or access the Bunker of an existing dataset. Once you're in the Bunker, select Import a dataset or use the Import tab, and locate the S3 connector.
Click Connect and enter your credentials as shown below.
The above form contains the following fields:
- Access Key ID: The access key ID of the AWS credentials used to authenticate
- Access Secret Key: The secret access key that pairs with the access key ID
- Bucket Name: Specify the bucket name without the leading s3:// identifier, for example "bucket-name", not "s3://bucket-name"
- Prefix: Optionally, an extra path within the bucket, for example SubFolder/NextFolder/
- GPG Encryption: GPG public key to encrypt the file
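The Bucket Name and Prefix formats described above are easy to get wrong when copying values from an S3 URI. The following is a minimal Python sketch for tidying those two values before pasting them into the form. The function names are hypothetical and not part of the Platform, and the trailing slash on the prefix is an assumption based on the SubFolder/NextFolder/ example.

```python
def normalize_bucket_name(value: str) -> str:
    """Return the bare bucket name, without a leading "s3://" or any path."""
    name = value.strip()
    if name.lower().startswith("s3://"):
        name = name[len("s3://"):]
    # Anything after the first slash is a path, not part of the bucket name.
    return name.split("/", 1)[0]


def normalize_prefix(value: str) -> str:
    """Return the prefix with no leading slash and a single trailing slash.

    Assumption: the form expects the SubFolder/NextFolder/ shape shown in
    the field description. An empty prefix stays empty.
    """
    prefix = value.strip().strip("/")
    return f"{prefix}/" if prefix else ""


if __name__ == "__main__":
    print(normalize_bucket_name("s3://bucket-name"))
    print(normalize_prefix("/SubFolder/NextFolder"))
```

For instance, passing "s3://bucket-name/SubFolder" to normalize_bucket_name keeps only the bucket part, and the folder part can be passed separately to normalize_prefix.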
In the Bunker UI, you may need to enter the GPG key, which you can find here.
GPG Key:
You can ignore this field if you are not uploading an encrypted file.
Your Bunker will generate a public/private key pair. You can use the GPG public key provided here to encrypt your file. The encrypted file will be decrypted using the Bunker’s private key when you upload the file.
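As an illustration of the encryption step, the sketch below uses Python to drive the standard gpg command-line tool: it imports the Bunker's public key from a file and then encrypts a data file for that key. It assumes gpg is installed locally and that you have saved the public key to a file. The file names, the recipient value, and the .gpg output suffix are placeholders chosen for this example, not values defined by the Platform.

```python
import subprocess


def gpg_import_command(public_key_file: str) -> list[str]:
    """Build the gpg command that imports a public key into the local keyring."""
    return ["gpg", "--batch", "--import", public_key_file]


def gpg_encrypt_command(source_file: str, recipient: str) -> list[str]:
    """Build the gpg command that encrypts source_file for the given recipient.

    recipient is the key ID, fingerprint, or user ID of the imported key.
    The output name (source_file + ".gpg") is an assumption for this sketch.
    """
    return [
        "gpg", "--batch", "--yes",
        # Skip the interactive trust prompt for a key that was just imported.
        "--trust-model", "always",
        "--recipient", recipient,
        "--output", f"{source_file}.gpg",
        "--encrypt", source_file,
    ]


def encrypt_for_bunker(source_file: str, public_key_file: str, recipient: str) -> str:
    """Import the public key, encrypt the file, and return the output path."""
    subprocess.run(gpg_import_command(public_key_file), check=True)
    subprocess.run(gpg_encrypt_command(source_file, recipient), check=True)
    return f"{source_file}.gpg"
```

A call such as encrypt_for_bunker("data.csv", "bunker_public_key.asc", "<key ID>") would produce data.csv.gpg, which can then be placed in the S3 bucket in place of the plain file.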
When you have completed all required fields, click Connect.
If you are experiencing slower than expected import/export speeds and you're using a VPN or firewall that can block data upload or download, please refer to whitelisting IP addresses.
The above form shows all the files available within the selected S3 bucket.
Copy the file name into the Key box.
If you are uploading an encrypted file, enable the This file is gpg encrypted option. When you click Download, the Bunker will decrypt the file using the Bunker private key.
Next, select Download.
In the Field Delimiter field, select the delimiter used to separate values in the file, and then select Connect.
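If you are unsure which delimiter a file uses, you can check a local copy before choosing a value in the Field Delimiter field. The sketch below uses csv.Sniffer from the Python standard library. The list of candidate delimiters and the function names are assumptions made for this example and may not match the options the Platform offers.

```python
import csv

# Candidate delimiters to consider: comma, semicolon, tab, and pipe (assumed set).
CANDIDATES = ",;\t|"


def detect_delimiter(sample: str, candidates: str = CANDIDATES) -> str:
    """Guess the field delimiter from a text sample of the file.

    Raises csv.Error if no consistent delimiter can be determined.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


def detect_file_delimiter(path: str, sample_chars: int = 64 * 1024) -> str:
    """Read the start of a local file and guess its delimiter."""
    with open(path, newline="", encoding="utf-8") as handle:
        return detect_delimiter(handle.read(sample_chars))
```

Only the beginning of the file is read, so the check stays fast on large exports. If the guess looks wrong, opening the first few lines in a text editor is a reliable fallback.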
A subset of the data will then appear as a preview. You can perform some minor manipulations at this point, such as selecting which columns to import, renaming columns and excluding rows.
When you're happy with the preview, accept the settings and select a blank import configuration. You'll then be taken to the Import Wizard, which shows how our Platform has understood your dataset and mapped its columns to our Global Schema.
If this looks correct, accept the Wizard Settings. Otherwise, untick the boxes for any incorrectly mapped columns so they can be mapped correctly during the later normalization phase.