Data connector for SFTP
The data connector for SFTP enables you to directly import a dataset from your server.
Before starting, have the following information to hand:
- Host
- Port
- Username
- One of:
  - Password
  - Private Key Pem (password-less connection)
- Path
- Filename
- Host public key in one of the formats:
  - OpenSSH authorized_keys - "Authorized Keys"
  - OpenSSH known_hosts - "Known Hosts"
  - PEM - "Public Key PEM"
Click here for the steps to host an SFTP server in AWS.
To configure a connection, log in to the Platform and either create a dataset or access the Bunker of an existing dataset.
Select Import a dataset or use the Import tab, and locate the SFTP connector.
Click Connect and a form will appear as shown below to enter your credentials.
The form contains four sections:
- Connection - Fields: Host, Port, Path
- Authentication - Fields: Username, Password or Private Key Pem
- Host Verification - Fields: Authorized Keys or Known Hosts or Public Key Pem
- GPG Encryption - GPG Public key to encrypt the file
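As a rough illustration of how the four sections fit together, the sketch below mirrors the form as a Python dataclass and checks it for completeness before submission. The class and attribute names are hypothetical and are not part of the Platform. It assumes that at least one authentication method and exactly one host verification key are expected, which is how the form is described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SftpConnectionForm:
    """Hypothetical mirror of the connector form; names are illustrative only."""
    # Connection
    host: str
    port: int
    path: str
    # Authentication
    username: str
    password: Optional[str] = None
    private_key_pem: Optional[str] = None
    # Host Verification (only one of these is needed)
    authorized_keys: Optional[str] = None
    known_hosts: Optional[str] = None
    public_key_pem: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the form looks complete."""
        issues = []
        if not self.host:
            issues.append("Host is required")
        if not 1 <= self.port <= 65535:
            issues.append("Port must be between 1 and 65535")
        if not self.username:
            issues.append("Username is required")
        # Assumption: a password or a private key must be present.
        if not (self.password or self.private_key_pem):
            issues.append("Provide a Password or a Private Key Pem")
        # Assumption: exactly one host verification format is filled in.
        host_keys = [self.authorized_keys, self.known_hosts, self.public_key_pem]
        if sum(1 for key in host_keys if key) != 1:
            issues.append("Provide exactly one host verification key")
        return issues
```

Calling `problems()` on a filled-in instance gives a quick checklist of what is still missing before you open the connector form.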
In the Bunker UI, you may need to enter up to three keys, each for a different purpose. They are described below.
Private Key Pem:
You can ignore this field if you are establishing a connection using a password.
This is the user's SSH private key, which is used for authentication in place of a password. It is one half of a public/private key pair.
If you are establishing a password-less connection using an SSH key, add the public SSH key to the authorized_keys file on your server and paste the private SSH key into the Private Key Pem field in the UI.
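The server-side half of this step can be sketched in Python as follows. This is a minimal example, assuming the server runs OpenSSH with the usual per-user `.ssh` directory. The function name `add_authorized_key` is hypothetical, and the permissions shown (0700 for the directory, 0600 for the file) are the conventional ones rather than anything the Platform requires.

```python
import os
from pathlib import Path


def add_authorized_key(public_key_line: str, ssh_dir: Path) -> bool:
    """Append a public key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    key = public_key_line.strip()
    # Create the directory with owner-only access if it does not exist yet.
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with auth_file.open("a") as fh:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")
    # Restrict the file to its owner.
    os.chmod(auth_file, 0o600)
    return True
```

On the server this would typically be run as the SFTP user, for example `add_authorized_key(Path("id_ed25519.pub").read_text(), Path.home() / ".ssh")`, where `id_ed25519.pub` stands in for whichever public key file you generated.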
Host Verification Key:
Enter the host public key in one of the formats below (only one is needed):
- Host Public Keys (OpenSSH authorized_keys format) - "Authorized Keys" in Bunker UI
- Host Public Keys (OpenSSH known_hosts format) - "Known Hosts" in Bunker UI
- Host Public Keys (PEM format) - "Public Key PEM" in Bunker UI. Currently, only the PKIX format is supported for public keys, so the PEM block headed "PUBLIC KEY" goes in this field.
Please note that this key is NOT the same as the public part of the user SSH key. It is a public key associated with your server, not with your user.
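If you are unsure which of the three fields a given key belongs in, the layout of the text is usually enough to tell. The sketch below is a heuristic only, and `classify_host_key` is a hypothetical helper rather than part of the Platform. It relies on the standard layouts: an authorized_keys line starts with the key type, a known_hosts line starts with the host name(s) followed by the key type, and a PKIX PEM block starts with `-----BEGIN PUBLIC KEY-----`.

```python
import base64

# Key type names used by OpenSSH begin with one of these prefixes.
_KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-sha2-", "sk-")


def _is_base64(value: str) -> bool:
    """True if value decodes as strict base64."""
    try:
        base64.b64decode(value, validate=True)
        return True
    except ValueError:
        return False


def classify_host_key(text: str) -> str:
    """Guess which Bunker UI field a pasted host public key belongs in."""
    stripped = text.strip()
    if stripped.startswith("-----BEGIN PUBLIC KEY-----"):
        return "Public Key PEM"
    fields = stripped.split()
    # authorized_keys layout: <key type> <base64 key> [comment]
    if (len(fields) >= 2 and fields[0].startswith(_KEY_TYPE_PREFIXES)
            and _is_base64(fields[1])):
        return "Authorized Keys"
    # known_hosts layout: <host names> <key type> <base64 key> [comment]
    if (len(fields) >= 3 and fields[1].startswith(_KEY_TYPE_PREFIXES)
            and _is_base64(fields[2])):
        return "Known Hosts"
    return "unknown"
```

A block headed `-----BEGIN RSA PUBLIC KEY-----` is reported as "unknown" here, because that header is not the "PUBLIC KEY" block described above.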
You can find this key in one of two ways.
1) Your IT team can look it up on the server, probably in the /etc/ssh directory, where there will be a number of files, e.g.
- ssh_host_ecdsa_key.pub
- ssh_host_ed25519_key.pub
The contents of one of these files can just be put straight into the "Authorized Keys" field on the Bunker UI. An example format for the ecdsa file:
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdH..........<redacted>........LB9u5V+o
2) Alternatively, connect to your server over SSH once so that its host key is recorded in your local known_hosts file, then run "ssh-keygen -F <hostname>" to look up the public key stored for that host. Note that this command searches known_hosts; it does not generate a key pair.
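For the second approach, the lookup in known_hosts can also be done programmatically. The following is a minimal sketch, assuming plain-text (unhashed) entries and a default port. The function name `host_keys_from_known_hosts` is hypothetical. Each result is in the "key type plus base64 key" layout used by the "Authorized Keys" field.

```python
from typing import List


def host_keys_from_known_hosts(known_hosts_text: str, hostname: str) -> List[str]:
    """Return '<key type> <base64 key>' for plain-text entries matching hostname."""
    keys = []
    for raw_line in known_hosts_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete entry
        if fields[0].startswith("@"):
            continue  # marker lines such as @cert-authority are not handled
        if fields[0].startswith("|"):
            continue  # hashed host names are not handled in this sketch
        # The first field is a comma-separated list of host names and addresses.
        if hostname in fields[0].split(","):
            keys.append(f"{fields[1]} {fields[2]}")
    return keys
```

Typical usage would be to read `~/.ssh/known_hosts` with `Path.home().joinpath(".ssh", "known_hosts").read_text()` and pass the result in together with your server's host name. If your known_hosts file stores hashed host names, use `ssh-keygen -F <hostname>` instead.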
GPG Key:
You can ignore this field if you are not uploading an encrypted file.
Your Bunker will generate a public/private key pair. You can use the GPG public key provided here to encrypt your file. The encrypted file will be decrypted using the Bunker’s private key when you upload the file.
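One way to encrypt a file with the provided public key is the GnuPG command-line tool. The sketch below only builds the two commands (import the key, then encrypt to it) so that they can be reviewed before being run. It assumes `gpg` is installed locally and that you have saved the Bunker's GPG public key to a file. The function name, file names, and recipient value are all placeholders, and the output format the Bunker expects is not stated here, so treat this as a starting point.

```python
from typing import List


def gpg_encrypt_commands(public_key_file: str, recipient: str,
                         input_file: str, output_file: str) -> List[List[str]]:
    """Build the gpg invocations to import a public key and encrypt a file to it."""
    return [
        # Step 1: add the public key to the local keyring.
        ["gpg", "--import", public_key_file],
        # Step 2: encrypt the file to that key. "--trust-model always" skips
        # the interactive trust prompt for a key you have not signed yourself.
        ["gpg", "--trust-model", "always",
         "--output", output_file,
         "--encrypt", "--recipient", recipient,
         input_file],
    ]
```

Each command can then be executed with `subprocess.run(cmd, check=True)`. The `recipient` argument is the user ID or key ID that `gpg --import` reports for the imported key.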
Once you have entered all required fields, click Connect to go to the Download page, where you can select the files to import.
If you are experiencing slower than expected import/export speeds and you're using a VPN or firewall that can block data upload or download, please refer to whitelisting IP addresses.
Next, click on a filename in the table to copy it to the File name box and select Download, then Connect.
If you are uploading an encrypted file, enable the "This file is gpg encrypted" option. When you click Download, the Bunker will decrypt the file using its private key.
A subset of the data will then appear as a preview. You can perform some minor manipulations at this point, such as selecting which columns to import, renaming columns, and excluding rows.
When you're happy with the preview, accept the settings and select a blank import configuration. You'll then be taken to the Import Wizard, which shows how our Platform has understood your dataset and mapped columns into our Global Schema.
If this looks correct, accept the Wizard Settings. Otherwise, untick the boxes so the columns can be correctly mapped during the later normalization phase.