Data Onboarding Automation
Within the InfoSum platform, you can automate all the steps required to publish your dataset.
To create an automation, you must have completed the import & publishing process manually and have a published dataset. The automation will then reuse the configurations in the import (ICC & importer), recordset, normalization, and publishing stages to import & publish your dataset. You can automate the import-to-publish flow based on a single file, a selection of several files, or entire folders of files.
Pre-requisites
- Import data from a server: To use automation for onboarding data, the import method must be set to "Server Import." Automation won't work if you import data from your computer using "Local File Import."
- User rights: To successfully execute the automation, the user responsible for creating or managing it must have the necessary permissions at the time of execution. Lack of any of these rights will result in automation failure.
Table of Contents:
Before you start: Server-side automation preparation
How to create a data onboarding automation
Automating data onboarding with the API
Viewing your saved and active automations
Running an automation manually or outside its schedule
Cloud Vault file management with automation
Automation failure and troubleshooting
Before you start: Server-side automation preparation
To successfully set up onboarding automation, start by designing a server-side setup that allows InfoSum’s Importer to pick up all of the files, and only the files, you need onboarded for each automation.
The automation will use your saved configurations to pull files or folders from a server, create a recordset, normalize the data, and publish it to a single Bunker. All of these steps must initially be run manually in the InfoSum UI, creating the lineage of configurations that prepare a single dataset (which may be one file or many file parts) for publication to a single Bunker. The automation then executes the exact same Importer and all subsequent steps at a frequency of your choosing.
For this reason, your file management process must be designed to ensure that only the most up-to-date files remain on your server at the path and filename the InfoSum Importer is directed to pull from. Each time data on your server is refreshed, files posted for previous imports should be replaced, removed, or archived so that the Importer no longer picks them up and attempts to include them in the automation.
Here are some basic steps to ensure successful automation:
- Establish a dedicated folder for onboarding automation and point your importer to that folder. With your importer, you can bring into your Cloud Vault:
- all files in that folder
- whole subfolders of that folder (subfolders must be specified in the importer)
- just certain files/folders identified by name and/or a wildcard mapping. If you are using this method, please ensure that refreshed files always match the filename or wildcard mapping logic you configured
- We recommend a separate source folder for each automation to ensure easy file management on the server. This means:
- You don’t need to specify the file name
- You won’t accidentally import files you didn’t intend to
- You can easily import a whole folder
- The automation will import everything in the location specified by the importer, so removing old files from the source folder is critical. You can do that by overwriting the file or folder directly with fresh data
- Please ensure that you schedule your automation at a time/frequency that won’t clash with the data being refreshed on the server
- If you want an archive of the files you’ve imported, there are three options:
- Move them to a separate folder. In this case, you could keep two folders on the server: an active folder used for automation and an archive folder (see the sketch after this list)
- Keep a copy of the data at its original location, rather than on the server connected to the Cloud Vault
- Alternatively, a copy of the files and folders used for any automation is kept in your Cloud Vault. See Cloud Vault file management with automation below for how these files accumulate and how to manage them.
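If your import folder is reachable as a local or mounted path, a small rotation script can handle the cleanup described above. The sketch below is illustrative only: the paths are assumptions, and if your server is S3, SFTP, or similar you would swap in the corresponding client calls.

```python
# Minimal sketch: rotate previously imported files into an archive folder
# before the next data refresh, so the Importer only ever sees fresh files.
# Paths are illustrative; adapt to your own storage (e.g. S3, SFTP).
from datetime import datetime
from pathlib import Path
import shutil

ACTIVE = Path("/data/onboarding/active")    # folder the Importer points at (assumed path)
ARCHIVE = Path("/data/onboarding/archive")  # archive folder, not read by the Importer

def rotate_active_folder() -> None:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    destination = ARCHIVE / stamp
    destination.mkdir(parents=True, exist_ok=True)
    for item in ACTIVE.iterdir():
        # Move every file/subfolder out of the active folder so the next
        # automation run cannot pick up stale data.
        shutil.move(str(item), str(destination / item.name))

if __name__ == "__main__":
    rotate_active_folder()
    # ...then copy the freshly refreshed files into ACTIVE before the
    # automation's next scheduled run.
```

Running this ahead of each data refresh keeps the active folder clean while preserving an archive, which also avoids the clash between refresh and automation schedules mentioned above.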
How to create a data onboarding automation
1. To automate data onboarding, you first need to import, normalize, and publish your data manually. You can find the steps and instructions for data onboarding in this article. A dataset can only be associated with one automation task at any given time.
Folder onboarding settings
If you’re selecting a folder to create your recordset and you intend to automate the steps, please make sure the folder only includes files from one Importer. Automatic detection of the automation steps is not possible when a folder includes files from multiple Importers.
2. Once you’ve published the dataset containing your imported file, follow these steps:
- Go to the Publishing screen and select the normalized file you’ve just published.
- In the details panel, under the ‘Prepared and Published Datasets’ tab, you will see your published file(s).
- To set up automation for future data onboarding, click the "Automate" button next to your published file.
3. Fill in the automation details:
- Automation name
- Frequency settings
The frequency can be Monthly, Weekly, Daily, Hourly, or Manual.
The available combinations depend on the frequency selection:
- Monthly: You can select the day of the month (numeral) and time
- Weekly: You can select the day of the week and time
- Daily: You only need to specify the time
- Hourly: You can select an hour-based frequency (e.g. every three hours)
- Manual: There are no other settings to choose from. The automation configuration will be saved so it’s ready to run, but it won’t run automatically at any given time. See Running an automation manually or outside its schedule below.
Important note
The same automation cannot start again if the previous run hasn’t completed. Please allow enough time for your automation to complete before the next scheduled run starts.
At this point, you can click into your automation flow to view which configurations are being used and ensure that this is the setup you require.
Important
Please ensure that any new files have the same structure as the original import. The automation will use your saved Recordset configuration.
4. Click Create automation.
Your automation is now active and will run automatically at your chosen frequency.
Automating data onboarding with the API
You can use the API to automate data onboarding without needing to publish a dataset first. Follow the steps below, and refer to our API documentation for setup details. You will need to create an API key to use the API.
Steps to Automate Onboarding with the API:
- Create a Cloud Vault and Dataset (Bunker): If you don’t already have one, set up a Cloud Vault and Dataset before configuring the automation.
- Select or Create Onboarding Task Configurations:
- You can reuse existing task configurations from a previously published dataset if they match your automation requirements.
- If no suitable dataset has been published yet, you'll need to create task configurations first, including:
- Importing
- Manual recordset creation
- Normalization
- Preparing and publishing
- Run the Automation: Once the configurations are in place, you can execute the automation.
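The flow above can be scripted against the platform's API. The sketch below is only a rough illustration of that sequence: the base URL, endpoint paths, payload fields, authorization header, and task-configuration IDs are placeholders, not InfoSum's actual API; take the real names and values from the API documentation.

```python
# Minimal sketch of driving onboarding automation over HTTP.
# All endpoint paths, payload fields and the auth header below are
# placeholders -- consult the InfoSum API documentation for the real names.
import requests

BASE_URL = "https://api.example.infosum.com"      # placeholder base URL
API_KEY = "YOUR_API_KEY"                          # created in the platform beforehand
HEADERS = {"Authorization": f"Bearer {API_KEY}"}  # header scheme is assumed

# 1. Reference the task configurations created earlier (import, recordset
#    creation, normalization, prepare & publish). The IDs are illustrative.
payload = {
    "name": "weekly-crm-onboarding",
    "importer_config_id": "imp-123",
    "recordset_config_id": "rec-456",
    "normalisation_config_id": "norm-789",
    "publish_config_id": "pub-012",
    "schedule": {"frequency": "weekly", "day": "monday", "time": "02:00"},
}

# 2. Create the automation.
resp = requests.post(f"{BASE_URL}/automations", json=payload, headers=HEADERS, timeout=30)
resp.raise_for_status()
automation_id = resp.json()["id"]

# 3. Run the automation once the configurations are in place.
run = requests.post(f"{BASE_URL}/automations/{automation_id}/run", headers=HEADERS, timeout=30)
run.raise_for_status()
print("Run started:", run.json())
```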
Viewing your saved and active automations
Automation tasks in the InfoSum platform are tied to the company, not individual users. Any user with the necessary permissions can create, pause, activate, edit, or delete automation settings across the entire company.
In the "Publishing" tab under file management, you can view saved and active automations. When an automation is complete, all fields will appear in green.
This setup ensures that if a user leaves or is removed, the automation remains accessible to any other authorized user within the company.
Select a specific automation to pause it, edit its name or the schedule it runs on, or delete the configuration.
Running an automation manually or outside its schedule
To run an automation manually or outside its regular schedule, go to the "Publishing" tab in the file management section to find your saved and active automations. Select the desired automation and click "Run" to start it immediately. Once completed, the automation will return to its original schedule.
Make sure you have the necessary permissions to run an automation manually. Completed automations will be highlighted in green.
Cloud Vault file management with automation
Automation will not overwrite any files or recordsets; instead, each run creates new ones by appending the date and time to the file and recordset names at each step.
All files generated from automation runs are available to members of your company if needed.
Note that if the automation runs frequently, files and recordsets from every step will build up and should be cleaned up when no longer needed.
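If you want to script that housekeeping, one option is to group generated names by their base name and keep only the most recent few. The sketch below is purely illustrative: it assumes a `<name>_<timestamp>` suffix format, which may not match the exact naming the platform uses, and it only identifies cleanup candidates rather than deleting anything.

```python
# Illustrative helper for spotting automation-generated files/recordsets that
# are old enough to clean up. Assumes names like "customers_20240601_0200";
# the real suffix format may differ, so treat this purely as a sketch.
from collections import defaultdict
import re

NAME_PATTERN = re.compile(r"^(?P<base>.+)_(?P<stamp>\d{8}_\d{4,6})$")
KEEP_LATEST = 3  # how many recent runs to keep per base name

def cleanup_candidates(names):
    groups = defaultdict(list)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:
            groups[match["base"]].append((match["stamp"], name))
    stale = []
    for base, entries in groups.items():
        entries.sort(reverse=True)  # newest timestamp first
        stale.extend(name for _, name in entries[KEEP_LATEST:])
    return stale

print(cleanup_candidates([
    "customers_20240601_0200", "customers_20240608_0200",
    "customers_20240615_0200", "customers_20240622_0200",
]))  # -> ['customers_20240601_0200']
```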
Automation failure and troubleshooting
In the "Publishing" tab under file management, you can view saved and active automations. When an automation is complete, all fields will appear in green. If your automation fails, an error message will be displayed in the History Tab of the selected Automation Details Panel and you will see what step has failed in the progress tab.
We do not currently support push notifications for automation failures. However, you can use the API to retrieve status updates for your automation and forward them to an external platform of your choice; see the API documentation for details.
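For example, a small polling job could check the latest run and post failures to a webhook you control. As with the earlier sketch, the status endpoint, response fields, and header below are placeholders rather than InfoSum's actual API.

```python
# Minimal polling sketch: check an automation's latest run status and forward
# failures to an external webhook (e.g. a chat or alerting tool).
import requests

BASE_URL = "https://api.example.infosum.com"                  # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}            # assumed auth scheme
WEBHOOK_URL = "https://hooks.example.com/automation-alerts"   # your own endpoint

def check_automation(automation_id: str) -> None:
    resp = requests.get(f"{BASE_URL}/automations/{automation_id}/runs/latest",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    run = resp.json()
    if run.get("status") == "failed":
        # Push the failure details to wherever you want to be notified.
        requests.post(WEBHOOK_URL, json={
            "text": f"Automation {automation_id} failed at step {run.get('failed_step')}"
        }, timeout=30)

check_automation("auto-123")  # illustrative ID
```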
Here are some common error causes when setting up or running automation:
File Import Older Than 90 Days
If your original file import is over 90 days old, the automation won't be able to locate it. To resolve this, manually import and publish your file again, then set up the automation.
Importing a Local File (instead of using a server)
If your original import used a local file rather than a server, you will not be able to set up an automation.
Automation Cannot Find the File or Folder
If the automation can't locate the correct file or folder, it may be due to one of these reasons:
- The file or folder name has changed, or the wildcard (*) was used incorrectly. As a result, the automation can't find a file with the expected name during the import.
- The file or folder has been moved to a different location, such as a new subfolder or bucket.
To fix this, ensure the file or folder is in the correct location and the naming matches the original setup.
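If a run fails because the Importer can't find a file, it can help to check the refreshed file names against your wildcard pattern before the next run. The snippet below is a local sanity check only (the names and pattern are illustrative), assuming glob-style "*" matching; verify the exact matching rules against the Importer documentation.

```python
# Quick local check that a refreshed file name still matches the wildcard
# pattern configured in the Importer.
from fnmatch import fnmatch

pattern = "crm_export_*.csv"          # illustrative wildcard mapping
candidates = [
    "crm_export_2024-06-01.csv",          # matches
    "crm-export-2024-06-01.csv",          # does NOT match (hyphens vs underscores)
    "archive/crm_export_2024-06-01.csv",  # does NOT match (moved to a subfolder)
]

for name in candidates:
    print(name, "->", "match" if fnmatch(name, pattern) else "no match")
```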
Your Dataset (Bunker) has expired
If your Dataset (Bunker) has expired (based on the expiry date set when it was created), it no longer exists. As a result, the automation will fail during the publishing stage. To resolve this, create a new Dataset and update your automation settings.
An onboarding configuration has been deleted
If a task configuration used in the automation has been deleted, the automation will fail. Task configurations include: ICC, Importer, Recordset creation, Normalization, and Publishing. Ensure all necessary task configurations are in place to avoid this issue.
The previous automation run is still in progress
You cannot run the same automation task in parallel. Your automation might be failing because the previous run is still in progress. This can happen when the file(s) you are onboarding are very large and the onboarding tasks take some time to complete. We recommend reducing the frequency of the automation to ensure the previous run has completed before the next one starts.