Data Onboarding Automation
Within the InfoSum platform, you can automate all the steps required to publish your dataset.
To create an automation, you must first have completed the import and publishing process manually and have a published dataset. The automation then reuses the configurations from the import (ICC and importer), recordset, normalization, and publishing stages to import and publish your dataset. You can automate the import-to-publish flow based on a single file, a selection of several files, or entire folders of files.
Pre-requisites
Import data from a server: To use automation for onboarding data, the import method must be set to "Server Import." Automation won't work if you import data from your computer using "Local File Import."
User rights: To successfully execute the automation, the user responsible for creating or managing it must have the following permissions at the time of execution:
Lacking any of these rights will cause the automation to fail.
Table of Contents:
How to create a data onboarding automation
Automating Data Onboarding with the API
Viewing your saved and active automations
Running an automation manually or outside its schedule
Cloud Vault file management with automation
Automation failure and troubleshooting
How to create a data onboarding automation
- To automate data onboarding, you first need to import, normalize, and publish your data manually. You can find the steps and instructions for data onboarding in this article. A dataset can only be associated with one Automation task at any given time.
Folder onboarding settings
If you're selecting a folder to create your recordset and you intend to automate the steps, make sure the folder only includes files from one Importer. The automation steps cannot be detected automatically when a folder includes files from multiple importers.
- Once you’ve published the dataset containing your imported file, follow these steps:
- Go to the Publishing screen and select the normalized file you’ve just published.
- In the details panel, under the ‘Prepared and Published Datasets’ tab, you will see your published file(s).
- To set up automation for future data onboarding, click the "Automate" button next to your published file.
- Fill in the automation details:
  - Automation name
  - Frequency settings
The frequency can be Monthly, Weekly, Daily, Hourly, or Manual.
The available settings depend on the frequency you select:
- Monthly: Select the day of the month (as a number) and the time.
- Weekly: Select the day of the week and the time.
- Daily: Specify the time only.
- Hourly: Select an hour-based interval (e.g. every three hours).
- Manual: There are no other settings to choose from. The automation configuration is saved and ready to run, but it never runs on its own. See "Running an Automation Manually or Outside Its Schedule" for how to start it yourself.
Important note
The same automation cannot start again if the previous run hasn't been completed. Please ensure you allow enough time for your automation to complete before it starts again.
At this point, you can click into your automation flow to view which configurations are being used and confirm that this is the setup you require.
Important
Please ensure that any new files have the same structure as the original import. The automation will use your saved Recordset configuration.
- Click "Create automation".
Your automation is now active and it will run automatically at your chosen frequency.
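As a quick illustration of how the frequency options above relate to their settings, here is a minimal Python sketch. It only models the form described in this article. The field names (`day_of_month`, `day_of_week`, `time`, `every_n_hours`) are hypothetical labels chosen for the example, not identifiers from the InfoSum platform or its API.

```python
# Settings each frequency needs, following the list in this article.
# ASSUMPTION: the field names are hypothetical, chosen for this sketch only.
REQUIRED_FIELDS = {
    "monthly": {"day_of_month", "time"},
    "weekly": {"day_of_week", "time"},
    "daily": {"time"},
    "hourly": {"every_n_hours"},
    "manual": set(),
}


def missing_settings(frequency, settings):
    """Return the settings still missing for the chosen frequency."""
    key = frequency.lower()
    if key not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    # Treat a setting whose value is None as not provided.
    provided = {name for name, value in settings.items() if value is not None}
    return REQUIRED_FIELDS[key] - provided


if __name__ == "__main__":
    # A weekly schedule with a time but no day of the week is incomplete.
    print(missing_settings("Weekly", {"time": "02:00"}))
```

A check like this can be handy if you keep your intended schedules in a spreadsheet or config file and want to review them before entering them in the platform.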
Automating Data Onboarding with the API
You can use the API to automate data onboarding without needing to publish a dataset first. Follow the steps below, and refer to our API documentation for setup details. You need to create an API key before you can use the API.
Steps to Automate Onboarding with the API:
- Create a Cloud Vault and Dataset (Bunker): If you don't already have one, set up a Cloud Vault and Dataset before configuring the automation.
- Select or create onboarding task configurations:
  - You can reuse existing task configurations from a previously published dataset if they match your automation requirements.
  - If no suitable dataset has been published yet, you'll need to create task configurations first, including:
    - Importing
    - Manual recordset creation
    - Normalization
    - Preparing and publishing
- Run the automation: Once the configurations are in place, you can execute the automation.
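The order of the steps above can be sketched as a small planning function. This is not a client for the InfoSum API and makes no network calls. The function name and its inputs are hypothetical, and it simply turns what you already have into an ordered to-do list. For the real endpoints and request formats, refer to the API documentation.

```python
# Task configuration types listed in this article.
TASK_CONFIGS = [
    "Importing",
    "Manual recordset creation",
    "Normalization",
    "Preparing and publishing",
]


def plan_api_onboarding(has_cloud_vault, has_dataset, existing_configs):
    """Return the ordered steps still needed before the automation can run.

    ASSUMPTION: `existing_configs` is a collection of names from TASK_CONFIGS
    that you already have and consider reusable. This is an illustrative
    model only, not part of the platform.
    """
    steps = []
    if not has_cloud_vault:
        steps.append("Create a Cloud Vault")
    if not has_dataset:
        steps.append("Create a Dataset (Bunker)")
    for name in TASK_CONFIGS:
        action = "Reuse" if name in existing_configs else "Create"
        steps.append(f"{action} task configuration: {name}")
    steps.append("Run the automation")
    return steps


if __name__ == "__main__":
    for step in plan_api_onboarding(True, False, {"Importing", "Normalization"}):
        print(step)
```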
Viewing your saved and active automations
Automation tasks in the InfoSum platform are tied to the company, not to individual users. Any user with the necessary permissions can create, pause, activate, edit, or delete automations across the entire company. This ensures that if a user leaves or is removed, the automation remains accessible to any other authorized user within the company.
In the "Publishing" tab under file management, you can view saved and active automations. When an automation run is complete, all fields appear in green.
Select a specific automation to pause it, edit its name or schedule, or delete it.
Running an Automation Manually or Outside Its Schedule
To run an automation manually or outside its regular schedule, go to the "Publishing" tab in the file management section to find your saved and active automations. Select the desired automation and click "Run" to start it immediately. Once completed, the automation will return to its original schedule.
Make sure you have the necessary permissions to run an automation manually. Completed automations will be highlighted in green.
Cloud Vault file management with automation
Automation does not overwrite any files or recordsets. Instead, each run creates new ones by appending the date and time to the file name or recordset name at each step.
All files generated from automation runs are available to members of your company if needed.
Note that if the automation runs frequently, files and recordsets from every step will build up, and you should clean them up when they are no longer needed.
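To help decide what to clean up, the sketch below groups timestamped names and lists everything except the newest few per base name. The suffix format used here (`_YYYYMMDD_HHMMSS` before the extension) is an assumption made for illustration. This article does not specify the exact format the platform uses, so adjust the pattern to match the names you actually see. The function only returns a list and deletes nothing.

```python
import re
from collections import defaultdict
from datetime import datetime

# ASSUMPTION: names look like "<base>_YYYYMMDD_HHMMSS<ext>". The real suffix
# format used by the platform may differ; adjust the pattern to match.
STAMP = re.compile(r"^(?P<base>.+)_(?P<stamp>\d{8}_\d{6})(?P<ext>\.\w+)?$")


def stale_names(names, keep_latest=3):
    """Return names older than the `keep_latest` newest per base name."""
    groups = defaultdict(list)
    for name in names:
        match = STAMP.match(name)
        if match is None:
            continue  # No timestamp suffix: leave the name alone.
        try:
            when = datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # Digits that are not a valid date/time: leave it alone.
        groups[(match["base"], match["ext"])].append((when, name))

    stale = []
    for runs in groups.values():
        runs.sort(reverse=True)  # Newest first.
        stale.extend(name for _, name in runs[keep_latest:])
    return sorted(stale)


if __name__ == "__main__":
    listing = [
        "customers_20240101_010000.csv",
        "customers_20240102_010000.csv",
        "customers_20240103_010000.csv",
        "notes.txt",
    ]
    print(stale_names(listing, keep_latest=2))
```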
Automation failure and troubleshooting
In the "Publishing" tab under file management, you can view saved and active automations. When an automation run is complete, all fields appear in green. If a run fails, an error message is displayed in the History tab of the selected automation's details panel, and the Progress tab shows which step failed.
Push notifications for automation failures are not currently supported. Instead, you can use the API (see the API documentation) to retrieve status updates for your automation and forward them to an external platform of your choice.
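One way to approximate failure notifications is to poll for status and forward failures yourself. The sketch below shows only the polling logic. `fetch_status` and `notify` are callables you would supply (for example, a function that calls the API and one that posts to your alerting tool), and the status strings `"completed"` and `"failed"` are placeholders, not documented API values.

```python
import time

# ASSUMPTION: placeholder status values; check the API documentation for the
# real ones returned for an automation run.
TERMINAL_STATUSES = {"completed", "failed"}


def watch_automation(fetch_status, notify, poll_seconds=60, max_polls=120,
                     sleep=time.sleep):
    """Poll until the run finishes, forward failures, return the last status.

    `fetch_status` returns the current status string; `notify` receives a
    message to send to your external platform. Both are supplied by you.
    """
    status = None
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            if status == "failed":
                notify("Automation run failed")
            return status
        sleep(poll_seconds)
    # The run did not reach a terminal status within the polling window.
    notify("Automation still running after polling window")
    return status


if __name__ == "__main__":
    # Simulated run: two in-progress polls, then a failure.
    fake = iter(["running", "running", "failed"])
    watch_automation(lambda: next(fake), print, sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the loop easy to test without waiting in real time.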
Here are some common error causes when setting up or running automation:
File Import Older Than 90 Days
If your original file import is over 90 days old, the automation won't be able to locate it. To resolve this, manually import and publish your file again, then set up the automation.
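If you track when your original imports were made, a simple date check can flag those past the 90-day limit before you try to set up an automation. This is a minimal sketch. The function name is hypothetical, and you would need to supply the import timestamp yourself.

```python
from datetime import datetime, timedelta, timezone

MAX_IMPORT_AGE = timedelta(days=90)  # Limit stated in this article.


def import_is_too_old(imported_at, now=None):
    """Return True if the import is over 90 days old.

    Both datetimes should be timezone-aware so the subtraction is valid.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - imported_at > MAX_IMPORT_AGE


if __name__ == "__main__":
    imported = datetime(2024, 1, 1, tzinfo=timezone.utc)
    check_date = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(import_is_too_old(imported, now=check_date))
```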
Importing a Local File (instead of using a server)
If your original import used "Local File Import" rather than "Server Import," you will not be able to set up the automation.
Automation Cannot Find the File or Folder
If the automation can't locate the correct file or folder, it may be due to one of these reasons:
- The file or folder name has changed, or the wildcard (*) was used incorrectly. As a result, the automation can't find a file with the expected name during the import.
- The file or folder has been moved to a different location, such as a new subfolder or bucket.
To fix this, ensure the file or folder is in the correct location and the naming matches the original setup.
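Before a scheduled run, you can sanity-check a wildcard pattern against the file names currently on your server. The sketch below uses Python's standard `fnmatch` module and assumes shell-style semantics, where `*` matches any run of characters and matching is case-sensitive. The platform's own wildcard rules may differ, so treat the result as a hint only.

```python
from fnmatch import fnmatchcase


def matching_files(pattern, available):
    """Return the names in `available` that match a shell-style pattern.

    ASSUMPTION: case-sensitive matching where "*" matches any characters.
    """
    return [name for name in available if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = ["customers_2024.csv", "Customers_2024.csv", "orders.csv"]
    matches = matching_files("customers_*.csv", names)
    if not matches:
        print("No file matches the pattern; the import step would find nothing.")
    else:
        print(matches)
```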
Your Dataset (Bunker) has expired
If your Dataset (Bunker) has expired (based on the expiry date set when it was created), it no longer exists. As a result, the automation will fail during the publishing stage. To resolve this, create a new Dataset and update your automation settings.
An onboarding configuration has been deleted
If a task configuration used in the automation has been deleted, the automation will fail. Task configurations include: ICC, Importer, Recordset creation, Normalization, and Publishing. Ensure all necessary task configurations are in place to avoid this issue.
The previous automation run is still in progress
You cannot run the same automation task in parallel. Your automation might be failing because the previous run is still in progress. This can happen when the files you are onboarding are very large and the onboarding tasks take a long time to complete. We recommend reducing the frequency of the automation so that each run finishes before the next one starts.
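To pick a safer frequency, you can work from how long recent runs actually took. The sketch below suggests a whole-hour interval that leaves headroom over the slowest observed run. The 1.5x safety factor is an arbitrary assumption for illustration, not a platform recommendation, and the run durations are values you would collect yourself.

```python
import math


def suggest_interval_hours(recent_run_minutes, safety_factor=1.5):
    """Return the smallest whole-hour interval covering the slowest run.

    ASSUMPTION: `safety_factor` is an illustrative margin; tune it to how
    much your file sizes vary between runs.
    """
    if not recent_run_minutes:
        raise ValueError("Provide at least one observed run duration")
    slowest = max(recent_run_minutes)
    return max(1, math.ceil(slowest * safety_factor / 60))


if __name__ == "__main__":
    # Slowest run took 130 minutes, so allow 195 minutes, rounded up to hours.
    print(suggest_interval_hours([50, 130, 95]))
```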