Data Onboarding Automation
Within the InfoSum platform, you can automate all the steps required to publish your Dataset.
To create an automation, you must have completed the import & publishing process manually and have a published Dataset. The automation will then reuse the configurations in the import (ICC & importer), recordset, normalization, and publishing stages to import & publish your Dataset. You can automate the import-to-publish flow based on a single file, a selection of several files, or entire folders of files.
| Pre-requisites |
| Import data from a server: To use automation for onboarding data, the import method must be set to "Server Import." Automation won't work if you import data from your computer using "Local File Import." |
| User rights: To successfully execute the automation, the user responsible for creating or managing it must have the following permissions at the time of execution: |
| Lack of any of the rights will result in Automation failure. |
Table of Contents:
Before you start: Server-side automation preparation
How to create a data onboarding automation
Automating Data Onboarding with the API
Viewing your saved and active automations
Running an automation manually or outside its schedule
Cloud Vault file management with automation
Getting Automation Failure updates
Automation failure and troubleshooting
Before you start: Server-side automation preparation
To set up onboarding automation successfully, start by designing a server-side setup that lets InfoSum’s Importer function pick up all of the files, and only the files, that you need onboarded for each automation.
The automation uses your saved configurations to pull files or folders from a server, create a recordset, normalize the data, and publish it to a single Dataset. All of these steps must first be run manually in the InfoSum UI, which creates the lineage of configs that prepares your data (one file or many file parts) for publication to a single Dataset. The automation then executes the exact same Importer and all following steps at a frequency of your choosing.
For this reason, we recommend setting up your importer to import only files on your server that match your criteria and that have been added since the last importer task run. You can enable this by checking the box when creating or editing the importer.
Your Importer should also point to a specific and correct path, folder and/or filename to pull only the intended files (and not other files in the server). As data on your server is refreshed, we recommend that you archive old files for ease of use and cost reduction.
Here are some basic steps to ensure successful automation:
- Establish a Dedicated Folder: Create a dedicated folder for onboarding automation and point your importer to that folder. With your importer, you can bring into your Cloud Vault:
- All files in that folder
- Whole folders in that folder (subfolders must be specified in the importer)
- Specific files/folders identified by name and/or wildcard mapping. If using this method, ensure the filename matches the intended logic when refreshing data.
- Use Separate Source Folders: We recommend a separate source folder for each automation to facilitate easy file management on the server. This approach means:
- You don’t need to specify the file name.
- You won’t accidentally import unintended files.
- You can easily import an entire folder.
- Schedule Automation Wisely: Ensure you schedule your automation at a time/frequency that does not conflict with data being refreshed on the server.
- If you want an archive of the files you’ve imported, there are three options:
- Move them to a separate folder. In this case, you could have two folders in the server: an active folder used for automation and an archival folder.
- Keep a copy of the data at its original location rather than on the server connected to the Cloud Vault.
- Alternatively, you can find a copy of the files and folders used for any automation in your Cloud Vault. There is a 30-day retention policy in Cloud Vaults for all unpublished files and recordsets, to remove unnecessary copies of your data and increase the usability of the Cloud Vault.
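If you select files by name or wildcard, as described in the steps above, it can help to preview which filenames a pattern would pick up before relying on it in an importer. The sketch below is only an illustration: it assumes the importer's wildcard behaves like shell-style globbing, which may not match InfoSum's exact rules, and the `preview_matches` helper and the file listing are hypothetical.

```python
from fnmatch import fnmatchcase


def preview_matches(filenames, pattern):
    """Return the names that a shell-style wildcard pattern would select.

    Assumption: the importer's wildcard works like shell globbing.
    Check the importer's own behaviour before relying on this.
    """
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Hypothetical listing of a dedicated onboarding folder.
listing = [
    "customers_2024-05-01.csv",
    "customers_2024-06-01.csv",
    "customers_backup.csv",
    "orders_2024-06-01.csv",
]

# A narrow pattern selects only the dated customer files.
print(preview_matches(listing, "customers_2024-*.csv"))

# A broader pattern also picks up the backup file, which may be unintended.
print(preview_matches(listing, "customers_*.csv"))
```

Comparing the two results shows why a dedicated folder per automation is the simpler option: with only the intended files present, no pattern is needed at all.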
How to create a data onboarding automation
- To automate data onboarding, you first need to import, normalize, and publish your data manually. You can find the steps and instructions for data onboarding in this article. A Dataset can only be associated with one Automation task at any given time.
| Folder onboarding settings |
| If you’re selecting a folder to create your recordset and you intend to automate the steps, please make sure the folder only includes files from one Importer. Automatic Automation steps detection is not possible when a folder includes files from multiple importers. |
- Once you’ve published the Dataset containing your imported file, follow these steps:
- Go to the Datasets screen and select the Dataset you’ve just published.
- In the details panel, you will see a button called "Automate". This button will be disabled if you used a local file to publish to your Dataset.
- To set up automation for future data onboarding, click the "Automate" button.
- Fill in the automation details
- Automation name
- Frequency settings
- Subscribe to notifications for failed automations (ticked by default; unticking the box won't unsubscribe you. Please see below for how to manage your notifications.)
The frequency can be Monthly, Weekly, Daily, Hourly, or Manual.
The available combinations depend on the frequency selection:
- Monthly: You can select the day of the month (numeral) and time
- Weekly: You can select the day of the week and time
- Daily: You only need to specify the time
- Hourly: You can select an hour-based frequency (e.g. every three hours)
- Manual: There are no other settings to choose from. The automation configuration is saved and ready to run, but it won't run automatically at any given time. See the section "Running an automation manually or outside its schedule" for how to run it.
The time you select is interpreted in the browser time zone of the user creating the automation. Please ensure that this works for the intended use cases and for any other teams that might be using the data this automation relates to.
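Because the schedule follows the creator's browser time zone, it can be useful to work out what the chosen time means for a team elsewhere. The following is a minimal sketch using fixed UTC offsets for simplicity; the `convert_run_time` helper and the example offsets are illustrative assumptions, and fixed offsets do not account for daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def convert_run_time(year, month, day, hour, minute,
                     creator_offset_hours, viewer_offset_hours):
    """Convert a scheduled time from the creator's UTC offset to a viewer's.

    Offsets are whole hours relative to UTC (for example 1 for UTC+1).
    """
    creator_tz = timezone(timedelta(hours=creator_offset_hours))
    viewer_tz = timezone(timedelta(hours=viewer_offset_hours))
    scheduled = datetime(year, month, day, hour, minute, tzinfo=creator_tz)
    return scheduled.astimezone(viewer_tz)


# A daily run set to 09:00 by a user at UTC+1, seen by a team at UTC-4.
run = convert_run_time(2024, 7, 1, 9, 0,
                       creator_offset_hours=1, viewer_offset_hours=-4)
print(run.strftime("%Y-%m-%d %H:%M"))  # 2024-07-01 04:00
```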
| Important note |
| The same automation cannot start again if the previous run hasn’t been completed. Please ensure you are allowing enough time for your Automation to complete before it starts again. |
At this point, you can click into your automation flow to view which configurations are being used and ensure that this is the setup that you require.
| Important |
| Please ensure that any new files have the same structure as the original import. The automation will use your saved Recordset configuration. |
- Click create automation.
Your automation is now active and it will run automatically at your chosen frequency.
You will be able to find the files imported with the automation inside a folder called Automations/automation-name
Automating Data Onboarding with the API
You can use the API to automate data onboarding without needing to publish a Dataset first. Follow the steps below, and refer to our API documentation for setup details. You will need to create an API key to be allowed to use the API.
Steps to Automate Onboarding with the API:
- Create a Cloud Vault and Dataset:
If you don’t already have one, set up a Cloud Vault and Dataset before configuring the automation.
- Select or Create Onboarding Task Configurations:
- You can reuse existing task configurations from a previously published Dataset if they match your automation requirements.
- If no suitable Dataset has been published yet, you'll need to create task configurations first, including:
- Importing
- Manual recordset creation
- Normalization
- Preparing and publishing
- Run the Automation: Once the configurations are in place, you can execute the automation.
Viewing your saved and active automations
Automation tasks in the InfoSum platform are tied to the company, not individual users. Any user with the necessary permissions can create, pause, activate, edit, or delete automation settings across the entire company.
In the "Datasets" tab under data management, you can view saved and active automations associated with each Dataset. When an automation is complete, all fields will appear in green.
This setup ensures that if a user leaves or is removed, the automation remains accessible to any other authorized user within the company.
Select a specific automation to pause, edit the schedule the automation runs on (or the name), or delete your configuration.
Running an Automation Manually or Outside Its Schedule
To run an automation manually or outside its regular schedule, go to the "Datasets" tab in the file management section to find your saved and active automations. Select the desired automation and click "Run" to start it immediately. Once completed, the automation will return to its original schedule.
Make sure you have the necessary permissions to run an automation manually. Completed automations will be highlighted in green.
Cloud Vault file management with automation
You will be able to find the files imported with each automation inside a folder called Automations/automation-name
Automation will not overwrite any files or recordsets. Instead, each run creates new ones by adding the date/time to the filename/recordset name at each step.
All files generated from automation runs are available to members of your company if needed.
Note that if the Automation runs frequently, there will be a build-up of files and recordsets from every step that should be cleaned up when they are no longer needed.
Getting Automation Failure updates
You can subscribe to automation failure updates directly when you create a new automation. The checkbox on that form is ticked by default; you can untick it to avoid subscribing.
Alternatively, you can navigate to the Notifications page on the left side navigation to manage all your optional notifications. Turn on the 'Automation failed' checkbox to subscribe to automation failure emails. Unticking this box will unsubscribe you from all notifications related to automations.
You will need all the relevant rights described above to receive these emails.
These emails will contain all automation failures for the company, including Automations you might not own directly.
You will receive emails such as the one below:
Automation failure and troubleshooting
In the "Datasets" tab under file management, you can view saved and active automations. When an automation is complete, all fields will appear in green. If your automation fails, an error message will be displayed in the History Tab of the selected Automation Details Panel and you will see what step has failed in the progress tab.
We do not currently support push notifications for automation failures. You can use the API (see the API documentation) to send status updates for your automation to an external platform of your choice.
Here are some common error causes when setting up or running automation:
File Import Older Than 90 Days
If your original file import is over 90 days old, the automation won't be able to locate it. To resolve this, manually import and publish your file again, then set up the automation.
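As a quick illustration of the 90-day limit, the sketch below checks whether an original import date is past it. The `import_too_old` helper is hypothetical, and treating "over 90 days" as strictly more than 90 whole days is an assumption about how the limit is counted.

```python
from datetime import date

MAX_IMPORT_AGE_DAYS = 90  # limit stated in this article


def import_too_old(imported_on, today, max_age_days=MAX_IMPORT_AGE_DAYS):
    """Return True if the original import is older than the allowed age.

    Assumption: the limit is exceeded only after more than 90 whole days.
    """
    return (today - imported_on).days > max_age_days


# An import from 1 January 2024 checked on two different days.
print(import_too_old(date(2024, 1, 1), date(2024, 3, 31)))  # False (exactly 90 days)
print(import_too_old(date(2024, 1, 1), date(2024, 4, 1)))   # True (91 days)
```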
Importing a Local File (instead of using a server)
If your original import was a local file import rather than a server import, you will not be able to set up the automation.
Automation Cannot Find the File or Folder
If the automation can't locate the correct file or folder, it may be due to one of these reasons:
- The file or folder name has changed, or the wildcard (*) was used incorrectly. As a result, the automation can't find a file with the expected name during the import.
- The file or folder has been moved to a different location, such as a new subfolder or bucket.
To fix this, ensure the file or folder is in the correct location and the naming matches the original setup.
Your Dataset has expired
If your Dataset has expired (based on the expiry date set when it was created), it no longer exists. As a result, the automation will fail during the publishing stage. To resolve this, create a new Dataset and update your automation settings.
An onboarding configuration has been deleted
If a task configuration used in the automation has been deleted, the automation will fail. Task configurations include: ICC, Importer, Recordset creation, Normalization, and Publishing. Ensure all necessary task configs are in place to avoid this issue.
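If you track your task configurations outside the platform, a simple completeness check can flag a missing one before a run fails. This is only a sketch: the dictionary layout, the key names, and the `missing_task_configs` helper are hypothetical and are not part of the InfoSum API.

```python
# The five task configuration types listed in this article (key names are illustrative).
REQUIRED_TASK_CONFIGS = ("icc", "importer", "recordset", "normalization", "publishing")


def missing_task_configs(configs):
    """Return the required task types with no configuration recorded.

    `configs` maps a task type to a config identifier, or to None if deleted.
    """
    return [name for name in REQUIRED_TASK_CONFIGS if not configs.get(name)]


# Hypothetical record of the configs an automation depends on.
tracked = {
    "icc": "icc-001",
    "importer": "imp-014",
    "recordset": None,  # deleted
    "normalization": "norm-007",
}

print(missing_task_configs(tracked))  # ['recordset', 'publishing']
```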
The previous automation run is still in progress
You cannot run the same automation task in parallel. Your automation might be failing because the previous run is still in progress. This can happen when the files you are onboarding are very large and the onboarding tasks take a long time to complete. We recommend reducing the frequency of the automation so that the previous run has completed before the next one starts.
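One way to choose a safe frequency is to start from the longest run you have observed and add some headroom. The sketch below illustrates that arithmetic for an hourly schedule; the `min_hourly_interval` helper and the 1.5x headroom factor are assumptions for illustration, not InfoSum recommendations.

```python
import math


def min_hourly_interval(longest_run_minutes, headroom=1.5):
    """Return the smallest whole-hour interval that leaves room for a run.

    `headroom` is a multiplier applied to the longest observed run time.
    """
    if longest_run_minutes <= 0:
        raise ValueError("longest_run_minutes must be positive")
    return max(1, math.ceil(longest_run_minutes * headroom / 60))


# A run that has taken up to 100 minutes needs 150 minutes with 1.5x headroom,
# so the automation should run no more often than every 3 hours.
print(min_hourly_interval(100))  # 3
```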