Incremental Dataset updates
The standard method for publishing data to your Dataset is to overwrite the existing data with your new publish task. If you wish to append records and expire older records, you will need to use the incremental prepare function.
This article provides a brief overview of the setup required for incremental updates.
Data requirements
With incremental refresh, you can append your prepared recordset to the existing data in your Dataset. To use this option, you’ll need:
- A new file that contains only the new records you wish to append
- A date column in your normalized recordset that can be used as a creation timestamp to set a retention window. If there are no date columns, the incremental option will not be clickable. If your data doesn't have a datetime column, you can add one during the normalization step
- A new recordset that has all keys and attributes already present in the existing data (it can also contain additional ones)
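The requirements above can be checked locally before you import a file. The following is a minimal sketch in Python, assuming the new records are available as plain dictionaries and the existing Dataset's column names as a set. The function name `check_incremental_batch` and the column names are hypothetical illustrations, not part of the platform.

```python
from datetime import datetime


def check_incremental_batch(existing_columns, new_records, date_column, date_format):
    """Return a list of problems found in a batch of new records (empty if none)."""
    problems = []
    for index, record in enumerate(new_records):
        # Every key and attribute already in the Dataset must be present;
        # additional columns are allowed.
        missing = set(existing_columns) - set(record)
        if missing:
            problems.append(f"record {index}: missing columns {sorted(missing)}")
            continue
        # The date column must parse so it can drive the retention window.
        try:
            datetime.strptime(record.get(date_column), date_format)
        except (TypeError, ValueError):
            problems.append(f"record {index}: unparseable {date_column!r}")
    return problems


existing_columns = {"id", "name", "created_at"}  # hypothetical Dataset columns
batch = [
    {"id": "1", "name": "Ada", "created_at": "2024-03-01 10:00:00", "team": "core"},
    {"id": "2", "name": "Lin"},
    {"id": "3", "name": "Sam", "created_at": "01/03/2024"},
]
problems = check_incremental_batch(
    existing_columns, batch, "created_at", "%Y-%m-%d %H:%M:%S"
)
for problem in problems:
    print(problem)
```

In this example the first record passes (its extra `team` column is allowed), the second is flagged for a missing column, and the third for a date that does not match the expected format.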
Incremental settings
To do an incremental update, follow the standard onboarding process with a few additions:
- Import your files - please ensure that they contain only the new records to append and a date time column
- (optional) You can set up your importer to import only new files by checking the box in the screenshot below:
- Create a recordset as usual
- Normalize your recordset - continue with standard normalization of your data. Please use the date time modification to parse the required date column as a timestamp:
- On the normalizer screen, click the gray plus (+) in the top corner
- Select Modify Columns > Parse Date Time Format.
- Select the format of your data from the dropdown or type your format if it's not listed
- Select the date column as the input column
- Rename your column to 'Retention timestamp' (or another useful name).
If you do not have a date time column in your recordset, you can add one at this step:
- On the normalizer screen, click the gray plus (+) in the top corner
- Select 'Add datetime column'
- You can choose between adding a column with the datetime of the normalization task and adding a fixed datetime
- We recommend you use the datetime of normalization if you're using automations so the date is dynamically updated
- Prepare your Dataset incrementally - click 'incremental prepare' (bottom right corner) to configure the update
- You will be asked to select your date column and set up a retention period for your data. The platform will use this column to expire records that are outside your retention period every time that you publish new data. The retention period can be set to as many as 9,223,372,036,854,775,807 days, the largest value of a signed 64-bit integer.
- Publish your data as usual once the prepare task is ready
- Set up an automation - we strongly recommend that you set up an automation to ensure that your records are appended and expired at the selected cadence
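Conceptually, each incremental publish appends the new records and then expires any record whose retention timestamp falls outside the retention period. The sketch below illustrates that behaviour in Python under stated assumptions: records are dictionaries, the `retention_timestamp` key and the `incremental_publish` function are hypothetical, the cutoff is measured from the time of publishing, and a record exactly on the cutoff is kept. It is an illustration only, not the platform's implementation.

```python
from datetime import datetime, timedelta


def incremental_publish(existing, new_records, retention_days, now):
    """Append new_records to existing, then expire records older than the window."""
    cutoff = now - timedelta(days=retention_days)
    combined = existing + new_records
    # Keep only records whose timestamp is inside the retention window
    # (assumption: the boundary itself counts as inside).
    return [r for r in combined if r["retention_timestamp"] >= cutoff]


now = datetime(2024, 3, 31)
existing = [
    {"id": 1, "retention_timestamp": datetime(2024, 1, 15)},
    {"id": 2, "retention_timestamp": datetime(2024, 3, 10)},
]
new_records = [{"id": 3, "retention_timestamp": datetime(2024, 3, 30)}]

dataset = incremental_publish(existing, new_records, retention_days=30, now=now)
print([r["id"] for r in dataset])  # prints [2, 3]
```

With a 30-day retention period the cutoff is 1 March 2024, so record 1 is expired while record 2 and the newly appended record 3 remain. Running the same step on a schedule is what an automation achieves: each run appends the latest records and moves the cutoff forward.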