Data management with Beacons
Beacons bring collaboration to your data.
With Beacons, there is no need to go through the initial import process into a Cloud Vault. Instead, the Cloud Vault and Datasets are hosted in your own environment, and you simply use the InfoSum UI to normalize, encrypt, and publish your data to the Dataset.
What are Beacons
Beacons is an app for data collaboration that is deployed in your own cloud or warehouse environment, delivering our ‘non-movement of data’ promise even before collaboration has taken place:
- They sit directly within the data owner's cloud infrastructure as a native app in the cloud or warehouse.
- They can host additional data types. As well as PII, Text, CRM, and Log File data, Beacons can also host Image, Video, Audio, Geo, and Vector data.
- No personal data (PII) ever leaves your environment; only an abstract model of the data is used for queries.
- Data always remains in its regulatory region to make global collaboration risk-free.
- Only the data owner can access their Beacon. Permission controls dictate collaboration.
- They can be seamlessly and securely used for collaboration in combination with any other Datasets, using the InfoSum platform UI or API.
At present, we only support Google Cloud Platform (GCP) and Snowflake (on all clouds) Beacons.
Our roadmap includes support for AWS, Azure, and Databricks in 2026.
Connect your data to a Beacon
There are two distinct tasks to create a Beacon:
1. Create a Cloud Vault in InfoSum
The first thing you need to do is create a Cloud Vault in InfoSum that is connected to the cloud or warehouse account where your data is stored. You will use this to prepare your data for publishing to your Dataset, but it won’t host any data.
2. Beacons Native App deployment in Cloud/Warehouse
Deploy InfoSum’s Beacons as a native app on your cloud or warehouse account. This will link your project with the virtual Cloud Vault on your InfoSum account to manage the collaboration operations.
You can find detailed instructions for each of our cloud and warehouse partners below:
Beacon data management steps
Beacons remove the need to import data into your Cloud Vault for preparation and instead allow you to normalize, encrypt, and publish your data while keeping it in its original location.
Once you’ve followed the steps outlined above, you can log into your InfoSum account. The Cloud Vault you created, which is hosted by your cloud provider, contains a list of all the tables and views that you have chosen to use during collaboration.
The data management area of our platform becomes simply a command center for getting your data ready (normalizing, preparing and publishing the tables to a Dataset), without hosting any of that data itself.
Data normalization
In your Cloud Vault area, select the table or tables that you wish to use for collaboration and follow these instructions to normalize your data.
You don’t need to create a recordset prior to normalization when using Beacons.
Multi-value columns
During normalization, we automatically detect multi-value columns based on the column type, provided the values are in a structured, array-like format (e.g. a list of strings per cell, such as {"a","b","c"}).
If your multi-value column uses a different delimiter, please use the modifier function 'Set multi-value delimiter' and select your delimiter from the list.
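To illustrate the difference between the two formats, here is a minimal Python sketch of one way a single text cell could be split into its values. Array-like cells such as {"a","b","c"} are parsed directly, and any other cell is split on a chosen delimiter. The function `parse_multi_value` and its parsing rules are hypothetical and are not InfoSum's implementation. In the platform, you set the delimiter with the 'Set multi-value delimiter' modifier.

```python
import csv
import io


def parse_multi_value(cell: str, delimiter: str = ",") -> list[str]:
    """Hypothetical helper: split one text cell into its individual values.

    Array-like cells such as {"a","b","c"} are recognised by their braces;
    any other cell is split on `delimiter` (the setting that the
    'Set multi-value delimiter' modifier would change).
    """
    text = cell.strip()
    if text.startswith("{") and text.endswith("}"):
        # Array-like format: drop the braces, then let the csv module
        # handle the quoted, comma-separated items and strip the quotes.
        inner = text[1:-1]
        if not inner:
            return []
        return next(csv.reader(io.StringIO(inner)))
    # Delimited string: split on the configured delimiter, ignoring blanks.
    return [part.strip() for part in text.split(delimiter) if part.strip()]


print(parse_multi_value('{"a","b","c"}'))        # ['a', 'b', 'c']
print(parse_multi_value("a|b|c", delimiter="|"))  # ['a', 'b', 'c']
```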
Address mapper
We only provide address mapper functionality for the US with Beacons (currently in Beta). Please follow the instructions in this article: US address mapper.
- For UK addresses, please provide the UDPRN (Unique Delivery Point Reference Number) and match it to the Global schema key with the same name.
All columns containing date-time values are automatically identified as timestamps.
Prepare and Publish
After the data is normalized it can be published to a new or existing Dataset.
The data hosted in the Dataset remains in your cloud account. Only metadata, such as the Dataset name and column headers, is visible in the platform.
Incremental Dataset updates
If your Dataset is very large or changes frequently, incremental updates can help keep your data up to date and speed up processing by reducing the amount of data that needs to be normalized and prepared in each task.
Incremental updates process only new records from your table or view and append them to your Dataset. They also expire records that fall outside your retention period.
To update your Dataset incrementally, you’ll need five things:
- Ensure that you have a date column containing the creation or last-updated date, which you select as your update field, and a date column to use for your retention period. These can be the same column.
- The new recordset must contain all keys and attributes already present in the existing data (it can also contain additional ones).
- Enable Incremental Normalization (toggle it on) on the screen where you save the normalization config; this allows you to identify the new records to be processed.
- Enable Incremental Prepare during the prepare stage; this allows you to append the new records and expire records outside of your retention period.
- Set up an automation to ensure that the data is refreshed at your chosen cadence.
- Incremental normalization config (identify new records)
When you re-normalize a recordset, the value of the update field in each record is compared against the date of that recordset's last normalization, and only new records are normalized.
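The comparison described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the platform's implementation. Records are plain dictionaries, and `select_new_records` is a hypothetical helper. A record is treated as new only if its update field is strictly later than the last normalization time.

```python
from datetime import datetime


def select_new_records(records: list[dict], update_field: str,
                       last_normalized_at: datetime) -> list[dict]:
    """Hypothetical helper: keep only records whose update field is later
    than the recordset's last normalization (strict comparison assumed)."""
    return [r for r in records if r[update_field] > last_normalized_at]


records = [
    {"id": 1, "updated_at": datetime(2024, 1, 10)},
    {"id": 2, "updated_at": datetime(2024, 3, 5)},
]
last_run = datetime(2024, 2, 1)  # date of the previous normalization
new_records = select_new_records(records, "updated_at", last_run)
print([r["id"] for r in new_records])  # [2]
```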
- Incremental prepare (append and expire)
When starting the prepare task, please choose incremental prepare.
Please note that if you choose the standard prepare option, your prepare task will overwrite your existing Dataset with your normalized recordset. If you have normalized only the new records since the last task, those records will be the only data in your Dataset after you publish it.
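The following minimal sketch contrasts the two prepare options for the case where only new records were normalized. Both functions are hypothetical models that represent a Dataset as a list of dictionaries. Retention-based expiry is left out here to keep the contrast clear.

```python
def standard_prepare(dataset: list[dict], normalized: list[dict]) -> list[dict]:
    """Standard prepare (hypothetical model): the normalized recordset
    replaces the Dataset contents entirely."""
    return list(normalized)


def incremental_prepare(dataset: list[dict], normalized: list[dict]) -> list[dict]:
    """Incremental prepare (hypothetical model): the newly normalized
    records are appended to the existing Dataset."""
    return dataset + normalized


existing = [{"id": 1}, {"id": 2}]
only_new = [{"id": 3}]  # result of an incremental normalization

print([r["id"] for r in standard_prepare(existing, only_new)])     # [3]
print([r["id"] for r in incremental_prepare(existing, only_new)])  # [1, 2, 3]
```

The first result shows why the standard option is risky after an incremental normalization: the earlier records are no longer in the Dataset.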
Select your keys, attributes, and output column. On the last screen, you will be asked to select your date column and set a retention period for your data. Each time you publish new data, the platform uses this column to expire records that fall outside your retention period.
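The retention rule can be sketched as a filter on the selected date column. This is an illustration under assumptions. `expire_records` is a hypothetical helper, and the retention period is expressed as a Python `timedelta`. A record dated exactly at the cutoff is kept.

```python
from datetime import datetime, timedelta


def expire_records(dataset: list[dict], date_field: str,
                   retention: timedelta, now: datetime) -> list[dict]:
    """Hypothetical helper: drop records dated before the start of the
    retention window (records exactly at the cutoff are kept)."""
    cutoff = now - retention
    return [r for r in dataset if r[date_field] >= cutoff]


dataset = [
    {"id": 1, "created_at": datetime(2024, 1, 15)},
    {"id": 2, "created_at": datetime(2024, 5, 20)},
]
kept = expire_records(dataset, "created_at", timedelta(days=90),
                      now=datetime(2024, 6, 1))
print([r["id"] for r in kept])  # [2]
```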
Publish your data as usual once the prepare task is ready.
- Set up an automation: we strongly recommend setting up an automation to ensure that your records are appended and expired at your selected cadence.
Stop/start functionality
Hosting and processing data with Beacons incurs storage and computing costs in your cloud or warehouse infrastructure. To minimize these costs, we’ve implemented stop/start functionality that keeps resources active only while they are required for collaboration.
Beacon datasets will show as ‘Paused’ in your datasets list and dashboard after an hour of inactivity. This means that we have removed or suspended the instance, and it will need to be resumed before activity can take place.
Your Beacon datasets will resume with each Automation and remain active for an hour after the task is completed (or longer if you are performing other tasks).
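The inactivity rule can be sketched as a small status check. This sketch makes several assumptions. The one-hour timeout is measured from the end of the last task, and a Dataset that has been idle for exactly one hour counts as paused. 'Active' is a hypothetical label for a running Dataset; only 'Paused' and 'Starting' are named in the platform.

```python
from datetime import datetime, timedelta

# From the text: Beacon Datasets are paused after an hour of inactivity.
IDLE_TIMEOUT = timedelta(hours=1)


def beacon_status(last_task_finished: datetime, now: datetime,
                  task_running: bool = False) -> str:
    """Hypothetical status check. 'Active' is an assumed label; exactly one
    hour of inactivity is assumed to count as paused."""
    if task_running:
        # A running task (e.g. an Automation) keeps the Dataset active.
        return "Active"
    if now - last_task_finished >= IDLE_TIMEOUT:
        return "Paused"
    return "Active"


finished = datetime(2024, 1, 1, 12, 0)
print(beacon_status(finished, finished + timedelta(minutes=30)))  # Active
print(beacon_status(finished, finished + timedelta(minutes=75)))  # Paused
```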
You have two options to resume your Dataset:
- Navigate to the Bunkers page and select the paused Dataset. Click the three dots on the tile, then click ‘Resume’. The status will change from ‘Paused’ to ‘Starting’.
- You or a partner can start a query (audience, segment, or IQL) on a paused Dataset, which starts the resume process immediately. Your query will run once the Dataset has resumed.
We recommend option 1 where possible, as it ensures your Dataset is live by the time you are ready to query.
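The two resume paths can be modelled as a small state machine. This is a hypothetical sketch, not the platform's API. `BeaconDataset` and its methods are invented for illustration, and 'Ready' is an assumed label for a Dataset that has finished resuming.

```python
from dataclasses import dataclass, field


@dataclass
class BeaconDataset:
    """Hypothetical model of a Beacon Dataset's resume behaviour."""
    status: str = "Paused"
    queued_queries: list[str] = field(default_factory=list)

    def resume(self) -> None:
        # Option 1: resume from the tile menu; 'Paused' becomes 'Starting'.
        if self.status == "Paused":
            self.status = "Starting"

    def submit_query(self, query: str) -> str:
        # Option 2: a query on a Dataset that is not ready starts the
        # resume (if needed) and waits until the Dataset has resumed.
        if self.status != "Ready":
            self.resume()
            self.queued_queries.append(query)
            return "queued"
        return "running"

    def finish_starting(self) -> list[str]:
        # When start-up completes, release any queries that were waiting.
        if self.status != "Starting":
            return []
        self.status = "Ready"
        released, self.queued_queries = self.queued_queries, []
        return released


dataset = BeaconDataset()
print(dataset.submit_query("audience query"))  # queued
print(dataset.status)                          # Starting
print(dataset.finish_starting())               # ['audience query']
print(dataset.submit_query("segment query"))   # running
```

Under this model, option 1 moves the Dataset to 'Starting' before anyone queries it, so no query has to wait.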
The time required to resume a Dataset depends on its size (number of rows and columns). Below is some general guidance based on a dataset with 10 to 20 columns:
- <100M rows: 5-10 min
- 100-500M rows: 10-20 min
- 500M-1BN rows: 30-45 min
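For planning purposes, the guidance above can be expressed as a lookup. This sketch encodes only the three bands listed, for a dataset with roughly 10 to 20 columns. The function name is hypothetical. Two choices are assumptions, because the guidance does not specify them: a value of exactly 100M or 500M rows falls in the higher band, and exactly 1BN rows falls in the last band. Larger datasets return `None` because the guidance does not cover them.

```python
from typing import Optional, Tuple


def estimated_resume_minutes(rows: int) -> Optional[Tuple[int, int]]:
    """Return the (min, max) resume time in minutes from the published
    guidance, or None if the row count is outside the listed bands."""
    if rows < 100_000_000:
        return (5, 10)
    if rows < 500_000_000:
        return (10, 20)
    if rows <= 1_000_000_000:
        return (30, 45)
    return None  # no guidance given above 1BN rows


print(estimated_resume_minutes(250_000_000))  # (10, 20)
```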
Once the Dataset is ready, you shouldn’t experience any delay in standard platform operations.
Automate data management tasks
Once your Dataset is published, the data management tasks can be automated.
You can follow our standard Automation instructions, with one caveat: Beacon Datasets don’t have data import or recordset creation steps. The only two tasks to run are normalization and prepare/publish, and you don’t need to worry about server setup.
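As a rough sketch of how this changes an automation, the following compares the task sequence for a standard Dataset with the sequence for a Beacon Dataset. The step names, and their order for standard Datasets, are assumptions based on the steps mentioned here. They are not taken from the Automation instructions.

```python
# Assumed order of tasks in a standard automation, based on the steps
# named in the text (hypothetical names).
STANDARD_STEPS = ["data import", "recordset creation", "normalization", "prepare/publish"]

# Beacon Datasets have no data import or recordset creation steps.
NOT_USED_BY_BEACONS = {"data import", "recordset creation"}


def automation_steps(is_beacon: bool) -> list[str]:
    """Return the tasks an automation would run for a Dataset (sketch)."""
    if not is_beacon:
        return list(STANDARD_STEPS)
    return [step for step in STANDARD_STEPS if step not in NOT_USED_BY_BEACONS]


print(automation_steps(is_beacon=True))  # ['normalization', 'prepare/publish']
```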
Please ensure you have the correct rights to create and run automations (listed in the instructions linked above).
Next steps
Once you have a dataset published in the platform, you can start collaborating with your partners.

