Bunker and the Script Editor
DTL is the Data Transformation Language, InfoSum's custom scripting language for re-writing and standardizing data.
In this, the first part of our three-part guide, we'll explain how DTL fits into the overall process of importing your data, and how you can begin to write DTL scripts using your bunker's web-based UI.
Before you begin
DTL is an advanced technique and, before you begin, it's important to understand the overall process for normalizing your data. You should be sure you know:
- what normalizing means, and why you need to do it
- what categories are, and how they're different from the columns in your original database.
We know you want to dive straight in, but ten minutes' book-work now will save you a lot of time later. So if you're not completely comfortable with these concepts, it's a very good idea to stop and read those glossary topics first.
To get hands-on experience, you'll also need to have created a dataset and imported some data into your Bunker. If you're up to speed with the concepts and you're looking at your Bunker's web UI, then you're ready to begin.
The role of transformations
To recap, normalizing your data is the process of mapping it onto InfoSum's standard schema. Transformations are an optional step carried out during normalization, which changes (or transforms) your original columns to help them map onto your chosen category.
For example, perhaps you have a "customer number" column, and the customer number contains the customer's date of birth. You can map that onto the Date Of Birth category - but your Bunker won't know how to extract the date from the customer number. Using a transformation, you can change the customer number field to remove the unwanted information, and leave just the date of birth behind.
Transformations are written in InfoSum's specialist scripting language, DTL. As we'll see below, you can use a wizard to set up simple DTL scripts automatically, and for more complicated tasks you can write your own scripts directly in the DTL language.
Assign categories first
Remember that normalization is a three-step process.
- First, your original columns are assigned to categories.
- Then, optionally, transformations are run against each category.
- Finally, the data in each category is normalized.
So, before you can transform your original columns using DTL, you must use bunker to assign them to categories. In brief, the procedure is:
- Find the column you want to assign to the category. Or if you want to assign more than one column to the category, find any one of those columns.
- Click the settings icon next to the category's name and choose Add Category from the menu.
- Optionally, if you want to assign more than one column to this category, choose the remaining columns using the Add additional columns... drop-down.
- Select the category name using the Select a category... drop-down, then click Assign to complete the process.
Once you've assigned the columns to a category, you can go on to set up a transformation, as described below.
When you upload your source data, your Bunker will analyze it and, usually, recommend assigning some columns to categories automatically. If you accept these recommendations, then of course, you don't need to assign the columns again. You can go straight to writing the transformations for those categories.
The Wizard and the Editor
To start working with transformations and DTL, click the settings icon next to the name of the column you assigned. This time, because the column is now assigned to a category, you'll have additional options on the menu.
Select Transformations to open the Transformations dialog. You'll notice that this dialog has two tabs, called Wizard and Editor.
For straightforward transformations, you may be able to use the Transformations Wizard. If the Wizard fits your needs, you won't need to write any DTL at all.
For example, suppose you want to remove the prefix +44 (the country code for the United Kingdom) from every telephone number. That's simple enough to configure in the Wizard - like this:
You can also use the Wizard to string a series of transformations together, as long as each individual transformation is straightforward and you want to do the same transformations on every row.
If your needs are more complex, or if you want to use if-then logic to apply different transformations to different rows, then you'll need to switch to the Transformations Editor. Using the Editor, you can write your transformation directly in the DTL language:
In fact, when you use the Wizard, it's writing a DTL script in the background. You can see the script for yourself if you switch to the Editor tab. So one way to get started with a DTL script - especially while you're learning the language - is to set up a rough version using the Wizard, then fine-tune your script using the Editor.
Watch out, though. Once you've changed your script in the Editor, you can't make any more changes using the Wizard. If you need to go back, you'll have to discard your script in its entirety and begin again.
Next step
We'll explain the DTL language essentials, and what you need to know to start writing your own transformation scripts.