Transformations

This reference article contains a complete list of transformations available in DTL.

For an introduction to the concept of transformations (alongside much more background information), see DTL language essentials.

As that article explains, transformations are called in a similar way to functions, but they do not return a result. Instead, they modify - or "transform" - your data, for example by changing the value stored in a particular column.

All transformations are transient, and persist only for the duration of processing the current row. The original database is not changed.

Note that where column names are specified as parameters to transformations, they must be passed as strings. In the most common case, where the column name is simply a string literal, this means it must be enclosed in double-quotes "like this".

If you omit the double-quotes or enclose the column name in backticks, then (as always in DTL) it will be replaced by the value of the column. That value will then be interpreted as the name of a different column. This is almost never what you want.

copy

Create a new column by copying the value of an existing one.

Signature

copy (source: String, target: String)

Parameters

Name Type Optional Comments
source String   The name of the existing column to be copied.
target String   The name of the new column to be created.

Notes

  1. If a column named source does not exist, or if a column named target already exists within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.

drop

Delete a column.

Signature

drop (column: String)

Parameters

Name Type Optional Comments
column String   The name of the column to be deleted.

Notes

  1. If a column named column does not exist within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.

linear_transform

Change the value of a column containing a numeric value by applying a linear scaling function. The original value x is changed to ax + c, where a and c are specified as parameters to the transformation. The result is rounded to the nearest integer.

Signature

linear_transform (column: String, a: Float, c: Float)

Parameters

Name Type Optional Comments
column String  

The name of the column to be processed.

a Float   The value of a in the equation ax + c.
c Float   The value of c in the equation ax + c.

Notes

  1. If a column named column does not exist within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.
  2. The column must contain either an integer or floating-point number, or a string which represents an integer floating-point number. If the column contains a string which cannot be converted to an integer or floating-point number, an error is raised.
  3. The result is rounded to the nearest integer. Results ending in .5 are rounded away from zero.

map

Replace a column's value, based on a list of mappings from an existing value to a new one.

Signature

map (column: String, mapping: Dictionary)

Parameters

Name Type Optional Comments
column String   The name of the column to be processed.
mapping Dictionary   The series of mappings to be applied.

Notes

  1. The mapping parameter consists of one or more existing/replacement value pairs, with the syntax { "existing1":"replacement1", "existing2":"replacement2" ... }
  2. Duplicates are not allowed. If more than one existing:replacement pair has the same value for existing, an error is raised.
  3. For each existing:replacement pair, if the value of the column exactly matches existing, it is changed to replacement.
  4. The match with existing is case-sensitive.
  5. For clarity, whitespace is significant. The match with existing must be exact, including matching any whitespace.
  6. If none of the existing:replacement value pairs exactly matches the value of the column, then the value of the column is changed to NULL.
  7. If the column's value was previously NULL, it remains NULL. It is not possible to use NULL as an existing value.
  8. Once the value of a column has been changed by a mapping, it is not changed again, even if its new value matches one of the existing:replacement value pairs.
  9. If a column named column does not exist within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.
  10. If the column named column has a numeric type, an error is raised.

convert_case

Change the case of a column's value.

Signature

convert_case (column: String, case: String)

Parameters

Name Type Optional Comments
column String   The name of the column to be processed.
case String   The desired case of the column's value. Must be one of lowercase, uppercase or titlecase.

Notes

  1. If case is lowercase or uppercase, then each letter in the column's value is converted to lower or upper case, respectively.
  2. If case is titlecase, then the first letter of each word is converted to title case, and every other letter is converted to lower case. Words are separated by spaces or any punctuation character; for example, one:two-three becomes One:Two-Three.
  3. Characters which are not letters are unaffected.
  4. If case is not one of the permitted values, an error is raised.
  5. If a column named column does not exist within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.
  6. If the column named column has a numeric type, an error is raised.

match_regex

Apply a regular expression to the value of a column, and create new columns containing the values of each capturing group within the regular expression.

Signature

match_regex = (source: String, targets: Array of Strings, regex: String)

Parameters

Name Type Optional Comments
source String   The name of the column to be processed.
targets Array of Strings   The names of the new columns to be created. 
regex String   The regular expression to be applied.

Notes

  • regex must be a regular expression accepted by the RE2 library. For a definition of the syntax supported, see RE2 documentation.
  • In the common case where regex is a string literal, be aware that the backslash is an escape character in DTL string literals. Therefore, any backslashes contained in the regular expression must themselves be escaped with another backslash. For example, the regular expression token \w must be written as \\w within a string literal.
  • targets must be an array of strings, expressed in the syntax ["name1", "name2",...].
  • There must be exactly as many elements in the targets array as there are capturing groups in regex. Otherwise, an error is raised.
  • Each capturing group is mapped to an element in the targets array, in order. For example, the first capturing group is mapped to the first element in the targets array.
  • A new column is created for each capturing group, named after the corresponding element of the targets array. Each column is populated with the content of the applicable capturing group, after applying the regex to the value of the column named source.
  • There may, of course, be any number of non-capturing groups in regex. These are ignored for the purposes of mapping capturing groups to the targets array.
  • If a column named source does not exist, or if any of the columns named in the targets array already exists within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.
  • If the column named source has a numeric type, an error is raised.

set

Change the value and type of a column.

Signature

set (column: String, value: Any)

Parameters

Name Type Optional Comments
column String   The name of the column to be changed.
value Any   The value the column should be changed to.

Notes

  1. If a column named column does not exist within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.
  2. value may have any type.
  3. The column's type may be changed as a result of the transformation. After applying the transformation, the column will have the type which value had.
  4. If value is an empty string, then the column will also be set to an empty string. To set a column to the NULL value, see the separate set_null transformation.

set_null

Change the value of a column to NULL.

Signature

set_null (column: String)

Parameters

Name Type Optional Comments
column String   The name of the column to be set to NULL.

Notes

  1. As in most database systems, NULL is a special value, distinct from zero or the empty string. Use NULL to indicate that the data is absent or unknown.
  2. If a column named column does not exist within the set of columns assigned to this category, an error is raised. Column names are case-sensitive.