SDL¶
SDL stands for “storage definition language” and is used synonymously with “DDL.” SDL plays a large part in how Zuar Runner IO jobs process data and how Zuar Runner learns and defines the structure of that data.
Zuar Runner IO jobs take date from various sources (APIs, databases, and files). Fields from input sources don’t always have data types, however data types need to be defined when outputting data to relational databases so destination table structures can be created or updated. A valid data type is required when sending data to relational databases: SDL gives us a mechanism for specifying appropriate data types so the data can be properly stored in the database.
SDL is mostly handled by Zuar Runner internally through specific IO job
steps and transforms
(e.g. ExtraColumnsTransform
, ColumnsTransform
, and
ReflectTransform
), however there are a few use cases where
specifying SDL is useful:
Forcing specific data types, column names, and lengths when outputting to a relational database
Adding new columns to the data piped from an input
Changing a column name in the output table
Removing a column from the output
As data from an input is piped through Zuar Runner, Zuar Runner is learning the data and assigning data types to it. Learn more about how Zuar Runner samples data.
In some situations, you may want to force a specific data type,
possibly to adjust a data type format to a specific destination
database. We can do this by using SDL statements in the Zuar Runner Job
Configuration. In the Zuar Runner Job Configuration, sdl
statments are
added after the input
and output
directives. Since SDL in the
Zuar Runner job applies generically to the data output, SDL would be
applicable to any Zuar Runner job that stores data to a database.
{
input: {...}
output: {...}
sdl: {
columns: [
{
name: bounce_rate
type: Float
}
{
name: percent_new_sessions
type: Float
}
]
}
steps: {...}
How to Force Data Types with SDL¶
This example will have Zuar Runner force the data type of the id
column to
a String
. String
SDL statements can accept an optional Length
argument, setting the maximum string length of the destination
column. NOTE: this Length
argument doesn’t truncate the column value
for the record, it simply forces the column length in the table.
sdl: {
columns: [
{
name: id
type: String
length: 15
}
]
}
Similar to dealing with string length, for Numeric columns we can
force the data type of column value_test
to a decimal
value with
explicit precision and scale.
sdl: {
columns: [
{
name: value_test
type: Numeric
precision: 20
scale: 7
}
]
}
Likewise we can get the _value_
field and rename it to location like this.
sdl: {
columns: [
{
__field__: _value
__jpath__: $._value
length: 20
name: location
type: String
}
]
}
Using Python __eval__
in SDL¶
We are able to embed very simplistic python statements into Zuar Runner job
configurations by usin an __eval__
parameter. Single line Python
expressions can be passed via the __eval__
key. See Python’s
eval()
documentation: https://docs.python.org/3/library/functions.html#eval
Typically, IO jobs should extract data from an input and load it as-is into the output with further data manipulations done with subsequent Zuar Runner SQL jobs.
However, there are cases where you may wish to do some transformation
on the input data as it’s being piped to the output, or to add data
columns that don’t exist in the input data. This (and other use cases)
can be accomplished by using an __eval__
statement in SDL. These
python expressions act on the “per record” data in from the Zuar Runner
input as it’s being processed to the output data format.
Note
The records will be referred to by data[ ] for the purpose of consistency here; in practice, the name of the record will be dictated by the Zuar Runner input used.
Create UTC Date Time Column¶
This SDL example creates a new column in the output database table with the current UTC date and time:
sdl: {
columns: [
{
name: mitto_update_at
type: DateTime
__eval__: column["__type__"].python_type.utcnow()
}
]
}
Create a Column with Static Value¶
This SDL example creates a new string column in the output with the value “Hello World”:
sdl: {
columns: [
{
name: new_string_column
type: String
__eval__: str("Hello World")
}
]
}
Store Only the First N Characters for a Column¶
This SDL example will store only the first 63 characters of the input data and truncate the remaining column data:
sdl: {
columns: [
{
name: mitto_update_at
type: String
__eval__: data["description"][:63]
}
]
}
Store the Merged Values of Two Columns into a New Column¶
This SDL example will combine the values from two columns into a new
column with the name example_full_name
.
sdl: {
columns: [
{
name: example_full_name
type: String
__eval__: str(data["first_name"], " ", data["last_name"])
}
]
}
Add Phrase or String to Each Record in a Column¶
This SDL example will add the word “Time” in front of the value of a string
field, set the field to be called Start
and adjust the length. This is also
useful for forcing a data type to string if the Time field is not supported
in the receiving database. The original column is called Start
and the name is
maitained but the resulting records will looke like Time 12:24:32
.
sdl: {
columns: [
{
name: Start
type: String
__eval__: '''"Time: " + str(data["Start"]["value"])'''
length: 18
}
]
}