SDL

SDL stands for “storage definition language” and is used synonymously with “DDL.” SDL plays a large part in how Zuar Runner IO jobs process data and how Zuar Runner learns and defines the structure of that data.

Zuar Runner IO jobs take date from various sources (APIs, databases, and files). Fields from input sources don’t always have data types, however data types need to be defined when outputting data to relational databases so destination table structures can be created or updated. A valid data type is required when sending data to relational databases: SDL gives us a mechanism for specifying appropriate data types so the data can be properly stored in the database.

SDL is mostly handled by Zuar Runner internally through specific IO job steps and transforms (e.g. ExtraColumnsTransform, ColumnsTransform, and ReflectTransform), however there are a few use cases where specifying SDL is useful:

  • Forcing specific data types, column names, and lengths when outputting to a relational database

  • Adding new columns to the data piped from an input

  • Changing a column name in the output table

  • Removing a column from the output

As data from an input is piped through Zuar Runner, Zuar Runner is learning the data and assigning data types to it. Learn more about how Zuar Runner samples data.

In some situations, you may want to force a specific data type, possibly to adjust a data type format to a specific destination database. We can do this by using SDL statements in the Zuar Runner Job Configuration. In the Zuar Runner Job Configuration, sdl statments are added after the input and output directives. Since SDL in the Zuar Runner job applies generically to the data output, SDL would be applicable to any Zuar Runner job that stores data to a database.

{
  input: {...}
  output: {...}
    sdl: {
      columns: [
        {
          name: bounce_rate
          type: Float
        }
        {
          name: percent_new_sessions
          type: Float
        }
      ]
    }
    steps:  {...}

How to Force Data Types with SDL

This example will have Zuar Runner force the data type of the id column to a String. String SDL statements can accept an optional Length argument, setting the maximum string length of the destination column. NOTE: this Length argument doesn’t truncate the column value for the record, it simply forces the column length in the table.

sdl: {
  columns: [
    {
      name: id
      type: String
      length: 15
    }
  ]
}

Similar to dealing with string length, for Numeric columns we can force the data type of column value_test to a decimal value with explicit precision and scale.

sdl: {
  columns: [
    {
      name: value_test
      type: Numeric
      precision: 20
      scale: 7
    }
  ]
}

Likewise we can get the _value_ field and rename it to location like this.

sdl: {
    columns: [
      {
        __field__: _value
        __jpath__: $._value
        length: 20
        name: location
        type: String
      }
    ]
  }

Using Python __eval__ in SDL

We are able to embed very simplistic python statements into Zuar Runner job configurations by usin an __eval__ parameter. Single line Python expressions can be passed via the __eval__ key. See Python’s eval() documentation: https://docs.python.org/3/library/functions.html#eval

Typically, IO jobs should extract data from an input and load it as-is into the output with further data manipulations done with subsequent Zuar Runner SQL jobs.

However, there are cases where you may wish to do some transformation on the input data as it’s being piped to the output, or to add data columns that don’t exist in the input data. This (and other use cases) can be accomplished by using an __eval__ statement in SDL. These python expressions act on the “per record” data in from the Zuar Runner input as it’s being processed to the output data format.

Note

The records will be referred to by data[ ] for the purpose of consistency here; in practice, the name of the record will be dictated by the Zuar Runner input used.

Create UTC Date Time Column

This SDL example creates a new column in the output database table with the current UTC date and time:

sdl: {
  columns: [
    {
      name: mitto_update_at
      type: DateTime
      __eval__: column["__type__"].python_type.utcnow()
    }
  ]
}

Create a Column with Static Value

This SDL example creates a new string column in the output with the value “Hello World”:

sdl: {
  columns: [
    {
      name: new_string_column
      type: String
      __eval__: str("Hello World")
    }
  ]
}

Store Only the First N Characters for a Column

This SDL example will store only the first 63 characters of the input data and truncate the remaining column data:

sdl: {
  columns: [
    {
      name: mitto_update_at
      type: String
      __eval__: data["description"][:63]
    }
  ]
}

Store the Merged Values of Two Columns into a New Column

This SDL example will combine the values from two columns into a new column with the name example_full_name.

sdl: {
  columns: [
    {
      name: example_full_name
      type: String
      __eval__: str(data["first_name"], " ", data["last_name"])
    }
  ]
}

Add Phrase or String to Each Record in a Column

This SDL example will add the word “Time” in front of the value of a string field, set the field to be called Start and adjust the length. This is also useful for forcing a data type to string if the Time field is not supported in the receiving database. The original column is called Start and the name is maitained but the resulting records will looke like Time 12:24:32.

sdl: {
  columns: [
    {
      name: Start
      type: String
      __eval__: '''"Time: " + str(data["Start"]["value"])'''
      length: 18
    }
  ]
}