Class - mitto.iov2.types.OutputSection

JSON Schema

OutputSection

Top-level job configuration section that defines where data will be output.

Example job configuration fragment:

{
   input: {...}
   output: {
       dbo: postgresql://db/analytics
       schema: test_schema
       tablename: test_table
       use: call:mitto.iov2.db#todb
   }
  steps: [...]
}

Example MS SQLServer job configuration fragment:

output: {
    dbo: mssql+pyodbc://user:pass!@warehouse\mssqlserver08/somedb?driver=ODBC+Driver+17+for+SQL+Server&UseFMTONLY=Yes
    schema: test_schema
    tablename: test_table
    use: call:mitto.iov2.db#todb
    query_params: {
        fast_executemany: false
    }
    engine_kwargs: {
        fast_executemany: false
        pool_recycle: 12000
    }
}

Example Redshift job configuration fragment:

output: {
  dbo: redshift+psycopg2://username:password@zuar-redshift.c8xy937zxyyx.us-east-1.redshift.amazonaws.com:5439/dev
    credentials: {
        bucket: east1.redshift.zuar.com
        s3_access_key: AKIAYL4WNSO6DLPL7L6E
        s3_secret_key: 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
    }
  schema: test_schema
  tablename: test_table
  use: call:mitto.iov2.db#todb
}

type

object

properties

  • use

Use

The Python function that will be used to output data.

type

string

examples

call:mitto.iov2.db#todb

call:mitto.iov2#null

call:mitto.iov2#tocsv

call:mitto.iov2.tojson#tojson

  • dbo

Dbo

Connection string of the output database. The string contain actual credentials or named credentials. The first example shows a string containing credentials. The second example shows the use credentials named: steves-creds. See this page for more information.

examples

postgresql://steve:steves-password@zuar.net/analytics

postgresql://steves-creds@zuar.net/analytics

redshift+psycopg2://username:password@zuar-redshift.c8xy937zxyyx.us-east-1.redshift.amazonaws.com:5439/dev

  • schema

Schema

The name of the schema in the output database. Placed in the job’s execution environment as SCHEMA.

type

string

  • tablename

Tablename

Name of the table in the output database. Placed in the job’s execution environment as both TABLE and and ORIGIN_TABLE.

type

string

  • credentials

Credentials

Credentials for use with dbo.

Not required if the dbo already includes credentials. Credentials can be provided in multiple forms.

If the value is a string, it must be the name of previously created ‘named credentials’. At runtime, the value will be replaced with the actual credential values.

If the value is a dict, it must be in one of the following supported credential types:

BasicCredentials:

a dict containing a username/password.

GenericCredentials:

a dict appropriate for the dbo in use.

GoogleServiceAccountCredentials:

a dict appropriate for BigQuery.

RedshiftCredentials:

a dict appropriate for Redshift.

When the value is one of the above credential types, the dbo and the value is passed to sqlalchemy.engine.url.make_url which embeds the credentials in the url, thereby using them when creating the requested database engine.

If the value is None and the dbo is BigQuery, credentials will be taken from mitto_data/credentials/gcloud file.

examples

steves-creds

{“username”: “steve”, “password”: “steves-pass”}

{“bucket”: “east1.redshift.zuar.com”, “s3_access_key”: “AKIAYL4SRFE6DLPL7L6E”, “s3_secret_key”: “eB/0bfoAoYBfMIugYVttIBUHOMUtvmBWuF+qyQer”}

anyOf

type

string

BasicCredentials

GenericCredentials

GoogleServiceAccountCredentials

RedshiftCredentials

  • engine_kwargs

Engine Kwargs

Parameters passed to create_engine.

allOf

EngineKwArgs

  • location

Location

Deprecated. Use “engine_kwargs”: {“location”: “some-location”} instead.

type

string

  • query_params

Query Params

MS SQLServer only.

allOf

SQLServerQueryParameters

  • truncate

Truncate

If ‘true’, the output table is truncated before writing data.

type

boolean

examples

true

false

default

True

definitions

  • BasicCredentials

BasicCredentials

Basic credentials: username/password.

type

object

properties

  • username

Username

The username.

type

string

minLength

1

  • password

Password

The password.

type

string

minLength

1

format

password

  • GenericCredentials

GenericCredentials

Generic credentials - any arbitrary dict.

Used for arbitrary credentials for which no specific credential type is available.

type

object

properties

  • value

Value

Arbitrary credentials formatted as a JSON string

type

string

examples

{“client_id”: “some-id”, “client_secret”: “some-secret”}

default

{

}

format

json

  • GoogleServiceAccountCredentials

GoogleServiceAccountCredentials

Credentials for Google BigQuery.

See this document for information on creating the credentials.

type

object

properties

  • type

Type

type

string

default

service_account

  • project_id

Project Id

type

string

default

mitto-183418

  • private_key_id

Private Key Id

type

string

  • private_key

Private Key

type

string

  • client_email

Client Email

type

string

  • client_id

Client Id

type

string

  • auth_uri

Auth Uri

type

string

  • token_uri

Token Uri

type

string

default

https://oauth2.googleapis.com/token

  • auth_provider_x509_cert_url

Auth Provider X509 Cert Url

type

string

default

https://www.googleapis.com/oauth2/v1/certs

  • client_x509_cert_url

Client X509 Cert Url

type

string

  • RedshiftCredentials

RedshiftCredentials

Credentials for AWS Redshift.

type

object

properties

  • s3_access_key

S3 Access Key

type

string

  • s3_secret_key

S3 Secret Key

type

string

  • bucket

Bucket

type

string

  • SQLServerQueryParameters

SQLServerQueryParameters

Cursor parameters for MS SQLServer.

NOTE: This is only used with MS SQLServer.

This value is passed to cursor.fast_executemany. This value should match the value of engine_kwargs.fast_executemany present in the same output section.

type

object

properties

  • fast_executemany

Fast Executemany

Used to set fast_executemany on the cursor before execution.

type

boolean

  • EngineKwArgs

EngineKwArgs

Runner uses SQLAlchemy to interact with databases. Runner calls sqlalchemy.create_engine to establish a connection with the specified database. create_engine accepts a very large number of parameters that can be used to modify database engine behavior. You can provide parameters to create_engine via the engine_kwargs parameter of the job’s output step.

Example usage:

output: {
    use: call:mitto.iov2.db#todb
    dbo: postgresql://mitto:1234@db/analytics
    schema: example_schema
    tablename: example_table
    engine_kwargs: {
        echo: true
        echo_pool: debug
        pool_pre_ping: false
        execution_options: {
            max_row_buffer: 5000
        }
    }
}
output: {
    use: call:mitto.iov2.db#todb
    dbo: mssql+pyodbc://user:pass!@warehouse\mssqlserver08/somedb?driver=ODBC+Driver+17+for+SQL+Server&UseFMTONLY=Yes
    schema: example_schema
    tablename: example_table
    engine_kwargs: {
        fast_executemany: false
        query_params: {
            fast_executemany: false
        }
    }
}

Refer to the sqlalchemy.create_engine documentation for a list of all possible parameters. Common create_engine parameters are listed here as a convenience.

Note

engine_kwargs accepts any valid create_engine parameter, even if it is not shown here. Be aware that some parameters require values which can’t be provided in a Runner job configuration. For example, the value of creator must be a Python callable, which can’t be specified in a job’s configuration.

Warning

This is an advanced job configuration parameter intended for use only by those with a strong knowledge of both SQLAlchemy and Runner internals.

type

object

properties

  • echo

Echo

type

boolean

  • echo_pool

Echo Pool

type

boolean

  • execution_options

Execution Options

Dictionary of execution options that will be applied to all connections. These will be provided as parameters when Connection.execution_options() is called. Refer to the documentation for a list of supported parameters.

type

object

examples

{“max_row_buffer”: 5000}

  • query_params

Query Params

Additional parameters used only with MS SQLServer.

examples

{“fast_excutemany”: true}

allOf

SQLServerQueryParameters

  • hide_parameters

Hide Parameters

type

boolean

  • isolation_level

Isolation Level

type

string

  • label_length

Label Length

type

integer

  • logging_name

Logging Name

type

string

  • max_identifier_length

Max Identifier Length

type

integer

  • max_overflow

Max Overflow

type

integer

  • param_style

Param Style

type

string

  • pool_logging_name

Pool Logging Name

type

string

  • pool_pre_ping

Pool Pre Ping

NOTE: In some cases, this may be overridden by Runner.

type

boolean

  • pool_size

Pool Size

type

integer

  • pool_recycle

Pool Recycle

type

integer

  • pool_timeout

Pool Timeout

type

integer

  • pool_use_lifo

Pool Use Lifo

type

boolean

  • query_cache_size

Query Cache Size

type

integer

JSON Schema Definitions

#/definitions/BasicCredentials

BasicCredentials

Basic credentials: username/password.

type

object

properties

  • username

Username

The username.

type

string

minLength

1

  • password

Password

The password.

type

string

minLength

1

format

password

#/definitions/GenericCredentials

GenericCredentials

Generic credentials - any arbitrary dict.

Used for arbitrary credentials for which no specific credential type is available.

type

object

properties

  • value

Value

Arbitrary credentials formatted as a JSON string

type

string

examples

{“client_id”: “some-id”, “client_secret”: “some-secret”}

default

{

}

format

json

#/definitions/GoogleServiceAccountCredentials

GoogleServiceAccountCredentials

Credentials for Google BigQuery.

See this document for information on creating the credentials.

type

object

properties

  • type

Type

type

string

default

service_account

  • project_id

Project Id

type

string

default

mitto-183418

  • private_key_id

Private Key Id

type

string

  • private_key

Private Key

type

string

  • client_email

Client Email

type

string

  • client_id

Client Id

type

string

  • auth_uri

Auth Uri

type

string

  • token_uri

Token Uri

type

string

default

https://oauth2.googleapis.com/token

  • auth_provider_x509_cert_url

Auth Provider X509 Cert Url

type

string

default

https://www.googleapis.com/oauth2/v1/certs

  • client_x509_cert_url

Client X509 Cert Url

type

string

#/definitions/RedshiftCredentials

RedshiftCredentials

Credentials for AWS Redshift.

type

object

properties

  • s3_access_key

S3 Access Key

type

string

  • s3_secret_key

S3 Secret Key

type

string

  • bucket

Bucket

type

string

#/definitions/SQLServerQueryParameters

SQLServerQueryParameters

Cursor parameters for MS SQLServer.

NOTE: This is only used with MS SQLServer.

This value is passed to cursor.fast_executemany. This value should match the value of engine_kwargs.fast_executemany present in the same output section.

type

object

properties

  • fast_executemany

Fast Executemany

Used to set fast_executemany on the cursor before execution.

type

boolean

#/definitions/EngineKwArgs

EngineKwArgs

Runner uses SQLAlchemy to interact with databases. Runner calls sqlalchemy.create_engine to establish a connection with the specified database. create_engine accepts a very large number of parameters that can be used to modify database engine behavior. You can provide parameters to create_engine via the engine_kwargs parameter of the job’s output step.

Example usage:

output: {
    use: call:mitto.iov2.db#todb
    dbo: postgresql://mitto:1234@db/analytics
    schema: example_schema
    tablename: example_table
    engine_kwargs: {
        echo: true
        echo_pool: debug
        pool_pre_ping: false
        execution_options: {
            max_row_buffer: 5000
        }
    }
}
output: {
    use: call:mitto.iov2.db#todb
    dbo: mssql+pyodbc://user:pass!@warehouse\mssqlserver08/somedb?driver=ODBC+Driver+17+for+SQL+Server&UseFMTONLY=Yes
    schema: example_schema
    tablename: example_table
    engine_kwargs: {
        fast_executemany: false
        query_params: {
            fast_executemany: false
        }
    }
}

Refer to the sqlalchemy.create_engine documentation for a list of all possible parameters. Common create_engine parameters are listed here as a convenience.

Note

engine_kwargs accepts any valid create_engine parameter, even if it is not shown here. Be aware that some parameters require values which can’t be provided in a Runner job configuration. For example, the value of creator must be a Python callable, which can’t be specified in a job’s configuration.

Warning

This is an advanced job configuration parameter intended for use only by those with a strong knowledge of both SQLAlchemy and Runner internals.

type

object

properties

  • echo

Echo

type

boolean

  • echo_pool

Echo Pool

type

boolean

  • execution_options

Execution Options

Dictionary of execution options that will be applied to all connections. These will be provided as parameters when Connection.execution_options() is called. Refer to the documentation for a list of supported parameters.

type

object

examples

{“max_row_buffer”: 5000}

  • query_params

Query Params

Additional parameters used only with MS SQLServer.

examples

{“fast_excutemany”: true}

allOf

SQLServerQueryParameters

  • hide_parameters

Hide Parameters

type

boolean

  • isolation_level

Isolation Level

type

string

  • label_length

Label Length

type

integer

  • logging_name

Logging Name

type

string

  • max_identifier_length

Max Identifier Length

type

integer

  • max_overflow

Max Overflow

type

integer

  • param_style

Param Style

type

string

  • pool_logging_name

Pool Logging Name

type

string

  • pool_pre_ping

Pool Pre Ping

NOTE: In some cases, this may be overridden by Runner.

type

boolean

  • pool_size

Pool Size

type

integer

  • pool_recycle

Pool Recycle

type

integer

  • pool_timeout

Pool Timeout

type

integer

  • pool_use_lifo

Pool Use Lifo

type

boolean

  • query_cache_size

Query Cache Size

type

integer