# Injecting Python Code

Certain Zuar Runner inputters, transforms, and steps allow Python code to be injected into a running job:

* **inputter** - `ExampleInput`
* **transform** - `PythonTransform`
* **step** - `PythonStep`

TODO: add links to relevant docs

# General

The value of `python_code` can be any of the following:

1. A string containing the name of a file in `/var/mitto/data` that contains valid Python code.
2. A string containing the fully qualified path to a file that contains valid Python code.
3. A list of one or more strings, each string being one line of valid Python. The individual strings are joined into a single string that is passed to the Python `exec` function.

Depending on where `python_code` is used, additional constraints may be placed on the code.

## Formatting the List of Strings

When `python_code` is a list of strings, a non-standard formatting convention is used because HJSON handles indentation inconsistently. This is best explained by example:

```
{
    use: mitto.iov2.steps.builtin#PythonStep
    python_code: [
        # Executed in the context of an instance of the PythonStep class
        # Because this uses the store as input, the job must be configured
        # with a store.
        def _dynamic_step(self):
        .   logging.info("start")
        .   from mitto.iov2.input import StoreInput
        .   from mitto.io.db.utils import (DEFAULT_ENCODE_ERRORS, to_copyfrom_line)
        .   from mitto.io.db.redshift import StreamIter
        .   streamer = StreamIter(
        .       to_copyfrom_line(record, DEFAULT_ENCODE_ERRORS).encode("utf-8")
        .       for record in self.environ[STORE].list()
        .   )
        .   data = streamer.read()
        .   logging.info("stop")

        # Function must be assigned to `step`
        self.step = _dynamic_step
    ]
}
```

Things to note:

* The first non-space character on the line is considered to be "column 1".
* If the first non-space character is a `.`, it is converted to a space.
* Python comments can be used.
* The variables available for use depend on the context of execution.

# Execution Context and Other Requirements

## `PythonStep`

When using the `PythonStep` step, `python_code` must define a function that is valid as a method of the `PythonStep` class. The function must:

* Accept a single argument: `self`
* Expect to be called once during the execution of the job
* Not return a value
* Be assigned to the `step` attribute of the class instance

## `PythonTransform`

When using the `PythonTransform` transform, `python_code` must define a function that is valid as a method of the `PythonTransform` class. The function must:

* Accept two arguments: `self` and `record`
* Expect to be called once for each row of data
* Return `record` or a modified version of `record`
* Be assigned to the `transform_` attribute of the class instance

# Tips and Tricks

1. If you are running the job manually from the CLI via `job_io.py config.json`, you can invoke the Python debugger, e.g.:

   ```
   {
       use: mitto.iov2.steps.builtin#PythonStep
       python_code: [
           import pdb; pdb.set_trace()
       ]
   }
   ```

   Note: this is not possible when the job is run from the UI, the scheduler, a sequence, or via `mitto run`.

2. You can easily add logging statements. To log every row at a certain point in a set of transforms:

   ```
   {
       use: mitto.iov2.transform.builtin#PythonTransform
       python_code: [
           def transform_(self, record):
           .   logging.info("record=%s", record)
           .   return record
           self.transform_ = transform_
       ]
   }
   ```

   To log the job execution environment at a certain point in the steps:

   ```
   {
       use: mitto.iov2.steps.builtin#PythonStep
       python_code: [
           logging.info("environ=%s", self.environ)
       ]
   }
   ```
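
To make the list-of-strings convention from "Formatting the List of Strings" more concrete, the following is a minimal Python sketch of how such a list might be turned into source code and executed. It is not Zuar Runner's implementation: `join_python_code` and `FakeStep` are hypothetical names invented for this illustration, and the exact names exposed to the injected code (here only `self` and `logging`) are assumptions.

```python
import logging


def join_python_code(lines):
    """Join a list of strings into a single source string.

    Assumed behaviour, based on the notes in this document: leading
    whitespace is ignored so the first non-space character is treated
    as column 1, and a leading "." is converted to a space.
    """
    result = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("."):
            stripped = " " + stripped[1:]
        result.append(stripped)
    return "\n".join(result)


class FakeStep:
    """Hypothetical stand-in for PythonStep (not the real class)."""

    def __init__(self, python_code):
        self.environ = {}
        source = join_python_code(python_code)
        # The joined source is passed to exec; `self` and `logging`
        # are made available to it (an assumption for this sketch).
        exec(source, {"logging": logging, "self": self})


python_code = [
    "def _dynamic_step(self):",
    ".   logging.info('start')",
    ".   self.environ['ran'] = True",
    "self.step = _dynamic_step",
]

step = FakeStep(python_code)
# A plain function stored on an instance is not bound automatically,
# so this sketch passes the instance explicitly. How Zuar Runner
# invokes the function is not specified in this document.
step.step(step)
```

After this runs, `step.environ` contains the `ran` key set by the injected function, which shows that the `.`-prefixed lines were interpreted as the indented body of `_dynamic_step`.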