# Union

Zuar Runner's Union plugin combines multiple database tables into a single
"union" database table.

.. image:: assets/union-1.png

The input tables all need to be from the same source
database. However, the output database table can be a completely
different database altogether.

See the [RegEx plugin](/jobs/regex/) if you are trying to
"union" multiple flat files together into a single database table.

## Example Union Use Case

As an example, we want to take two tables that look like this:

Table A

.. figure:: assets/union__table_a.png

Table B

.. figure:: assets/union__table_b.png

And union them together.

.. figure:: assets/union__union_table.png

Notice the addition of the `table_name` column to differentiate between the rows of each table after the union.


### What if the Tables Have Different Columns or Data Types?

Zuar Runner handles this by learning the input data as it is processed and creates the correct output union table.

Table A

.. figure:: assets/union__table_a2.png 

Notice **Table A** has 4 columns: `id`, `words`, `places`, and `last_update`.


Table B:

.. figure:: assets/union__table_b2.png

Notice **Table B** is missing the `places` column.


Union Table

.. figure:: assets/union__table_union2.png

 
Notice the addition of the `table_name` column to differentiate
between the rows of each table after the union. Also notice rows
associated with  **Table B** have **NULL** values for the `places`
column.

## Create a Zuar Runner Union Job

Zuar Runner Union jobs are created with the [Generic
plugin](/jobs/generic/). The job type is `io`
(input/output).

Example job configuration:
```json
    {
        "input": {
            "dbo": "postgresql://db/analytics",
            "use": "union.iov2#TableSetInput",
            "last_modified_column_name": "last_update",
            "source_column_name": "table_name",
            "table_names": {
                "example.table_a": "a",
                "example.table_b": "b"
            }
        },
        "output": {
            "dbo": "postgresql://db/analytics",
            "schema": "unioned",
            "tablename": "tables_unioned",
            "use": "call:mitto.iov2.db#todb"
        },
        "steps": [
            {
                "transforms": [
                    {
                        "use": "mitto.iov2.transform#ExtraColumnsTransform"
                    },
                    {
                        "use": "mitto.iov2.transform#ColumnsTransform"
                    }
                ],
                "use": "mitto.iov2.steps#Input"
            },
            {
                "use": "mitto.iov2.steps#CreateTable"
            },
            {
                "transforms": [
                    {
                        "use": "mitto.iov2.transform#FlattenTransform"
                    }
                ],
                "use": "mitto.iov2.steps#Output"
            },
            {
                "use": "mitto.iov2.steps#CollectMeta"
            }
        ]
    }
```

Union job config values:

- **Input → source_column_name** - This is the new column that will
    appear in the "union" table. In our example above, this is how the
    new column `table_name` is created.
- **Input → table_names** - This JSON object lists the tables to be
    unioned (`{schema}.{table}`) and the values created for the
    `source_column_name`'s rows. In our example above, this is how the
    new `table_name` column is populated with values from each table
    (e.g. `a` and `b`).

### Union with Upsert

Zuar Runner Union jobs can be configured to handle [upsert](/jobs/io/upsert/).

Example job configuration:

```json
    {
        "input": {
            "dbo": "postgresql://db/analytics",
            "use": "union.iov2#TableSetInput",
            "last_modified_column_name": "last_update",
            "source_column_name": "table_name",
            "table_names": {
                "example.table_a": "a",
                "example.table_b": "b"
            }
        },
        "output": {
            "dbo": "postgresql://db/analytics",
            "schema": "unioned",
            "tablename": "tables_unioned",
            "use": "call:mitto.iov2.db#todb"
        },
        "steps": [
            {
                "use": "union.iov2.steps#TimestampBySource",
                "source_column_name": "table_name",
                "last_modified_column_name": "last_update"
            },
            {
                "use": "mitto.iov2.steps.upsert#SetUpdatedAt"
            },
            {
                "transforms": [
                    {
                        "use": "mitto.iov2.transform#ExtraColumnsTransform"
                    },
                    {
                        "use": "mitto.iov2.transform#ColumnsTransform"
                    }
                ],
                "use": "mitto.iov2.steps#Input"
            },
            {
                "use": "mitto.iov2.steps#CreateTable"
            },
            {
                "use": "mitto.iov2.steps.upsert#CreateTempTable"
            },
            {
                "transforms": [
                    {
                        "use": "mitto.iov2.transform#FlattenTransform"
                    }
                ],
                "use": "mitto.iov2.steps#Output"
            },
            {
                "use": "mitto.iov2.steps.upsert#SyncTempTable",
                "key": [
                    "table_name",
                    "id"
                ]
            },
            {
                "use": "mitto.iov2.steps#CollectMeta"
            }
        ]
    }
```

Union job with upsert config values:

- **Input → source_column_name** - This is the new column that will
  appear in the "union" table. In our example above, this is how the
  new column `table_name` is created.
- **Input → table_names** - This JSON object lists the tables to be
  unioned (`{schema}.{table}`) and the values created for the
  `source_column_name`'s rows. In our example above, this is how the
  new `table_name` column is populated with values from each table
  (e.g. `a` and `b`).
- **Input → last_modified_column_name** - This is the timestamp column
  in the "source" tables that is used for upsert. In our example
  above, this is the `last_update` column.
- **Steps → source_column_name** - This is the same column from the
  input. This column is used to determine the last modified column
  value per section of the union. In our example above, this column
  is `table_name`.
- **Steps → last_modified_column_name** - This is the timestamp column
  used for upsert. In our example above, this column is
  `last_update`.
- **Steps → key** - This is a JSON array listing the column(s) that
  uniquely identify a row in the union table. In the example above,
  the primary key from the source tables is `id`. We also add in the
  `table_name` column value from the input. The combination of these
  two values is how we uniquely identify a row in the union table.