Upsert

What is Upsert?

Many Zuar Runner IO jobs can be configured for upsert and is derived from the term “Update” and “Insert”.

Upsert updates existing rows and inserts new rows when outputting to a database.

Requirements for Upsert

When configuring a Zuar Runner IO job to use upsert, there are two requirements that need to be met by the source (input):

  1. A way to uniquely identify a single row. Typically this is a primary key. It can be a single column or a combination of columns.

  2. A true last modified column. Typically this is a datetime column that is updated anytime a value in a row is changed.

Note

Just because a table has a date or datetime column does not mean it has a true last modified column.

Supported Inputs and Outputs

Supported Inputs

Most IO jobs support upsert, which includes most API connections. For example:

Supported Outputs

All of the database outputs support upsert.

What happens when an IO job WITHOUT upsert is run?

  • Zuar Runner sends a SQL query to the source database (input)

  • Zuar Runner truncates the destination database table (output) and loads all the data

What happens when an IO job WITH upsert is run?

  • Zuar Runner queries the destination database table (output) to find the maximum value of the “last modified column”

  • Zuar Runner sends a SQL query to the source database (input) with an added WHERE clause including the value of the maximum “last modified column”.

  • Zuar Runner creates a temporary table in the destination database

  • Zuar Runner merges the data in the temporary table with the destination table using the primary key(s)

Both example above assume the output table already exists. If it doesn’t Zuar Runner will create it.

Configuring a Query Job for Upsert

The JSON config of a Query job configured for upsert has a few additional values. If the Query job is created via the Add Job wizard, these values are auto created and populated. However, the resulting job can be edited, so it is helpful to understand the resulting job configuration.

Query job with upsert

Input

The input section has an additional key/value pair defining the last_modified_column.

"input": {
        "query": [
            "SELECT * from sf.account"
        ],
        "dbo": "postgresql://db/analytics",
        "use": "query.io#QueryInput",
        "last_modified_column": "last_modified_date"
    },

Steps:

The steps section has a few additional steps:

  • MaxTimestamp - This is the step where Zuar Runner queries the “last modified column” (column) in the output table.

  • SetUpdatedAt - This is the step where Zuar Runner adds the WHERE clause to the input query that is sent to the source database.

  • CreateTempTable - This is the step where Zuar Runner creates the temporary table in the destination database (output).

  • SyncTempTable - This is the step where the upsert actually occurs. The temporary table is merged with the output table by using the key(s). In the example below, the primary key is id. key is a list, and therefore the primary key can be a composite key of multiple columns.

"steps": [
        {
            "column": "last_modified_date",
            "use": "mitto.iov2.steps#MaxTimestamp"
        },
        {
            "use": "mitto.iov2.steps.upsert#SetUpdatedAt"
        },
        {
            "transforms": [
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ],

    "use": "mitto.iov2.steps#Input"
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps.upsert#CreateTempTable"
        },
        {
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ],
            "use": "mitto.iov2.steps#Output"
        },
        {
            "key": [
                "id"
            ],
            "use": "mitto.iov2.steps.upsert#SyncTempTable"
        },
        {
            "use": "mitto.iov2.steps#CountTable"
        }
]

Results of a Query Job with Upsert

After a Query job is run, depending on the job type, you can either view the Run info section on the job page to see how many rows are in the resulting table (count) and how many rows were updated (updates). You can also view the logs of the job run and look for the CollectMeta step:

Upsert Logs