# JSON

Zuar Runner's JSON plugin pipes data from any JSON file into a relational
database. The input JSON files can be either JSON or JSONL. 

.. image:: assets/json-1.png

## Example use case

Let's take a simple JSON file with nested data and convert the data
into tables in a relational database.

### Example JSON file

Here's what the source JSON file (`people_pets.json`) looks like:
```json
[
  {
	"id": 1,
	"name": "Justin",
	"pets": [
	  {
		"id": 1,
		"name": "Bear",
		"pet": {
		  "type": "dog",
		  "breed": "Goldendoodle"
		}
	  },
	  {
		"id": 2,
		"name": "Birdie",
		"pet": {
		  "type": "dog",
		  "breed": "Goldendoodle"
		}
	  }
	]
  },
  {
	"id": 2,
	"name": "Matt",
	"pets": []
  },
  {
	"id": 3,
	"name": "Ben",
	"pets": [
	  {
		"id": 3,
		"name": "Zuca",
		"pet": {
		  "type": "dog",
		  "breed": "Cavapoo"
		}
	  }
	]
  }
]
```
### Expected Relational Database Results

Because the JSON file has nested data in the `pets` array (`[ ]`), we
will need to create two tables in the database. 

- **Table 1** will include top level key/value pairs: `id`,
  `name`. The resulting table should have 3 rows.
- **Table 2** will include data in the `pets` array (`id`, `name`, and
  `pet` object's data). Table 2 will also include data from the
  parent object (`id`) so we can join the two database tables. The
  resulting table should have 3 rows.
    
**Table 1** expected result:

| id | name| 
|----|-----|
| 1 | Justin |
| 2 | Matt |
| 3 | Ben |

**Table 2** expected result:

| id | name | pet__type | pet__breed | person_id |
|----|------|-----------|------------|-----------|
| 1 | Bear | dog | Goldendoodle | 1 |
| 2 | Birdie| dog | Goldendoodle | 1 |
| 3 | Zuca | dog | Cavapoo | 3 |

## Create a Zuar Runner JSON Job

The JSON file(s) must exist in Zuar Runner's file system. Use the [file
manager](/user-interface/files/) to manually add a JSON
file to Zuar Runner.

The JSON input requires creating a job manually, so create a job using
the [Generic](/jobs/generic/) plugin. Set the job's `type`
to `io`.

## Example Zuar Runner Job Configurations

**Table 1** Zuar Runner job configuration:
```json
{
    "input": {
        "use": "flatfile.iov2#JsonInput",
        "source": "/var/mitto/data/people_pets.json"
    },
    "output": {
        "tablename": "people",
        "use": "call:mitto.iov2.db#todb",
        "schema": "json",
        "dbo": "postgresql://db/analytics"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}
```

This example job take the source JSON file (`people_pets.json`) and
creates a database table (`json.people`) in Zuar Runner's internal
PostgreSQL database (`localhost`). It creates columns from all of the
"top level" keys excluding the arrays (`[ ]`).

**Table 2** Zuar Runner job configuration:
```json
{
    "input": {
        "source": "/var/mitto/data/people_pets.json",
        "use": "flatfile.iov2#JsonInput"
    },
    "output": {
        "dbo": "postgresql://db/analytics",
        "schema": "json",
        "tablename": "people__pets",
        "use": "call:mitto.iov2.db#todb"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#PluckV2Transform",
                    "jpath": "$.pets[*]",
                    "members": [
                        {
                            "name": "person_id",
                            "value": "$.id"
                        }
                    ]
                },
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}
```

This example job take the source JSON file (`people_pets.json`) and
creates a database table (`json.people__pets`) in Zuar Runner's internal
PostgreSQL database (`localhost`). 

Note that this job configuration includes an additional `transform`, the `PluckV2Transform`. 
This transform takes content from the JSON array and extracts that content
into a separate database table. Based on the `jpath` and `members`
parameters in the `steps`, it creates columns from all the keys inside
the `pets` array and the "top level" `id` key.

### Jpath

The `jpath` parameter let's you define exactly what section of the
source JSON file to pull from. This is necessary for **Table 2**
because of the nested data in the `pets` array. Learn more about
**JSONPath** expressions below.

### Members

The `members` parameter let's you add additional columns from the
source JSON file to the resulting database table. In our case, we need
to add the `id` of each person to the `people__pets` table in order
for us to join the two tables together. `members` also uses jpath in
it's `value` key, and you can name the resulting column by adjusting
the `name` key.

## JSONPath Expressions

To understand how to use JSONPath to pick specific sections of data
out of a JSON object you can visit:
[https://goessner.net/articles/JsonPath/](https://goessner.net/articles/JsonPath/)

To interactively learn how to use JSONPath syntax you can visit:
[https://jsonpath.com/](https://jsonpath.com/)


### RegEx

Like the other flat file types, Runner jobs can use RegEx for the source file
selection. An example of a JSON job `input` section using RegEx is below:

```json
{
  "input": {
    "base": "flatfile.iov2#JsonInput",
    "use": "flatfile.iov2#RegexInput",
    "regex": "/var/mitto/data/people_pets_.*.json$"
  }
```

This example JSON job will read data from an arbitrary collection of JSON files
named in a way that matches the regular expression pattern defined. See [RegEx](/jobs/regex/)
for more information.


### JSONL

Runner also supports JSONL files; these files are commonly used in applications that
deliver `log` type data, or applications that produce a large sets of records.

In JSONL files, each line is a JSON record itself, followed by a specific non-printing
character used to delimit the lines.

The `people_pets.json` sample data can be represented as JSONL with some changes; it would have this
content:

```json
{"id":1,"name":"Justin","pets":[{"id":1,"name":"Bear","pet":{"type":"dog","breed":"Goldendoodle"}},{"id":2,"name":"Birdie","pet":{"type":"dog","breed":"Goldendoodle"}}]}
{"id":2,"name":"Matt","pets":[]}
{"id":3,"name":"Ben","pets":[{"id":3,"name":"Zuca","pet":{"type":"dog","breed":"Cavapoo"}}]}
```

If we need to process JSONL files, all of the previous discussion is applicable, with one modification to `input` section of the job, changing the `use` directive to `JsonlInput`:

```json
{
    "input": {
        "source": "/var/mitto/data/people_pets.json",
        "use": "flatfile.iov2#JsonlInput"
    },
```