JSON¶
Zuar Runner’s JSON plugin pipes data from any JSON file into a relational database. The input JSON files can be either JSON or JSONL.
Example use case¶
Let’s take a simple JSON file with nested data and convert the data into tables in a relational database.
Example JSON file¶
Here’s what the source JSON file (people_pets.json
) looks like:
[
{
"id": 1,
"name": "Justin",
"pets": [
{
"id": 1,
"name": "Bear",
"pet": {
"type": "dog",
"breed": "Goldendoodle"
}
},
{
"id": 2,
"name": "Birdie",
"pet": {
"type": "dog",
"breed": "Goldendoodle"
}
}
]
},
{
"id": 2,
"name": "Matt",
"pets": []
},
{
"id": 3,
"name": "Ben",
"pets": [
{
"id": 3,
"name": "Zuca",
"pet": {
"type": "dog",
"breed": "Cavapoo"
}
}
]
}
]
Expected Relational Database Results¶
Because the JSON file has nested data in the pets
array ([ ]
), we
will need to create two tables in the database.
Table 1 will include top level key/value pairs:
id
,name
. The resulting table should have 3 rows.Table 2 will include data in the
pets
array (id
,name
, andpet
object’s data). Table 2 will also include data from the parent object (id
) so we can join the two database tables. The resulting table should have 3 rows.
Table 1 expected result:
id |
name |
---|---|
1 |
Justin |
2 |
Matt |
3 |
Ben |
Table 2 expected result:
id |
name |
pet__type |
pet__breed |
person_id |
---|---|---|---|---|
1 |
Bear |
dog |
Goldendoodle |
1 |
2 |
Birdie |
dog |
Goldendoodle |
1 |
3 |
Zuca |
dog |
Cavapoo |
3 |
Create a Zuar Runner JSON Job¶
The JSON file(s) must exist in Zuar Runner’s file system. Use the file manager to manually add a JSON file to Zuar Runner.
The JSON input requires creating a job manually, so create a job using
the Generic plugin. Set the job’s type
to io
.
Example Zuar Runner Job Configurations¶
Table 1 Zuar Runner job configuration:
{
"input": {
"use": "flatfile.iov2#JsonInput",
"source": "/var/mitto/data/people_pets.json"
},
"output": {
"tablename": "people",
"use": "call:mitto.iov2.db#todb",
"schema": "json",
"dbo": "postgresql://db/analytics"
},
"steps": [
{
"use": "mitto.iov2.steps#Input",
"transforms": [
{
"use": "mitto.iov2.transform#ExtraColumnsTransform"
},
{
"use": "mitto.iov2.transform#ColumnsTransform"
}
]
},
{
"use": "mitto.iov2.steps#CreateTable"
},
{
"use": "mitto.iov2.steps#Output",
"transforms": [
{
"use": "mitto.iov2.transform#FlattenTransform"
}
]
},
{
"use": "mitto.iov2.steps#CollectMeta"
}
]
}
This example job take the source JSON file (people_pets.json
) and
creates a database table (json.people
) in Zuar Runner’s internal
PostgreSQL database (localhost
). It creates columns from all of the
“top level” keys excluding the arrays ([ ]
).
Table 2 Zuar Runner job configuration:
{
"input": {
"source": "/var/mitto/data/people_pets.json",
"use": "flatfile.iov2#JsonInput"
},
"output": {
"dbo": "postgresql://db/analytics",
"schema": "json",
"tablename": "people__pets",
"use": "call:mitto.iov2.db#todb"
},
"steps": [
{
"use": "mitto.iov2.steps#Input",
"transforms": [
{
"use": "mitto.iov2.transform#PluckV2Transform",
"jpath": "$.pets[*]",
"members": [
{
"name": "person_id",
"value": "$.id"
}
]
},
{
"use": "mitto.iov2.transform#ExtraColumnsTransform"
},
{
"use": "mitto.iov2.transform#ColumnsTransform"
}
]
},
{
"use": "mitto.iov2.steps#CreateTable"
},
{
"use": "mitto.iov2.steps#Output",
"transforms": [
{
"use": "mitto.iov2.transform#FlattenTransform"
}
]
},
{
"use": "mitto.iov2.steps#CollectMeta"
}
]
}
This example job take the source JSON file (people_pets.json
) and
creates a database table (json.people__pets
) in Zuar Runner’s internal
PostgreSQL database (localhost
).
Note that this job configuration includes an additional transform
, the PluckV2Transform
.
This transform takes content from the JSON array and extracts that content
into a separate database table. Based on the jpath
and members
parameters in the steps
, it creates columns from all the keys inside
the pets
array and the “top level” id
key.
Jpath¶
The jpath
parameter let’s you define exactly what section of the
source JSON file to pull from. This is necessary for Table 2
because of the nested data in the pets
array. Learn more about
JSONPath expressions below.
Members¶
The members
parameter let’s you add additional columns from the
source JSON file to the resulting database table. In our case, we need
to add the id
of each person to the people__pets
table in order
for us to join the two tables together. members
also uses jpath in
it’s value
key, and you can name the resulting column by adjusting
the name
key.
JSONPath Expressions¶
To understand how to use JSONPath to pick specific sections of data out of a JSON object you can visit: https://goessner.net/articles/JsonPath/
To interactively learn how to use JSONPath syntax you can visit: https://jsonpath.com/
RegEx¶
Like the other flat file types, Runner jobs can use RegEx for the source file
selection. An example of a JSON job input
section using RegEx is below:
{
"input": {
"base": "flatfile.iov2#JsonInput",
"use": "flatfile.iov2#RegexInput",
"regex": "/var/mitto/data/people_pets_.*.json$"
}
This example JSON job will read data from an arbitrary collection of JSON files named in a way that matches the regular expression pattern defined. See RegEx for more information.
JSONL¶
Runner also supports JSONL files; these files are commonly used in applications that
deliver log
type data, or applications that produce a large sets of records.
In JSONL files, each line is a JSON record itself, followed by a specific non-printing character used to delimit the lines.
The people_pets.json
sample data can be represented as JSONL with some changes; it would have this
content:
{"id":1,"name":"Justin","pets":[{"id":1,"name":"Bear","pet":{"type":"dog","breed":"Goldendoodle"}},{"id":2,"name":"Birdie","pet":{"type":"dog","breed":"Goldendoodle"}}]}
{"id":2,"name":"Matt","pets":[]}
{"id":3,"name":"Ben","pets":[{"id":3,"name":"Zuca","pet":{"type":"dog","breed":"Cavapoo"}}]}
If we need to process JSONL files, all of the previous discussion is applicable, with one modification to input
section of the job, changing the use
directive to JsonlInput
:
{
"input": {
"source": "/var/mitto/data/people_pets.json",
"use": "flatfile.iov2#JsonlInput"
},