Store Input¶
Once raw source data is stored in a Zuar Runner store, the Zuar Runner store itself can become a source (input) in an IO job.
The primary use case for this is to create several related relational database tables from a single nested data source (API data, JSON, etc).
Example Source Data¶
Let’s use this accounting JSON data as an example:
[
{
"amount": 500.00,
"customer": {
"id": 1,
"name": "Customer 1"
},
"id": 1,
"invoice_lines": [
{
"amount": 200.00,
"id": 1,
"product": "A"
},
{
"amount": 300.00,
"id": 2,
"product": "B"
}
],
"invoice_number": 1001
},
{
"amount": 240.00,
"customer": {
"id": 1,
"name": "Customer 2"
},
"id": 2,
"invoice_lines": [
{
"amount": 200.00,
"id": 1,
"product": "A"
},
{
"amount": 40.00,
"id": 2,
"product": "C"
}
],
"invoice_number": 1002
}
]
This dataset has two arrays [ ]
:
The data set as a whole is an array
The
invoice_lines
data contains an array
So, Zuar Runner needs to split this data into two relational database tables.
id |
invoice_number |
amount |
cusomer_id |
customer_name |
---|---|---|---|---|
1 |
1001 |
500.00 |
1 |
Customer 1 |
2 |
1002 |
240.00 |
2 |
Customer 2 |
Table 1, can be created using the Zuar Runner JSON plugin.
Job #1 name: json_invoices
invoice_id |
id |
line |
product |
amount |
---|---|---|---|---|
1001 |
1 |
1 |
A |
200.00 |
1001 |
2 |
2 |
B |
300.00 |
1002 |
3 |
1 |
A |
200.00 |
1002 |
4 |
2 |
C |
40.00 |
If Zuar Runner job #1 uses a Zuar Runner store, then Table 2, can be created using a Zuar Runner Store job.
Zuar Runner Store Job Config¶
Zuar Runner store jobs can be created using the Generic plugin.
Here’s an example job config that pipes data from a Zuar Runner store.
{
"input": {
"name": "json_invoices",
"use": "mitto.iov2.input#StoreInput",
"jpath": "$.invoice_lines[*]",
"members": [
{
"name": "invoice_id",
"value": "$.id"
}
]
},
"output": {
"tablename": "invoice_lines",
"use": "call:mitto.iov2.db#todb",
"schema": "json",
"dbo": "postgresql://db/analytics"
},
"steps": [
{
"use": "mitto.iov2.steps#Input",
"transforms": [
{
"use": "mitto.iov2.transform#ExtraColumnsTransform"
},
{
"use": "mitto.iov2.transform#ColumnsTransform"
}
]
},
{
"use": "mitto.iov2.steps#CreateTable"
},
{
"use": "mitto.iov2.steps#Output",
"transforms": [
{
"use": "mitto.iov2.transform#FlattenTransform"
}
]
},
{
"use": "mitto.iov2.steps#CollectMeta"
}
]
}
The input
section is the important part of this job config.
Name¶
The name
parameter is the name of the “parent” job that is configured to use a store.
Jpath¶
The jpath
parameter let’s you define exactly what section of the
store’s JSON data to pull from. This is necessary for Table 2
because of the nested data in the invoice_lines
array. Learn more
about JSONPath (jpath) expressions below.
Query a record from the “parent” job’s Zuar Runner store
API to
determine the correct jpath
.
Members¶
The members
parameter let’s you add additional columns from any
section of the store’s JSON data to the resulting database table. In
our case, we need to add the id
of each invoice to the
invoice_lines
table in order for us to join the two tables
together. members
also uses jpath in it’s value
key, and you can
name the resulting column by adjusting the name
key.
JSONPath Expressions¶
To understand how to use JSONPath to pick specific sections of data out of a JSON object you can visit: https://goessner.net/articles/JsonPath/
To interactively create a JSON path, you can paste JSON into this GUI and select fields until you obtain your desired level of detail: https://jsonpathfinder.com/. This tool will help build you path.
To interactively learn or test how to use JSONPath syntax you can visit: https://jsonpath.com/
Best Practices¶
Always run the store job after the “parent” job because the store job requires the store from the “parent” job.