Inner Join¶
The general structure of the Inner Join core operation in Graphene is:
{
"_schema": "InnerJoin_1.0",
"join_record_type": "JoinType",
"join_on": ["OccurrenceKey"],
"op": "MULTIPLY"
}
Parameters¶
The parameters are defined as follows:
Parameter Name |
Required |
Type |
Description |
---|---|---|---|
|
Yes |
|
Record type that will be used to identify joining records. All records that don’t have join_record_type will be considered financial records that might be propagated to the final output depending on the join results. |
|
Yes |
|
Ledger attribute name(s) that will be used to join financial records and joining records. |
|
No |
|
Optional operator to be used with the join record values. Allowed values are |
The first element in join_on
must be Trial
, Time
, or the OccurrenceKey
attribute name.
The remaining elements can be zero or more metadata attributes.
Behavior¶
Inner Join, like all Graphene nodes, receives one merged input stream, and so uses join_record_type
to distinguish
between what might be called “left” and “right” in other system’s joins. Here, all records with join_record_type
are considered “joining records”, and all other records are “financial records”.
All financial records that have corresponding joining records are propagated to the output ledger. Joining records are used
to determine which financial records will be propagated, but are not themselves propagated to the output ledger.
Financial records that don’t have corresponding joining record(s) are not propagated to the output ledger.
If there are N financial record and M corresponding records with join_record_type
, then NxM financial records will be propagated.
Corresponding records are determined by matching values for the attributes specified in join_on
.
When op
is set to MULTIPLY
the financial records values are multiplied by the values of the corresponding joining records.
If op
is not defined, then the propagated financial records values are not modified, and values of join_record_type
records are ignored.
Example¶
Assuming we have two input ledger with Losses and factors:
Trial |
Time |
Type |
Value |
EventId |
LOB |
GID |
---|---|---|---|---|---|---|
1 |
1 |
Loss |
100 |
1 |
A |
1 |
1 |
2 |
Loss |
300 |
2 |
A |
2 |
1 |
2 |
Loss |
400 |
2 |
B |
3 |
Input ledger with join_record_type
records
Trial |
Time |
Type |
Value |
EventId |
LOB |
GID |
---|---|---|---|---|---|---|
1 |
1 |
Factor |
1.1 |
1 |
A |
1 |
1 |
2 |
Factor |
1.2 |
2 |
A |
2 |
1 |
2 |
Factor |
1.3 |
2 |
A |
3 |
If we use the InnerJoin template with the following attributes:
{
"_schema": "InnerJoin_1.0",
"join_record_type": "Factor",
"join_on": ["OccurrenceKey", "LOB"],
"op": "MULTIPLY"
}
We will get the following output:
Trial |
Time |
Type |
Value |
EventId |
LOB |
GID |
---|---|---|---|---|---|---|
1 |
1 |
Loss |
110 |
1 |
A |
1 |
1 |
2 |
Loss |
360 |
2 |
A |
2 |
1 |
2 |
Loss |
390 |
2 |
A |
2 |
Example notes:
Financial record with EventId 1 and LOB A has one joining record with value 1.1, therefore output value is 100 * 1.1.
Financial record with EventId 2 and LOB A has 2 joining records with values 1.2 and 1.3, therefore there are two records in the output with values 300*1.2 and 300*1.3.
Financial record with EventId 2 and LOB B doesn’t have any joining records with type Factor, therefore it is not propagated to the output.
If instead we use the InnerJoin template with no operation:
{
"_schema": "InnerJoin_1.0",
"join_record_type": "Factor",
"join_on": ["OccurrenceKey", "LOB"],
}
We will get the following output:
Trial |
Time |
Type |
Value |
EventId |
LOB |
GID |
---|---|---|---|---|---|---|
1 |
1 |
Loss |
100 |
1 |
A |
1 |
1 |
2 |
Loss |
300 |
2 |
A |
2 |
1 |
2 |
Loss |
300 |
2 |
A |
2 |