.. index:: Loss Set ELT

Loss Set ELT
############################
An ELT Loss Set is one of the options for performing calculations on Event Loss Tables (ELTs) in the Graphene system.
Defining multiple ELT Loss Sets for an analysis is supported (and common).

Structure
*********
The general structure of an ELT Loss Set in Graphene is:

.. code-block:: json

    {
        "_schema": "LossSetELT_1.0",
        "paths": ["s3://example_bucket/uploads/ledgers/example_ledger_upload/losses.parquet",
                  "s3://example_bucket/uploads/ledgers/example_ledger_upload/losses2.parquet"],
        "currency": "USD",
        "model": {
            "frequency": { "distribution_type": "POISSON" },
            "seasonality": { "min_time": 0, "max_time": 365.25 },
            "seed": false
        }
    }

Another example, using a binomial frequency distribution and empirical seasonality:

.. code-block:: json

    {
        "_schema": "LossSetELT_1.0",
        "paths": ["s3://example_bucket/uploads/ledgers/example_ledger_upload/losses.parquet"],
        "currency": "USD",
        "model": {
            "frequency": { "distribution_type": "BINOMIAL", "index_of_dispersion": 0.5 },
            "seasonality": { "pairs": [ [1, 0], [31, 0.9], [100, 0.1] ], "subtype": "NON_CUMULATIVE", "interpolation": false },
            "seed": true
        }
    }


Parameters
**********
The parameters are defined as follows:

+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|           Parameter Name  | Required |    Type    |                                                                                                 Description                                                           |
+===========================+==========+============+=======================================================================================================================================================================+
| ``paths``                 | Yes      | ``array``  | Array of the unescaped S3 key prefix or full S3 key that represents a complete ELT loss data set. This path must be absolute.                                         |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``currency``              | No       | ``string`` | The currency in which the input currency values are defined. Defaults to the :ref:`base currency <base-currency>` if not set.                                         |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``model``                 | Yes      | ``object`` | The model associated with the ELT data. Object is complex, so more details can be found below.                                                                        |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

.. note:: Avoid S3 keys containing special characters as described in the `S3 User Guide <https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html>`_
    with the exception of delimiting ``/`` characters.


Model Parameters:
******************
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|           Parameter Name  | Required |    Type    |                                                                                                 Description                                                           |
+===========================+==========+============+=======================================================================================================================================================================+
| ``seasonality``           | Yes      | ``object`` | Seasonality definition for the `model`. Object is complex, so more details can be found below.                                                                        |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``frequency``             | No       | ``object`` | Frequency definition for the `model`.  Object is complex, so more details can be found below.                                                                         |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``seed``                  | No       | ``boolean``| If a seed value should be calculated and applied to the model.  This will ensure values are the same for multiple runs.  Default is ``true``.                         |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Seasonality Parameters:
***********************
+---------------------------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|           Parameter Name  | Required |    Type     |                                                                                                 Description                                                          |
+===========================+==========+=============+======================================================================================================================================================================+
| ``pairs``                 | No       | ``array``   | Array of tuples in the form (Time, Probability). When supplied, implies `Empirical` seasonality type. If not provided, then seasonality type is `Uniform`.           |
+---------------------------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``subtype``               | No       | ``string``  | Used in combination with ``pairs``. Type of probability values in `pairs`. `CUMULATIVE` or `NON_CUMULATIVE` (default).                                               |
+---------------------------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``interpolation``         | No       | ``boolean`` | Used in combination with ``pairs``. If ``true``, sampling will use linear interpolation between the provided time values. Default is ``false``.                      |
+---------------------------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``min_time``              | No       | ``number``  | First day of simulated events. Is only required if ``pairs`` are not defined.                                                                                        |
+---------------------------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``max_time``              | No       | ``number``  | Last day of simulated events. Is only required if ``pairs`` are not defined.                                                                                         |
+---------------------------+----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

.. note::
    If you are using `Decimal` timestamps with 365.25-value handling for leap years and a `Uniform` seasonality distribution,
    and want a full year in the simulation, then `min_time` should be 0 and `max_time` should be set to 365.25.
    When using `Posix` please refer to the :ref:`timestamps` documentation for Ledgers.
    Either `pairs` or both `min_time` and `max_time` MUST be provided.

The entries in ``pairs`` describe how sampled time values are distributed over a defined range, and ``interpolation`` controls how those values are spread between consecutive time points.
Consider consecutive pairs :math:`p_{n} = (T_{n}, P_{n})` and :math:`p_{n+1} = (T_{n+1}, P_{n+1})`, with the probabilities read as non-cumulative. When ``interpolation`` is off, a sampled time value equals :math:`T_{n}` with probability :math:`P_{n}` and equals :math:`T_{n+1}` with probability :math:`P_{n+1}`.
When ``interpolation`` is on, a sampled time value is uniformly distributed over :math:`(T_{n}, T_{n+1}]` with probability :math:`P_{n+1}`; :math:`P_{n}` is likewise spread over the interval that ends at :math:`T_{n}`.

The distribution for seasonality should be continuous. To ensure this, the first pair in ``pairs`` must have a probability of 0, and time values cannot be repeated.

``subtype`` determines how probabilities are represented in ``pairs``. When the ``subtype`` is ``CUMULATIVE``, a pair's probability is the likelihood that a sampled time value is less than or equal to
the pair's time value. For ``CUMULATIVE``, the probabilities in ``pairs`` must be non-decreasing and the last probability must equal 1. When the ``subtype`` is ``NON_CUMULATIVE``, a pair's
probability is the likelihood that a sampled time value equals the pair's time value. For ``NON_CUMULATIVE``, the probabilities in ``pairs`` must sum to 1.

.. note::
    ``pairs`` will be sorted by ascending Time value.
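
The constraints above can be checked before a template is submitted. The following is a minimal sketch, not Graphene's
own validation: ``validate_pairs`` is a hypothetical helper, and the floating-point tolerance used for the
equal-to-1 checks is an assumption.

.. code-block:: python

    import math


    def validate_pairs(pairs, subtype="NON_CUMULATIVE"):
        """Check the documented constraints on a seasonality ``pairs`` array.

        Hypothetical helper for illustration; returns the pairs sorted by time.
        """
        if not pairs:
            raise ValueError("pairs must not be empty")
        # Mirror the documented behaviour: pairs are sorted by ascending time.
        ordered = sorted(pairs, key=lambda pair: pair[0])
        times = [time for time, _ in ordered]
        probs = [prob for _, prob in ordered]

        if len(set(times)) != len(times):
            raise ValueError("time values cannot be repeated")
        if probs[0] != 0:
            raise ValueError("the first pair must have a probability of 0")

        if subtype == "CUMULATIVE":
            if any(later < earlier for earlier, later in zip(probs, probs[1:])):
                raise ValueError("cumulative probabilities must be non-decreasing")
            # math.isclose tolerance is an assumption, not a documented rule.
            if not math.isclose(probs[-1], 1.0):
                raise ValueError("the last cumulative probability must equal 1")
        elif subtype == "NON_CUMULATIVE":
            if not math.isclose(sum(probs), 1.0):
                raise ValueError("non-cumulative probabilities must sum to 1")
        else:
            raise ValueError(f"unknown subtype: {subtype}")
        return ordered

Calling ``validate_pairs([[31, 0.9], [0, 0.0], [1, 0.1]])`` returns the pairs in ascending time order, while a first
probability other than 0 or a repeated time value raises ``ValueError``.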

Example 1:

.. code-block:: json

    "seasonality": { "pairs": [ [0, 0.0], [1, 0.1], [31, 0.9] ] }

Without interpolation, sampled time values have a 10% chance of being ``1`` and a 90% chance of being ``31``. If ``interpolation`` is
set to ``true``, there is a 10% chance of a value sampled from a uniform distribution in the range ``(0, 1]`` and a 90% chance of
a value sampled from a uniform distribution in the range ``(1, 31]``.

Example 2-A:

.. code-block:: json

    "seasonality": {
        "pairs": [
            [0, 0],
            [31, 0.5],
            [180, 0.0],
            [225, 0.2],
            [365.25, 0.3]
        ],
        "interpolation": false
    }

.. plot::

   import matplotlib.pyplot as plt
   # Step-shaped cumulative distribution: the probability jumps at each time value.
   plt.plot([0, 31, 31, 225, 225, 365.25, 365.25],
            [0.0, 0.0, 0.5, 0.5, 0.7, 0.7, 1.0])
   plt.xlabel('Time')
   plt.ylabel('Cumulative Probability')

Example 2-B:

The same as 2-A, but with interpolation.

.. code-block:: json

    "seasonality": {
        "pairs": [
            [0, 0.0],
            [31, 0.5],
            [180, 0.0],
            [225, 0.2],
            [365.25, 0.3]
        ],
        "interpolation": true
    }

.. plot::

   import matplotlib.pyplot as plt
   # Piecewise-linear cumulative distribution: probability accrues evenly between time values.
   plt.plot([0, 31, 180, 225, 365.25],
            [0.0, 0.5, 0.5, 0.7, 1.0])
   plt.xlabel('Time')
   plt.ylabel('Cumulative Probability')
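
The behaviour shown in Examples 2-A and 2-B can be reproduced with inverse-transform sampling over the cumulative
probabilities. This is an illustrative sketch that assumes ``pairs`` uses the ``NON_CUMULATIVE`` subtype;
``sample_time`` is a hypothetical function and does not describe Graphene's internal sampler.

.. code-block:: python

    import bisect
    import random
    from itertools import accumulate


    def sample_time(pairs, interpolation=False, rng=random):
        """Draw one time value from NON_CUMULATIVE seasonality pairs (hypothetical helper)."""
        ordered = sorted(pairs, key=lambda pair: pair[0])
        times = [time for time, _ in ordered]
        cumulative = list(accumulate(prob for _, prob in ordered))

        u = rng.random()  # uniform draw in [0, 1)
        # Index of the first pair whose cumulative probability exceeds u.
        index = bisect.bisect_right(cumulative, u)
        # Guard against floating-point round-off in the final cumulative value.
        index = min(index, len(times) - 1)

        if not interpolation or index == 0:
            return times[index]

        low, high = cumulative[index - 1], cumulative[index]
        if high <= low:
            return times[index]
        # Linear interpolation between the previous and the selected time value.
        # The draw lands in [T_n, T_n+1) here, which differs from (T_n, T_n+1]
        # only at the end points.
        fraction = (u - low) / (high - low)
        return times[index - 1] + fraction * (times[index] - times[index - 1])


    pairs_2a = [[0, 0], [31, 0.5], [180, 0.0], [225, 0.2], [365.25, 0.3]]
    stepped = [sample_time(pairs_2a) for _ in range(5)]
    smoothed = [sample_time(pairs_2a, interpolation=True) for _ in range(5)]

Without interpolation, every draw is one of the time values with a non-zero probability (``31``, ``225`` or
``365.25``). With interpolation, draws fall anywhere between ``0`` and ``31`` or between ``180`` and ``365.25``,
because the pair at ``180`` carries zero probability.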


Frequency Parameters:
*********************
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|           Parameter Name  | Required |    Type    |                                                                                                 Description                                                           |
+===========================+==========+============+=======================================================================================================================================================================+
| ``distribution_type``     | No       | ``string`` | Can be either `POISSON`, `BINOMIAL` or `NEGATIVE_BINOMIAL`.  Default is `POISSON`.                                                                                    |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``index_of_dispersion``   | No       | ``number`` | Ratio of variance to mean. Required for `BINOMIAL` and `NEGATIVE_BINOMIAL`. Must be < 1 for `BINOMIAL` and > 1 for `NEGATIVE_BINOMIAL`.                               |
+---------------------------+----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

.. note:: FindNode Behaviour

    If you create an ELT template that relies on default values, those values are not populated until analysis time.
    A findnode search must therefore also match nodes where the value is empty.

    For example, to search for a model with a frequency distribution type of ``POISSON``, structure the query as
    follows:

        ``!? model.frequency.distribution_type | model.frequency.distribution_type == "POISSON"``


Event Loss Table Data Storage
*****************************
See :doc:`ELT Format</elt_format>` for the ELT input requirements.

Frequency Distribution Specification
************************************
For ELTs we offer the ability to partially parameterize the Poisson, Binomial and Negative Binomial distributions by
providing an index of dispersion for the latter two. The index of dispersion is the ratio of variance to mean and allows
us to exploit the additive properties of these three distribution types. More specifically:

.. math::

  & Poisson\\
  & X_{i} \sim Pois(\mu_{x}) \\
  & Y_{i} \sim Pois(\mu_{y}) \\
  & X_{i} + Y_{i} \sim Pois(\mu_{x} + \mu_{y})\\
  & \\
  & Binomial\\
  & X_{i} \sim B(n,p)\quad where\ \mu = np\\
  & Y_{i} \sim B(m,p)\quad where\ \mu = mp\\
  & X_{i} + Y_{i} \sim B(n+m,p)\quad where\ \mu = (m+n)p\\
  & \\
  & NegativeBinomial\\
  & X_{i} \sim NB(n,p)\quad where\ \mu = \frac{n(1-p)}{p}\\
  & Y_{i} \sim NB(m,p)\quad where\ \mu = \frac{m(1-p)}{p}\\
  & X_{i} + Y_{i} \sim NB(n+m,p)\quad where\ \mu = \frac{(m+n)(1-p)}{p}

For ELTs and models, the mean of the frequency distribution :math:`\mu` is the sum of the rates in an event set.
Thus, the mean is known from the rates extracted from the ELT data that is attached to a model.
For the binomial and negative binomial models, the additivity of event sets (which is what allows event sets to be
built incrementally from ELTs) depends on assuming a consistent index of dispersion. Given the mean and the index of
dispersion, we can compute the binomial and negative binomial parameters:

.. math::

  & Binomial\\
  & \mu = \sum (rates\ in\ eventset)\\
  & D = provided\ index\ of\ dispersion\\
  & p = 1-D\\
  & n = \frac{\mu}{1-D}\\
  & \\
  & NegativeBinomial\\
  & \mu = \sum (rates\ in\ eventset)\\
  & D = provided\ index\ of\ dispersion\\
  & p = \frac{1}{D}\\
  & n = \frac{\mu}{D-1}
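
As a worked check of the formulas above, the following sketch computes the distribution parameters from a list of
event rates. ``frequency_parameters`` is a hypothetical helper written for illustration, and the lower bound of 0 on
the binomial index of dispersion is an assumption added so that :math:`p \le 1`.

.. code-block:: python

    def frequency_parameters(rates, distribution_type="POISSON", index_of_dispersion=None):
        """Derive frequency distribution parameters from event rates (hypothetical helper)."""
        mu = sum(rates)  # mean of the frequency distribution
        if distribution_type == "POISSON":
            return {"mu": mu}

        d = index_of_dispersion
        if d is None:
            raise ValueError("index_of_dispersion is required for this distribution")

        if distribution_type == "BINOMIAL":
            # Documented constraint: D < 1. The D >= 0 bound is an assumption.
            if not 0 <= d < 1:
                raise ValueError("index_of_dispersion must be < 1 for BINOMIAL")
            return {"n": mu / (1 - d), "p": 1 - d}

        if distribution_type == "NEGATIVE_BINOMIAL":
            if not d > 1:
                raise ValueError("index_of_dispersion must be > 1 for NEGATIVE_BINOMIAL")
            return {"n": mu / (d - 1), "p": 1 / d}

        raise ValueError(f"unknown distribution_type: {distribution_type}")


    rates = [0.5, 1.5, 2.0]  # illustrative rates, summing to a mean of 4.0
    binomial = frequency_parameters(rates, "BINOMIAL", 0.5)
    negative_binomial = frequency_parameters(rates, "NEGATIVE_BINOMIAL", 2.0)

For these rates the binomial case gives :math:`n = 8, p = 0.5`, so the mean :math:`np = 4` and the variance
:math:`np(1-p) = 2` reproduce the index of dispersion of 0.5. The negative binomial case gives
:math:`n = 4, p = 0.5`, so the mean :math:`n(1-p)/p = 4` and the variance :math:`n(1-p)/p^{2} = 8` reproduce the
index of dispersion of 2.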

Multiple ELT Data files in a Model
**********************************
If you create a network with an ELT template as input, decide up front whether later changes to the ELT template node
should affect this network.  If future updates to properties on the template node, such as ``paths``, should flow
through to the calculations on this network, add the template node without fixing the revision number.  Fixing the
revision number keeps the analysis results consistent even if you update the template node later.

To build simulations incrementally, a model can hold multiple event data sets, listed in the ``paths`` parameter of
the template. The order of the data files in ``paths`` matters: if a later file contains definitions of events with
the same IDs as events in an earlier file, those later definitions are ignored and the system generates a warning.

When you add new Event Loss Table (ELT) data to an existing model, the system performs a difference calculation and
identifies the unique events. The process works as follows:

* When you add a new ELT to an existing ``paths`` property, the system identifies the unique events within the new ELT based on their event IDs.
* The system compares the events in the new ELT with the events already present in the earlier files.
* New events that have the same ID as existing events are not duplicated. They are ignored, as stated above, and a warning is triggered.
* Events from the new ELT that do not match any existing events are tagged as "new", and a new event set is created for
  the model.
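
The steps above can be sketched as follows. This illustrates the described behaviour rather than Graphene's
implementation: ``merge_event_sets`` and the ``event_id`` and ``rate`` dictionary keys are hypothetical, and events
are represented as plain dictionaries instead of rows read from the ELT files. Treating a repeated ID within a single
file the same way as a repeat across files is also an assumption.

.. code-block:: python

    import warnings


    def merge_event_sets(files):
        """Build one event set per file, skipping event IDs seen in earlier files.

        ``files`` is a list of event lists, in the same order as ``paths``.
        Hypothetical helper for illustration only.
        """
        seen_ids = set()
        event_sets = []
        for position, events in enumerate(files):
            new_events = []
            ignored_ids = []
            for event in events:
                event_id = event["event_id"]  # assumed key name
                if event_id in seen_ids:
                    ignored_ids.append(event_id)
                else:
                    seen_ids.add(event_id)
                    new_events.append(event)
            if ignored_ids:
                warnings.warn(
                    f"file {position}: ignored already-defined event IDs {ignored_ids}"
                )
            if new_events:
                event_sets.append(new_events)
        return event_sets


    first = [{"event_id": 1, "rate": 0.1}, {"event_id": 2, "rate": 0.2}]
    second = [{"event_id": 2, "rate": 0.9}, {"event_id": 3, "rate": 0.3}]
    event_sets = merge_event_sets([first, second])

Here the second file contributes only event ``3`` as a new event set; its redefinition of event ``2`` is ignored and
reported through a warning, so the rate from the first file is the one that is kept.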

.. note:: S3 Path

    Information on :ref:`S3_paths` format and special characters can be found in the Loss Set documentation.