Architecture
=====================

While almost all end-user interaction with triangle data can be accomplished with public methods available on ``Triangle`` objects, it helps to understand the major elements in the internal structure of ``Triangle``.

``Cell``
------------

The basic building block class in Bermuda triangles is the ``Cell``. There are three types of cells: :code:`Cell` and its subclasses :code:`CumulativeCell` and :code:`IncrementalCell`. All cells consist of an experience period start date and end date, a development lag, one or more observed values, and a :code:`Metadata` object. A loose representation of a single cell may look something like this:

- **Experience Start Date**: ``2017-07-01``
- **Experience End Date**: ``2017-07-31``
- **Evaluation Date**: ``2018-10-31`` (development lag of 15 months)
- **Metadata**:

  - **Country**: US
  - **Currency**: USD
  - **Risk Basis**: Accident
  - **Reinsurance Basis**: Gross
  - **Per Occurrence Limit**: $1M
  - **Loss Definition**: Loss+DCC
  - **Details**:

    - **State**: Texas
    - **Coverage**: Bodily Injury

- **Values**:

  - **Paid Loss**: $1,234,567
  - **Reported Loss**: $2,345,678
  - **Earned Premium**: $3,456,789

If this data structure looks very similar to a single row of a tabular triangle, that’s no coincidence. The structure of cells intentionally mirrors tabular triangle rows.

We refer to individual observed values within a cell as “fields”. In the example above, the fields are paid loss, reported loss, and earned premium. :code:`Cell.values` is implemented as a Python dictionary, so there are essentially no restrictions on the fields that can be stored within an observation. In the example above, all of the fields are amounts of money, but we could just as easily have included reported claim counts, closed claim counts, or any number of other fields. Furthermore, since each cell’s values are independent, there is no requirement that all observations have the same set of fields.
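
The shape of a cell described above can be sketched in plain Python. This is an illustration only: ``SketchCell`` and ``dev_lag_months`` are hypothetical names, not Bermuda's ``Cell`` class or API, and the second cell's numbers are made up to show that cells need not share the same fields.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical stand-in for illustration; this is NOT Bermuda's Cell class.
@dataclass
class SketchCell:
    period_start: date
    period_end: date
    evaluation_date: date
    values: dict = field(default_factory=dict)    # field name -> observed value
    metadata: dict = field(default_factory=dict)  # e.g. currency, risk basis

    @property
    def dev_lag_months(self) -> int:
        # Whole months between the end of the experience period
        # and the evaluation date.
        return (self.evaluation_date.year - self.period_end.year) * 12 + (
            self.evaluation_date.month - self.period_end.month
        )


# The cell from the example above.
july = SketchCell(
    period_start=date(2017, 7, 1),
    period_end=date(2017, 7, 31),
    evaluation_date=date(2018, 10, 31),
    values={
        "paid_loss": 1_234_567,
        "reported_loss": 2_345_678,
        "earned_premium": 3_456_789,
    },
    metadata={"currency": "USD", "risk_basis": "Accident"},
)

# A second, made-up cell with a different set of fields.
august = SketchCell(
    period_start=date(2017, 8, 1),
    period_end=date(2017, 8, 31),
    evaluation_date=date(2018, 10, 31),
    values={"paid_loss": 987_654, "reported_claims": 42},
    metadata={"currency": "USD", "risk_basis": "Accident"},
)

print(july.dev_lag_months)
print(sorted(set(july.values) - set(august.values)))
```

Because the values live in an ordinary dictionary, nothing forces the two cells to carry the same keys.
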
If paid loss is present in every cell, but earned premium is only present in some cells, that’s not a problem.

Every observation contains a set of metadata associated with it, including items such as the country the risk is in, the currency that loss and premium amounts are denominated in, whether the exposure period is on an accident basis or a policy basis, and so forth. The set of attributes is extensible via the :code:`details` dictionary attribute. The example above shows state and coverage, but this is just for illustrative purposes; state and coverage are not required members of the details field, and any other arbitrary attributes can be added to the details field if relevant.

We track all of this data at the cell level because it can be critical for appropriate modeling when mixing data from several different sources. For example, it would be inappropriate to fit a combined loss development model with volume-weighting to data from two different portfolios, one of which is measured in dollars and the other in yen, without converting currencies first. Similarly, mixing accident-basis and policy-basis experience periods is usually a recipe for disaster, unless special mitigation measures are taken.

That said, the metadata on each observation is not burdensome for end-users. If an end-user doesn’t care about one or more metadata fields in a given analytical context, they can simply omit them and Bermuda will supply sensible defaults.

Triangles
---------

Collections of ``Cell``\ s are aggregated into a ``Triangle``. From the end-user’s perspective, a ``Triangle`` is an undifferentiated agglomeration of ``Cell``\ s. Under the hood, the ``Triangle`` class indexes and groups cells by common metadata attributes; we refer to these internal groups of ``Cell``\ s as “slices”. A slice consists of a list of ``Cell``\ s, all of which pertain to the same logical group of exposures.
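
The idea of grouping cells into slices by shared metadata can be sketched with a plain dictionary. This is a minimal sketch under stated assumptions: the tuple layout of each cell and the ``group_into_slices`` helper are hypothetical, the data is made up, and Bermuda's actual indexing may differ.

```python
from collections import defaultdict

# Hypothetical cells as (metadata, period, values) tuples; not Bermuda objects.
cells = [
    ({"currency": "USD", "risk_basis": "Accident"}, "2017-07", {"paid_loss": 100.0}),
    ({"currency": "USD", "risk_basis": "Accident"}, "2017-08", {"paid_loss": 80.0}),
    ({"currency": "JPY", "risk_basis": "Accident"}, "2017-07", {"paid_loss": 9000.0}),
]


def group_into_slices(cells):
    """Group cells whose metadata is identical into the same slice."""
    slices = defaultdict(list)
    for metadata, period, values in cells:
        # Dictionaries are not hashable, so use a sorted tuple of items as the key.
        key = tuple(sorted(metadata.items()))
        slices[key].append((period, values))
    return dict(slices)


slices = group_into_slices(cells)
for key, members in slices.items():
    print(dict(key), "->", len(members), "cell(s)")
```

Here the two USD cells land in one slice and the JPY cell in another, because the grouping key is the full set of metadata attributes.
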
For example, a slice may contain cells for the accident-month triangle for Company X, or for the policy-quarter triangle for private passenger bodily injury claims in the state of Missouri for Product Y written by Company Z. Slice grouping is automatically determined based on the metadata associated with each cell.

Operations on Triangles
-----------------------

We stated earlier that one of the design goals of ``bermuda.Triangle`` is ergonomics. To that end, triangles include a rich set of operations out of the box. We summarize some of the most common operations below.

In general, when we have a choice between implementing a behavior as a function or as a method, we prefer the method in almost all cases. There are a few reasons for this. First, all methods on triangles are non-mutating/non-destructive, so there’s no semantic distinction between functions and methods. Second, it tends to be easier and more natural to express a sequence of operations on a triangle as a sequence of chained method calls than as a nested sequence of function calls. Finally, from a rhetorical perspective, we think of ``Triangle`` objects as having a convenient and tidy namespace for holding operations on triangular data, so we don’t have to import functions from another namespace or qualify the function names.

Operators
---------

- **Equality**: The ``==`` operator on triangles returns ``True`` if the contents of both operands are identical, rather than only when the two operands are references to the same object, which is the default behavior for Python objects.
- **Concatenation**: The ``+`` operator on two triangles returns a single triangle with the concatenated contents of the two operands.

Properties
----------

Any given triangle ``triangle`` has the following basic properties:

- ``triangle.slices`` returns a dictionary of slices contained in the triangle.
- ``triangle.cells`` returns a list of all cells in the triangle.
- ``triangle.periods`` is the sorted list of all distinct experience periods in the triangle.
- ``triangle.dev_lags()`` is the sorted list of all distinct development lags in the triangle. ``dev_lags`` accepts ``unit`` as a keyword argument that can be ``month``, ``day`` or ``timedelta``.
- ``triangle.evaluation_dates`` is the sorted list of all distinct evaluation dates in the triangle.
- ``triangle.evaluation_date`` is the latest evaluation date in the triangle.
- ``triangle.fields`` is the sorted list of all distinct fields in cells in the triangle.
- ``triangle.metadata`` is the sorted list of all distinct metadata in the triangle.
- ``triangle.common_metadata`` returns a single metadata element common to all cells in the triangle.
- ``triangle.metadata_differences`` returns a list of the unique metadata in the triangle that are not in ``triangle.common_metadata``.

Triangles also implement several higher-order properties. For an explanation of Bermuda-specific triangle terminology, see the discussion on triangle philosophy and terminology.

- ``triangle.is_empty`` returns ``True`` if there are no cells in the triangle, and ``False`` otherwise.
- ``triangle.is_disjoint`` returns ``True`` if all experience periods in the triangle are disjoint, and ``False`` if the triangle is erratic.
- ``triangle.is_semi_regular`` tests whether the triangle is semi-regular.
- ``triangle.is_regular`` tests whether the triangle is regular.
- ``triangle.has_consistent_currency`` and ``triangle.has_consistent_risk_basis`` test whether every cell in the triangle has the same currency or risk basis, respectively. Inconsistencies in these two pieces of metadata are the most common reasons a modeling approach is invalid.
- ``triangle.is_incremental`` returns ``True`` if the triangle is incremental, otherwise ``False``.

Basic Mutators
--------------

Triangles have the following methods that return modified triangles:

- ``triangle.select()`` accepts a list of field names.
  For each cell in the triangle, any fields that are not in the supplied list of names are removed from the cell’s set of values. If a cell has none of the listed fields, that cell is removed entirely.
- ``triangle.clip()`` filters a triangle based on cutoff dates. For example, ``triangle.clip(max_eval=datetime.date(2018, 12, 31))`` removes all cells with an evaluation date after December 31st, 2018. ``clip`` accepts the keyword arguments ``min_eval``, ``max_eval``, ``min_period``, ``max_period``, ``min_dev``, ``max_dev``, and ``dev_lag_unit``. If multiple arguments are supplied, only those cells that satisfy all supplied conditions are returned.
- ``triangle.right_edge`` returns the rightmost edge of the triangle; that is, for each distinct experience period within each slice, the cell with the latest evaluation date is retained and all other cells are dropped.

Representations
---------------

- ``triangle.to_data_frame()`` returns a ``pandas.DataFrame`` representation of a triangle, for ease of graphing, exporting, and ad hoc manipulation. The I/O functions ``triangle.to_long_data_frame`` and ``triangle.to_wide_data_frame`` are used when converting triangles to and from long and wide CSVs, respectively. Similarly, there are ``triangle.to_json()`` and ``triangle.to_binary()`` output functions.
- ``triangle._repr_html_()`` provides a friendlier rich-HTML representation of a triangle for use in Jupyter notebooks.

Intermediate Mutators
---------------------

Some triangle mutators require direct manipulation of individual cells. Cells are fairly straightforward to work with, so this does not pose too much of an obstacle. An individual cell ``cell``\ ’s experience period start, experience period end, and development lag can be accessed via ``cell.period_start``, ``cell.period_end``, and ``cell.dev_lag``. The internal cell representation of these values may be unintuitive, so be warned.
We can access individual fields within cells as (for example) ``cell.values["paid_loss"]``, or just ``cell["paid_loss"]`` for short.

- ``triangle.filter()`` allows for filtering of triangles based on arbitrary cell-level predicates. For example, ``triangle.filter(lambda cel: cel["paid_loss"] > 0)`` removes all cells with zero (or negative) paid loss. The predicate function passed to ``filter`` must take a single argument (a single cell), and the predicate is then applied to every cell in the triangle, one by one. This means ``filter`` cannot be used to express conditions that depend on multiple cells.
- ``triangle.derive_fields()`` allows for adding new fields to cells that are transformations of existing fields. For example, ``triangle.derive_fields(paid_LR=lambda ob: ob["paid_loss"] / ob["earned_premium"])`` would add to every observation a new field ``paid_LR`` containing the paid loss ratio as defined by the supplied function. ``derive_fields`` can also be used to overwrite existing fields.
- ``triangle.aggregate()`` allows for aggregation of a triangle's experience period or evaluation date resolution, such as turning quarterly triangles into annual triangles.
- ``triangle.summarize()`` allows for turning multi-slice triangles into a smaller number of triangles that share common metadata. Bermuda will automatically work out the greatest-common-denominator of ``Metadata`` objects in the triangle, and will try to combine fields using default aggregation functions for commonly-used fields (e.g. ``paid_loss``, ``reported_loss``, ``earned_premium``, etc.). Alternatively, users can pass in their own set of summarization functions.
- ``triangle.blend()`` allows for the blending of multiple triangles with the same cell fields using either a linear weighted average, or a 'mixture blend' that samples randomly from the different triangle fields according to weights passed in by the user.
  This is particularly useful if your triangle holds samples from upstream stochastic modelling.
- ``triangle.split()`` splits triangles by metadata attributes. For instance, if your triangle holds triangles for multiple lines of business, you can split by the metadata attribute that identifies the line of business to obtain a dictionary of separate triangles.
- ``triangle.merge()`` joins cell values across triangles, where the ``join_type`` argument can be used to specify full, inner, left, right, left-anti or right-anti joining operations.
- ``triangle.coalesce()`` is similar to merge, but can take more than two triangles as input, where earlier triangles' cell fields take precedence over later triangles' cell fields. This is similar to an iterated left-join on multiple triangles.
- ``triangle.to_incremental()`` turns a cumulative triangle into an incremental triangle, or is a no-op if the triangle is already incremental. ``triangle.to_cumulative()`` provides the opposite functionality.
- ``triangle.add_statics()`` adds static field values from one triangle to the current triangle. Similar functionality might be achieved with a left-join ``merge`` operation or even ``derive_fields``, but ``add_statics`` offers greater control over merging single cell fields into the base triangle.
- ``triangle.make_right_triangle()`` creates a lower-diagonal of the existing triangle with empty cell field values.
- ``triangle.make_right_diagonal()`` creates a new triangle diagonal for user-specified evaluation dates.

Plots
-------

The ``Triangle`` class currently has a couple of useful visualizations using Plotly, but better visualization functionality will be added in the future.

``triangle.plot_data_completeness()`` shows the triangular data structure as a scatter plot in ``(experience_period, development_lag)`` coordinate space. Each point represents a cell, colored according to the proportion of cell field values that are present in the cell.
If all cells have the same number of cell fields, they will all be the same color.

``triangle.plot_right_edge()`` plots the most recent diagonal (the 'right edge') of, by default, paid and/or reported loss ratios (computed using earned premium), but users can pass their own functions of cell values.
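
To make the quantity behind the right-edge plot concrete, the following plain-Python sketch selects the latest cell for each experience period and computes a paid loss ratio from it. It is an illustration under stated assumptions, not Bermuda's implementation: the dictionary layout of each cell, the helper names ``right_edge`` and ``paid_loss_ratio``, and the numbers are all hypothetical, and the sketch handles a single slice only.

```python
from datetime import date

# Hypothetical single-slice data; each cell is a plain dict, not a Bermuda Cell.
cells = [
    {"period_start": date(2017, 1, 1), "evaluation_date": date(2017, 12, 31),
     "values": {"paid_loss": 40.0, "earned_premium": 100.0}},
    {"period_start": date(2017, 1, 1), "evaluation_date": date(2018, 12, 31),
     "values": {"paid_loss": 60.0, "earned_premium": 100.0}},
    {"period_start": date(2018, 1, 1), "evaluation_date": date(2018, 12, 31),
     "values": {"paid_loss": 30.0, "earned_premium": 120.0}},
]


def right_edge(cells):
    """Keep, for each experience period, the cell with the latest evaluation date."""
    latest = {}
    for cell in cells:
        key = cell["period_start"]
        if key not in latest or cell["evaluation_date"] > latest[key]["evaluation_date"]:
            latest[key] = cell
    return [latest[key] for key in sorted(latest)]


def paid_loss_ratio(cell):
    """Paid loss divided by earned premium for a single cell."""
    return cell["values"]["paid_loss"] / cell["values"]["earned_premium"]


edge_ratios = {c["period_start"]: paid_loss_ratio(c) for c in right_edge(cells)}
for period, ratio in edge_ratios.items():
    print(period, round(ratio, 3))
```

Swapping ``paid_loss_ratio`` for another function of a cell's values mirrors the option, mentioned above, of passing a custom function of cell values.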