Insurance Claims triangle – A jab at SQLScripting
The Insurance Claims triangle/Loss triangle/Run off triangle
Before we delve into the prediction, scripting and all the interesting stuff, lets understand what the claims loss triangle really is. An insurance claims triangle is a way of reporting claims as they developer over a period of time. It is quite typical that claims get registered in a particular year and the payments are paid out over several years. So it becomes important to know how claims are distributed and paid out. An Insurance claims triangle does just that. Those who are familiar with Solvency II norms set by EIOPA would be familiar with the claims triangle report. The report is a mandate and is in the Quantitative Reporting Templates(QRTs).
Fig : 1 – The claims triangle
In figure – 1, the rows signify the year of claim registration and the columns the development year. Consider that we are in the year 2013 and are looking at the claims triangle. The first row focuses on the claims registered in the year 2005. The first column of the first row(header 0) gives you the claim amount paid out by the insurance company in the same year. The second column gives you claim amount paid out in the next year(2006). This goes on until the previous year of reporting i.e. 2012. The second row does the same thing, but for the claims registered in the year 2006. Logically, as each row is incremented, the number of columns would be lesser by one. This gives the triangular shape to the report and hence its catchy name. The claims triangle could be of two types – incremental or cumulative. Incremental is when each column hold the amount paid at that specific intersection of registration year and payment year. The cumulative on the hand would contain the cumulative claims paid out as of that intersection point.
The below prediction is based on the cumulative model of the claims triangle. We would base or logic on a set of records stored at the cumulative level. I have uploaded the input data as a CSV in the blog for saving your time.
The interesting part is to fill the second triangle of the rectangle(if you will). Typically R is used to do this work and that would be a much easier and reliable way to do it of-course. If you are interested to follow the R way, I would suggested viewing these videos presented in the channel SAP Academy – https://youtu.be/wogBQ8Rixwc . It was out of shear curiosity that I planned on implementing an SQL Script based implementation of the loss triangle. Let’s try understanding the algorithm first.
As an insurance company it would be useful to know what you would have to pay out as claims in the years to come. It helps the insurance company to maintain financial reserves for future liabilities and reduce risk of solvency. There are quite some statistical models used to predict the future numbers, but the most accepted one is the Chain Ladder algorithm presented by T.Mack.
Well, lets see the math behind the prediction. I’d have to candidly accept that my math is not too refined. So I would rather explain it in words. The algorithm itself has two parts to it – building the CLM estimator and the prediction itself.
Phase 1 : Derivation of the CLM(Chain ladder method) estimator
The first phase would be to determine the multiplication factors for each column which would later be used for the prediction.
Fig : 2 – CLM Estimator derivation
The above figure shows the CLM estimator of each column. Basically the math is a rather simple division of subsequent columns with equal number of cells. The CLM estimator for column 3 is derived as the division of the cumulative values of column number 3 over column number 2 excluding the last cell of column number 2. The same exercise is repeated over all adjacent sets of columns to build the estimators.
Phase 2 : Predicting the values
The prediction is a recursive exercise that is done one diagonal row at a time. Each diagonal row signifies claim payments for one particular future year. Looking again at figure 1, the first empty diagonal row would hold the predicted values that would be paid out in the year 2013 for the claims registered across different years. The next diagonal row would be for 2014 and so on.
Fig : 3 – Prediction
Each predicted value is to be calculated as a product of the CLM estimator of the target column and the amount in the predecessor column of the same row. Once an entire row is calculated, the next diagonal row is calculated the saw but based on the previous predicted diagonal row. The whole process is done until the entire rectangle is complete.
The SQL Scripting
Now to get to the meat of this blog. I took a major assumption in the example that I show here; I assume the cumulative values for the run-off triangle is available in a table. The reason is that data for claims and payments could be on a single table or multiple tables depending on how the insurance data model is implemented. An SQL/View would have to be written to build a cumulative value and the the whole SQL script done here can be pointed to it. For simplicity I just use a single table here.
The whole implementation is on a script based calculation view.
Fig : 4 – Calculation view semantics
As you see above, the calculation view gives out 5 fields
- Claim_year – Year of claim registration
- Pymt_year – Year of payment(cumulative)
- Dev_year – Claim development year
- Predict – A flag to distinguish predicted and historical values
- Amount – Cumulative amount
Script -> Variable declarations
Fig : 5 – Variable declaration
Above is just a bunch of variables that would be used in the calculations below. I use an array of real type to store the CLM estimators.
Script -> Variable definitions
Fig : 6 – Variable definition
What you see above is building three variables – the minimum year, maximum year for calculation and their difference. The next component is building a table t_claim_table based on pre-calculated cumulative claim amounts stored in the CLAIMS_PAID table. The above part of the code could be modified based on the underlying data model and calculation requirements. For example if you are trying to execute claims triangle as of current status, the max value could be selected as select year(current_date) from dummy and the min could be filled from an input parameter or from the table itself as done here. For simplicity of my simulation, I have hard-coded the max and obtained the min from the table itself. The select query on CLAIM_PAID also could be changed based on the data model used. Assuming we were able to get over the above hurdle of building the input data.
Script -> Building the CLM estimator
Fig : 7 – CLM Estimator
To understand the math behind the CLM estimator I recommend reading the topic on the “The Prediction” above. I use a while loop to iteratively go over subsequent columns, build the sum and in the outer query divide and arrive at the CLM estimator. The value is then saved into an array. The iteration starts from 0 to the maximum number of years for which the run of triangle goes. For our example, looking at figure 1, this would be 2012 – 2005 = 7. So we could safely assume the while loop runs 7 times to calculate the 7 CLM estimator values as seen in figure 2. The variable ‘i’ helps in controlling selection of the correct column. At the end of the while loop, all the 7 CLM estimator values would be in the array.
Script -> Predicting the values
Fig : 8 – The prediction
To understand math behind the prediction done here, I recommend reading the topic on the “The Prediction” above. There are two nested for loops that do the work. The inner for loop calculates each cell within one diagonal row at a time. The outer for loop runs as many times as there are diagonal rows until the rectangle is filled. The three variables ‘i’, ‘j’ and ‘h’ control calculation of each value. The CLM estimator is obtained from the array filled in the previous step. I used a UNION to append records to the existing historical claims. This way, once a diagonal row has been predicted, I can use those values to build the next diagonal row. At the end of the loops, the table variable – t_claim_table would have the historic as well as the predicted values filling up the rectangle.
Script -> Finally the output
Fig : 9 – Output
The var_out variable is finally filled to be displayed as output. The case statement checks whether it is a predicted or a historic value and is later used for applying a filter in the report.
Visualization – SAP Lumira Reports
Putting all the moving pieces together, Lumira is the perfect tool to show the output. I used a cross-tab report to demonstrate the triangular layout. The development year is along the columns and the claim registration year is along the rows. Additionally a filter lets you make the report even more interactive.
Fig : 10 – SAP Lumira report showing loss triangle with only historical values
Fig : 11 – SAP Lumira report showing loss triangle with the predicted values
I am quite keen on listening to your feedback and suggestions on if there is a better way to script this (Of course not using the shortcut by calling R)