I was recently doing some deep research when I was reminded of Benford’s Law. Well, I was actually watching episode 4, “Digits”, of the series Connected: The Hidden Science of Everything, which explores the probability of the first digit in almost any series of numbers, be it social media metrics, tax returns, music scores, distances to stars, etc.
If, like me, you didn’t really pay attention during your Statistics and Computing for Management 101 course, here’s a very short reminder of what it is. Courtesy of Wikipedia, of course, since my course notes on this topic are long lost:
“Benford’s Law, also called the Newcomb–Benford Law, the law of anomalous numbers, or the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading digit is likely to be small. In sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford’s Law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.”
In short, in numerical series that follow Benford’s Law, the probability of a number’s first digit is as follows:
Or, if like me you prefer it in graphical format:
Of course, this is a simplified version, but I am sure you’ll get the logic.
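These probabilities aren’t arbitrary: they all come from the single formula P(d) = log10(1 + 1/d). As a minimal sketch, here is how you could compute the full first-digit distribution yourself (the function name is mine, purely illustrative):

```python
import math

def benford_probability(d: int) -> float:
    """Benford's Law: probability that the leading digit of a number is d."""
    return math.log10(1 + 1 / d)

# Print the expected frequency for each possible leading digit, 1 through 9
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

Running this reproduces the familiar figures from the Wikipedia quote above: the digit 1 comes out at about 30.1%, down to roughly 4.6% for the digit 9, and the nine probabilities sum to exactly 1.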
Now, where it becomes interesting is that this pattern seems to apply regardless of the data set, though it becomes more accurate as the size of the set increases. Benford tested this model on 20 data sets covering topics as varied as surface areas of rivers, population figures, street addresses, and molecular weights, and the pattern held every time.
So I started thinking: could we apply this to Governance, Risk, and Compliance (GRC) and, if so, to what benefit?
Here, I am already way behind. Fraud Investigators have long been using this approach to identify potential anomalies where reported numbers deviate from what Benford’s Law would predict, and quite successfully in many cases. So I will defer here to the experts from the Association of Certified Fraud Examiners (ACFE), who have already published many articles and presentations on this topic.
If you are a Fraud Investigator, this is definitely something worth considering when looking for the needle in the haystack!
Most organizations keep a record of loss events, or incidents that could be associated with a financial exposure. And in some cases, they may want to verify that the reporting is accurate and that no sets of incidents have been “omitted” or tampered with; not necessarily to defraud the company for personal financial gain, but sometimes simply to show one’s business or department in a better light than it deserves.
Acting as an Auditor, a Compliance Officer, or even a Regulator would, I decided to apply Benford’s Law to a set of 21,066 anonymized loss events of the kind typically used in Operational Risk Management for Financial Institutions:
For this sizeable data set, the values follow Benford’s predictions quite closely. Acting as one of the profiles above, I would most certainly decide to review some of the losses reported by the 1st Line where the financial exposure was assessed at an amount starting with a “2”. Maybe just a coincidence, but it is definitely a signal worth investigating further.
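If you want to reproduce this kind of analysis on your own loss register, the core step is simply extracting the first significant digit of each amount and tallying the frequencies. A minimal sketch (the sample amounts below are illustrative, not the actual data set used here):

```python
from collections import Counter

def leading_digit(value: float) -> int:
    """Return the first significant digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '2.1066e+04'
    return int(f"{abs(value):e}"[0])

def first_digit_distribution(amounts):
    """Observed frequency of each leading digit 1-9 across the amounts."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Illustrative loss amounts only, not the 21,066 real loss events
losses = [1250.00, 230.50, 19840.75, 310.00, 1575.20, 980.00, 2450.00]
print(first_digit_distribution(losses))
```

Comparing the resulting frequencies against the Benford expectations digit by digit is what surfaces outliers such as the over-represented “2” mentioned above.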
Speaking of regulators and financial institutions, could fines imposed on organizations in this industry also follow this distribution? To find out, I decided to test it on the Top Bank Fines in 2020 as per the report from Finbold.
The data set itself is quite small (48 fines listed in this report) and, as mentioned earlier, the larger the numerical series, the better Benford’s Law fits. Regardless, the pattern is clearly still mostly followed here as well.
Another area where GRC experts, but also Board members, often question the reliability of information relates to risk management.
I have stopped counting the number of times I have heard that Business Managers “reviewed” the values of risk assessments because there were too many high exposures and it made them look bad…
Here, I decided to apply this approach to the risk demo data that we, at SAP, have in our demo environment.
The issue here is that risk information can be looked at from many angles and, more importantly, most risk information is actually a combination of multiple components, which biases the results.
I therefore decided to focus on 2 criteria:
* Inherent (Gross) total loss: where the user manually enters the sum of the total potential exposure for a given risk, and
* Residual (Net) total loss: which takes into account any action plans, controls, policies, etc. associated with the risk to mitigate it.
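To make the distinction concrete: the residual figure is typically derived by the tool rather than entered by hand, which is exactly why it carries extra bias. The sketch below is a deliberately simplified, hypothetical derivation; it is NOT the actual formula used by SAP’s solution or any specific GRC tool:

```python
def residual_loss(inherent_loss: float, mitigation_effectiveness: float) -> float:
    """Illustrative only: residual exposure as the inherent exposure reduced
    by the combined effectiveness of mitigations (0.0 = none, 1.0 = full).
    Real GRC tools use their own, usually more elaborate, calculations."""
    if not 0.0 <= mitigation_effectiveness <= 1.0:
        raise ValueError("effectiveness must be between 0 and 1")
    return inherent_loss * (1.0 - mitigation_effectiveness)

# A 1,000,000 inherent exposure with mitigations judged 60% effective
print(residual_loss(1_000_000, 0.6))
```

Even in this toy version you can see the two sources of bias discussed below: the effectiveness value comes from an end-user judgment, and the arithmetic comes from the tool.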
It is very obvious that the data set here doesn’t really follow Benford’s Law to a T. But is this really a surprise? I don’t think so, and there are at least 2 explanations for this:
- The residual values combine the inherent assessment with the completeness and effectiveness of multiple mitigation strategies. We are therefore including a bias coming both from the end-user, when defining the mitigation, and from the tool, when calculating the residual values;
- More importantly, this is demo data. And I know for a fact that we reworked it once created in the software to make sure that it was spread all over the risk matrix and looked nice for demo purposes. As a result, we tampered with the data and this is precisely what the graph shows: it is not reliable data!
Based on this finding but also more generally, I think this exercise could be applied with a wider purpose in mind: ensuring data quality and reliability.
We often ask about the accuracy, completeness, consistency, and reliability of the information used to steer the business, and the answer is not always easy to find. Yes, there are audit trails; yes, there are locks to prevent data from being modified. But sometimes data needs to be copied to a data lake, or enriched and consolidated with other information, and ensuring data quality then becomes challenging.
Could Benford’s Law help here, at least by providing a first indication of the credibility of the information being presented? I personally believe it is certainly worth including as a first “sniff test” before using the information in the decision-making or business-planning process!
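Such a sniff test is easy to automate. One common yardstick in Benford analysis is the mean absolute deviation (MAD) between observed and expected first-digit frequencies, with thresholds popularized by Mark Nigrini (roughly, a MAD above 0.015 signals nonconformity for first digits). The sketch below uses that convention; the function name and the 0.015 default are my assumptions, not a standard API:

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_sniff_test(values, threshold=0.015):
    """Return (mad, passed): the mean absolute deviation between observed
    and expected first-digit frequencies, and whether it stays under the
    threshold (0.015 is a commonly cited first-digit nonconformity bound)."""
    digits = [int(f"{abs(v):e}"[0]) for v in values if v]
    counts = Counter(digits)
    n = len(digits)
    mad = sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9
    return mad, mad <= threshold

# Quick demonstration with synthetic data: powers of 2 are a textbook
# example of a series whose leading digits follow Benford's Law
sample = [2 ** k for k in range(1, 1001)]
mad, passed = benford_sniff_test(sample)
print(f"MAD = {mad:.4f}, passed = {passed}")
```

Plugged into a data pipeline, a check like this could flag a consolidated or copied data set for human review before it feeds decision making, which is exactly the “first indication of credibility” suggested above.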
What about you, are there other areas where you think GRC experts could apply Benford’s Law? I look forward to reading your thoughts and comments either on this blog or on Twitter @TFrenehard