Predicting Online Consumer Lending with Native Spark Modeling and SAP HANA Smart Data Streaming
Today, start-ups have leapfrogged big banks by offering smart, short-term, quick consumer loans. This boom is called “alternative lending”, and it is driven largely by the proliferation of data sources and the emergence of big data analytics tools.
These alternative lending firms often charge lower interest rates than their competitors to quickly gain a significant share of the market. A side effect of taking on many borrowers is that they end up with many bad loans: loans where the debt is not collectible or will not generate income for creditors.
Most of the successful alternative lending companies talk about an algorithmic, big-data approach to underwriting loans. The crux of the algorithm is less about the individual pieces of data (your postcode, the color of your car, your employment status) and more about how these pieces of information relate to one another. Building a big data machine learning underwriting model is hard, and it is also expensive.
In this blog, you will see how SAP BusinessObjects Predictive Analytics eases and speeds up a complex business scenario like “alternative lending” when dealing with Big Data.
Native Spark Modeling:
Starting with release 2.5 of SAP BusinessObjects Predictive Analytics, the processing of Automated Analytics classification models has been delegated to Spark for Big Data. With release 3.0, regression models (models with a continuous target) are also processed on Spark. This release also includes optimizations of the Spark code, which bring a further 20% improvement in response time compared with release 2.5. Because the model is trained where the data resides, there is no data transfer, processing is faster, and the environment scales with your big data needs.
Analysts can also build their predictive models on top of Big Data very easily: they can rely on trustworthy machine learning algorithms and focus more on business questions and less on technological complexities.
Real time scoring with Smart Data Streaming:
Release 3.0 of SAP BusinessObjects Predictive Analytics adds support for streaming: a model can be exported as CCL (Continuous Computation Language) to score continuous events in SAP HANA in real time. The scoring equations are applied to the streams managed by HANA’s Smart Data Streaming capability.
Use-case: Understand lending behaviors and make the right decisions in real time
- In the first phase of the “alternative lending” scenario, we take the example of a credit analyst, “Alex”, who would like to examine and understand the factors contributing to good and bad loans. He wants to understand which funding amounts are in a safe zone, which are in a danger zone, and why.
- In the second phase of the “alternative lending” scenario, consider an online lending platform where hundreds to thousands of requests arrive at around the same time from different places, devices or channels. Lenders can then comfortably rely on an automated decision-making system that provides answers to their buyers immediately.
Let’s look at the use-case steps in detail and at how each step is fulfilled:
- Alex, a credit analyst at the “W-LENDERS” organization, collects and explores the last 4 years of data from an online data platform. (Check the sample dataset here). He can add many more attributes from various public websites and social networks that will help in understanding lending better. The data resides on a Hadoop system, and because exploring it requires SQL queries, he uses Hive on Hadoop for this dataset. Example in figure: in Automated Analytics, the data is accessed via the Hive table “loans_funding”, and the graph shows the funded loan amount (Y-axis) vs. the buyer’s yearly income (X-axis).
- Next, Alex wants to understand the factors contributing to the lending/funding amount for existing loans. He does this by running a regression model on the historic loan funding data in Hadoop. The model is executed on Spark with the continuous target ‘funded amount’. The output report produced after model generation on Spark indicates the quality of the model.
- Using the Automated Analytics ‘Contribution by Variables’ view, Alex looks at the key influencers for his target ‘funded amount’. He sees that ‘Sub Grade’, ‘Grade’, ‘Annual Income’ and ‘Interest rate’, for example, have a strong impact on the target.
- Even more interestingly, he analyzes the breakdown of the grading scale, which is highly relevant to the target funded amount (grade A being the least risky, low-interest loans and grade G being the high-risk, high-interest loans, depending on borrower quality). The large number of borrowers assigned to grade A seems to have a negative influence on the target, whereas grades F and G seem to be paid well, with the highest positive influence on the target mean.
- The profit curve then shows that profit is highest for borrowers with an annual income in the $47,000 to $56,000 range, although this group does not account for a large share of approved loans.
- Looking at the quality and output of the model, Alex is quite satisfied. He is also happy with how fast he could get through this analysis; model processing on Spark enables him to reach conclusions faster. In addition, the model retains far fewer variables (22) than were available before training (112), so he now needs much less information to score a new dataset.
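To make the idea behind a “contribution by variables” ranking more concrete, here is a minimal sketch in plain Python. It is not the algorithm that Automated Analytics uses; it swaps in a much simpler technique, scoring each input variable by the absolute Pearson correlation with the target and normalizing the scores so that they sum to 1. The toy records and the column names (`annual_inc`, `int_rate`, `constant_flag`, `funded_amnt`) are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def contributions(rows, target):
    """Rank input variables by normalized |correlation| with the target.

    Simplified stand-in for a contribution report, not SAP's method.
    """
    names = [name for name in rows[0] if name != target]
    ys = [row[target] for row in rows]
    raw = {name: abs(pearson([row[name] for row in rows], ys)) for name in names}
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in names}
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return {name: value / total for name, value in ranked}


# Made-up loan records (assumption: not taken from the sample dataset).
loans = [
    {"annual_inc": 30000, "int_rate": 18.0, "constant_flag": 1, "funded_amnt": 5000},
    {"annual_inc": 45000, "int_rate": 15.5, "constant_flag": 1, "funded_amnt": 9000},
    {"annual_inc": 52000, "int_rate": 13.0, "constant_flag": 1, "funded_amnt": 12000},
    {"annual_inc": 60000, "int_rate": 14.0, "constant_flag": 1, "funded_amnt": 11000},
    {"annual_inc": 80000, "int_rate": 9.5, "constant_flag": 1, "funded_amnt": 20000},
]

ranking = contributions(loans, "funded_amnt")
for name, share in ranking.items():
    print(f"{name}: {share:.3f}")
```

A variable that never varies, such as `constant_flag` here, gets a contribution of zero and drops to the bottom of the ranking. This is a rough analogue of why a trained model can end up needing far fewer inputs than the original dataset offered.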
3. It’s time for action now! Alex would like to apply this model to upcoming loan approval requests.
- To receive the continuous flow of incoming loan requests from various channels (website/mobile/direct…) and to process large volumes of data, Alex’s IT team uses HANA Smart Data Streaming. Using the Automated Analytics ‘Generate Code’ feature, Alex exports the model that he trained on Spark as CCL code for HANA Smart Data Streaming and hands it over to the IT team so that they can integrate it into a streaming project within HANA.
- [Note that, depending on the desired target action, Alex has several ways to obtain output scores with SAP BusinessObjects Predictive Analytics:
- He can perform a simulation in Automated Analytics
- He can apply the model in Hadoop directly if his target data resides on Hadoop
- He can use Predictive Factory to schedule the model lifecycle, retraining the model (if the input data has changed) or applying it (on the target system) at the required intervals
- He can generate code for streaming when working in a streaming environment, as explained in the steps above]
4. Once the project is ready and running in the SAP HANA streaming environment, Alex tries it out with a new incoming loan approval request. A new buyer with a low average income is applying for a marriage loan with a 3-year term. Alex enters the buyer’s known details on the event page and runs the streaming project. The output is the maximum loan amount (rr_funded_amnt) that could be funded for this buyer, based on the model that Alex trained on Hadoop-Spark.
5. His IT team will now take this project live to produce output scores for all incoming requests going forward, i.e. they will APPLY the scoring to the input streams and receive an output stream with a funding score that is available within seconds.
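The “apply” step can be pictured as a scoring function mapped over an unbounded stream of events. The sketch below is plain Python, not the CCL that Automated Analytics exports. The linear scoring equation, its coefficients, and the event fields (`request_id`, `annual_inc`, `term_months`) are hypothetical placeholders; only the output name `rr_funded_amnt` comes from the scenario above.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical scoring equation. A real exported model would bring its own
# coefficients and variable encodings; these values are made up.
INTERCEPT = 1000.0
COEFFICIENTS = {"annual_inc": 0.25, "term_months": 50.0}


def score(event: Dict) -> float:
    """Apply the (hypothetical) linear scoring equation to one loan request."""
    return INTERCEPT + sum(
        weight * float(event[name]) for name, weight in COEFFICIENTS.items()
    )


def score_stream(events: Iterable[Dict]) -> Iterator[Dict]:
    """Lazily turn an input stream of requests into an output stream of scores."""
    for event in events:
        # Each event is scored as it arrives; nothing is buffered.
        yield {**event, "rr_funded_amnt": score(event)}


# A short list stands in for the continuous stream of incoming requests.
incoming = [
    {"request_id": "r1", "annual_inc": 20000, "term_months": 36},
    {"request_id": "r2", "annual_inc": 50000, "term_months": 60},
]

scored = list(score_stream(incoming))
for event in scored:
    print(event["request_id"], event["rr_funded_amnt"])
```

Because `score_stream` is a generator, each request is scored as soon as it is read, which mirrors the one-event-in, one-score-out behavior described for the streaming project.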
Alex is happy: not only could he automate the entire lifecycle of the loan funding process very quickly, but he also has a high degree of confidence in the tool that will respond to borrowers with decisions.