K-Means Clustering for Discount Analysis and Customer Retention
One of the most interesting Big Data topics is Predictive Analytics. All companies are on a BI maturity journey, moving from reporting the past, to predicting the future, to optimizing business outcomes. This blog focuses on Business Use Cases that are aided by Predictive Analytics. I’ll cover how we selected one of the many Predictive Analytics algorithms, and summarize the current deliverable for the customer.
ProMorphics recently had the opportunity to work with a major software publisher that wanted to improve performance in a few key business areas. They wanted to increase the speed of reporting and analytics to near real-time. They demanded flexibility, since the company has a strong culture of high-performance individuals who expect drill-down and report manipulation functionality in a self-service environment. This self-service analytics requirement was one of the top results from their management 360 feedback the prior year, and of particular interest to the VP of HR and the President. Having an analytics team use ‘black box’ solutions that aren’t available to power users was not an option. And, as for many customers, mobility was required to enable instant information access anytime, anywhere. These requirements are common; let’s call them ‘BI Requirements.’ They can be addressed by combining a near real-time ETL process with Business User friendly ad-hoc analysis and reporting tools.
In addition, they had two KPIs which are critical to their business: average discount and customer retention. These seemingly simple KPIs were difficult to measure because data was not tracked to consistent standards across their mix of businesses: direct to consumer, OEM, and direct. The VP of Sales wanted consistency across his Sales Districts. He wanted timely discount analysis in order to improve deal negotiation strategy. He wanted on-the-fly sales strategy adjustments based on market intelligence. He was not satisfied with accurate Operational Reports against prior periods; he wanted to impact the outcomes of future periods. In addition, Customer Retention is critical to the entire organization. It was not enough to accurately measure the customers who left last month; they wanted to prevent customers from leaving. They needed algorithms to run against near real-time data, quickly evaluate past trends, make data-driven assumptions about future behavior, and recommend actions. These two requirements are ideally handled with Predictive Analytics.
The next question was which Predictive Analytics algorithm was the best fit. The SAP HANA Predictive Analysis Library (PAL) has approximately 25 algorithms in several categories, including: Clustering, Classification, Association, Time Series, Regression, and Neural Network. SAP HANA also supports R, with its rich set of several thousand algorithms. In our crawl phase, we chose to start with the HANA PAL library since this was the first Predictive Analytics project for this customer, and we knew we could achieve significant lift with this library. We needed to segment and classify behavior: in this case, discounting behavior by sales teams and customer churn. PAL includes a set of Classification and Clustering algorithms. Based on the client’s need to clearly visualize segments of sales groups based on discount behavior, we narrowed the choice down to K-Means, Self-Organizing Maps, and Anomaly Detection algorithms. I’m often asked why we don’t start with more advanced, math-driven algorithms. Remember that the organization’s 360 degree feedback called strongly for self-service analytics. Starting with an impenetrable mathematical formula runs counter to this goal. We find it’s better at the start of any PA project to lead with powerful, well-understood standard algorithms. As the client’s BI journey continues, additional algorithms can be tried for better predictive fit. In this case, we found K-Means to be the best fit. In the next article, I will cover in more detail the process we follow in deciding which analytical or predictive algorithm to use for a given Business Use Case.
The client deliverable for their common ‘BI Requirements’ was SAP Data Services and HANA to consolidate a single source of truth, combined with BOBJ Dashboards and Web Intelligence for ad-hoc analysis. The client deliverable for their Predictive Analytics needs of discount analysis and customer retention was the K-Means clustering algorithm from HANA’s PAL library, running against in-memory, near real-time discount and retention data. The users now see highly visual Voronoi cells clearly showing segments of sales teams with discount challenges, and customer segments at risk.
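For readers who want to see what K-Means is doing under the hood, here is a minimal sketch in plain Python. It is not the HANA PAL implementation and not the client’s code; the data points, the two features (average discount and share of deals discounted), the function name `kmeans`, and the choice of two clusters are all made up for illustration. It shows the two steps the algorithm repeats: assign each sales team to its nearest centroid, then move each centroid to the mean of its members. The "nearest centroid" rule is also what produces the Voronoi cells in the visualization.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        # The regions this rule carves out are the Voronoi cells.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        converged = new_labels == labels and new_centroids == centroids
        centroids, labels = new_centroids, new_labels
        if converged:
            break
    return centroids, labels


# Hypothetical sales teams: (average discount, share of deals discounted).
# These numbers are invented for the example, not client data.
points = [
    (0.05, 0.10), (0.07, 0.12), (0.06, 0.08),  # teams that rarely discount
    (0.30, 0.70), (0.32, 0.65), (0.28, 0.72),  # teams that discount heavily
]

centroids, labels = kmeans(points, k=2)
for point, label in zip(points, labels):
    print(f"team {point} -> segment {label}")
```

In a real project the features would need to be put on comparable scales before clustering, and the number of segments would be chosen with the business users rather than fixed at two.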
Joe Mulligan, VP of Customer Innovation, ProMorphics