How To Predict Employee Turnover Using SAP InfiniteInsight
Many HR departments are looking at predictive analytics as a hot new approach to improve their decision making and offer exciting new services to their business. Luckily, with SAP InfiniteInsight you don’t have to be a Data Scientist to find the valuable insights hidden in your data or build powerful predictive models. Alongside it, SuccessFactors Workforce Analytics provides clean, validated information, bringing together disparate data from multiple systems into one place to enable decision making. Let’s look at a concrete example of how you could use this combination to better understand your workforce and make predictions in areas that really matter to your business.
Meet John – he’s an HR analyst working for a large insurance company and responsible for supporting line-of-business managers with workforce insights. He’s been monitoring a concerning trend over the last year regarding the turnover of sales managers in the company’s regional offices – his turnover reports in Workforce Analytics have shown significant deviations from the tool’s industry benchmarks. Today, he has a call with Amelia, the global head of sales, to talk about headcount planning. John takes the opportunity to inform Amelia about his findings, only to learn that Amelia was made aware of this phenomenon a few weeks ago by a few of her direct reports: “You know, John – I’m fine with people leaving; a bit of turnover is healthy and keeps our business competitive. But what I’ve been hearing is that we tend to lose the wrong people, namely mid-level sales managers with a great performance record. If an experienced sales employee leaves, we take an immediate hit to our numbers, so we naturally try very hard to keep them. Our salary is more than competitive and we offer great benefits, so I have trouble imagining what could be the drivers behind this trend. Can you please investigate and let me know what I could do to reverse this development?”
John discusses his suspicions with some of the other analysts who have observed similar trends in other lines of business. Some of his colleagues hint that a lack of promotion or a general increase in the readiness to change jobs might have an influence on employees’ propensity to leave. So John decides to extend his analysis beyond sales and include other business functions as well. He prepares a dataset with all the employees in his company as of the end of his company’s last fiscal year (09/2013) and flags employees who have left the company voluntarily within the following 12 months (until 09/2014) to have a basis for his analysis. The dataset also contains a range of variables whose influence on turnover he wants to assess, such as previous roles, demographics or performance. The 12-month tracking period will allow John to identify an employee at risk with sufficient lead time to give a manager the opportunity to react if required. Even though John already has some rough hypotheses about what could drive turnover based on his reports in Workforce Analytics, he wants to keep the analysis broad to capture unexpected relationships as well.
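As a rough illustration of how such a target flag could be derived outside the tool, here is a minimal Python sketch. The exact snapshot day, the record layout and the function name `will_leave_flag` are assumptions made for this example; they are not taken from John’s dataset or from SAP InfiniteInsight.

```python
from datetime import date

# Assumed snapshot and window boundaries (the article only gives 09/2013 and 09/2014).
SNAPSHOT = date(2013, 9, 30)
WINDOW_END = date(2014, 9, 30)

def will_leave_flag(termination_date, voluntary):
    """Return "Yes" if the employee left voluntarily within 12 months of the snapshot."""
    if termination_date is None or not voluntary:
        return "No"
    return "Yes" if SNAPSHOT < termination_date <= WINDOW_END else "No"

# Hypothetical records: (employee_id, termination_date, voluntary)
employees = [
    ("E001", None, False),               # still employed
    ("E002", date(2014, 3, 15), True),   # voluntary exit inside the window
    ("E003", date(2014, 5, 1), False),   # involuntary exit, not counted
    ("E004", date(2015, 1, 10), True),   # voluntary exit after the window
]

flags = {emp_id: will_leave_flag(term, vol) for emp_id, term, vol in employees}
print(flags)
```

Only the voluntary exit that falls inside the 12-month window is flagged; everyone else counts as still being with the company for the purposes of the model.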
John starts up SAP InfiniteInsight and decides to build a classification model to classify the employees in his dataset into those who would leave within the next 12 months and those who would still be with the company.
John connects to the SuccessFactors Workforce Analytics database and selects his dataset as a data source:
He clicks “Next” and instructs SAP InfiniteInsight to analyze the structure of his dataset by clicking on the “Analyze” button next.
John is happy with the suggested structure of the dataset – SAP InfiniteInsight has recognized all the fields in his dataset correctly and John doesn’t need to make any changes. He clicks “Next” to progress to the model definition screen:
John can use all the variables in his dataset except for the Employee ID: this field is a unique identifier for each record, so it carries no information that would generalize to other employees and would only allow a model to memorize individual outcomes. Therefore he excludes Employee ID from the model definition. As the target variable, John uses the “Will leave within 12 months” flag from his dataset. This flag contains “Yes” for all employees who leave within 12 months and “No” for those who are still with the company. The analyst clicks “Next” to review the definition before executing the model generation:
Since John is no Data Scientist and doesn’t want to deal with manual optimization of the models, he uses SAP InfiniteInsight’s “Auto-selection” feature: When “Enable Auto-selection” is switched on (the default), SAP InfiniteInsight will generate multiple models with different combinations of the explanatory variables that John has selected in the previous screen. This way the tool optimizes the resulting model with regard to predictive power and model robustness (i.e. generalizability to unknown data). Simply put: When using this feature, John will get the best of the candidate models without having to deal with the details of the statistical estimation process. He now clicks “Generate” to start the model estimation process.
Eight seconds later, SAP InfiniteInsight presents John the results of the model training:
John reviews the results: His dataset had 19,115 records and 22 dimensions were selected for analysis. 9.02% of all employees inside the historical dataset (snapshot of 09/2013) left the company voluntarily between 10/2013 and 09/2014, i.e. within 12 months of the snapshot (=his target population), while 90.98% of employees were still employed. These descriptive results are in line with his turnover reports from Workforce Analytics.
John now looks at the model performance (highlighted in red) and sees that the best model that SAP InfiniteInsight has chosen has very good Predictive Power (KI = 0.8368, on a scale from 0 to 1 with 1 being a perfect model) as well as extremely high robustness (Prediction Confidence: KR = 0.9870, on a scale from 0 to 1). Also, of the 22 variables John had originally selected, the best model only needs 16: The remaining six variables didn’t offer enough value and have therefore been automatically discarded. Based on the model’s KI and KR values, John concludes that not only does the model perform very well on his dataset – it can also be applied to new data without losing its predictive power. He is very happy with the results and clicks “Next” to progress to the detailed model debriefing.
John decides to look at the model’s gain chart to understand how much value his model offers for classifying flight risk employees compared to picking employees at random (i.e. not using any model at all). So he selects “Model Graphs”…
The graph compares the effectiveness of John’s model (blue line) at identifying flight risk employees with picking employees at random (red line) as well as having perfect knowledge of who would be leaving (green line). Since the model’s gain (blue line) is very close to the perfect model (green line) John concludes that there is probably only very little that could be done to further improve the model since it is already very close to perfection (for more information on how to read gain charts see here). The analyst decides it’s worth looking at the individual model components to understand which variables drive employee turnover. He clicks on “Previous” and selects “Contribution by Variables” on the “Using the Model” screen.
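To make the three lines of a gain chart more concrete, the following sketch computes them for a small invented sample. The scores and outcomes are made up, and `cumulative_gains` is a hypothetical helper written for this illustration; it is not how SAP InfiniteInsight computes its chart internally.

```python
def cumulative_gains(scores, actuals):
    """Share of all leavers captured after selecting the top-k scored employees, for k = 1..n.

    Assumes `actuals` holds 1 for leavers and 0 for stayers and contains at least one leaver.
    """
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total_leavers = sum(actuals)
    captured = 0
    gains = []
    for _, is_leaver in ranked:
        captured += is_leaver
        gains.append(captured / total_leavers)
    return gains

# Invented example: ten employees, three of whom actually left.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

model_curve = cumulative_gains(scores, actuals)                        # the model's line
random_curve = [(k + 1) / len(actuals) for k in range(len(actuals))]   # picking at random
perfect_curve = cumulative_gains(actuals, actuals)                     # perfect knowledge

for k, (m, r, p) in enumerate(zip(model_curve, random_curve, perfect_curve), start=1):
    print(f"top {k:2d}: model={m:.2f} random={r:.2f} perfect={p:.2f}")
```

The closer the model’s curve stays to the perfect curve, and the further above the random line, the more leavers are found by looking only at the highest-scored employees.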
John looks at the chart and can see that the top three variables contributing to voluntary turnover are “JobLevelChangeType”, “Current Functional Area” and “Change in Performance Rating”. He decides to look at them in more detail by double-clicking on the bar representing each variable.
The most important variable is “JobLevelChangeType”, which describes how an employee got into his or her current position: The higher the bar, the greater the likelihood to leave within the next 12 months. John sees directly that being an external hire or having been demoted contributes significantly to turnover. He isn’t surprised to see “demotion” as a strong driver, since his company began using this approach only three years ago to make the organization more permeable in both directions, and it has met some resistance from employees. Based on the data, it seems that having been demoted drastically reduced employee retention.
Also, external hires seem more inclined to leave the company than to look for better opportunities within it, and John makes a note about this – he wants to discuss this with Amelia since he currently doesn’t see why external hires would behave this way.
Next, John looks at “Current Functional Area”:
John immediately sees his suspicions confirmed: Working in sales contributed significantly to employee turnover – and this by a wide margin! He continues to the third variable “Change in Performance Rating”:
The pattern John had observed in the first two variables continues – seeing one’s performance level decrease drove employees away while improving oneself helped the company retain employees. The company has introduced a stack ranking system where performance levels were always evaluated in relation to an employee’s peers to encourage growth and competition – especially in the sales department. However, as a consequence many employees see their performance rating decrease (12.8% of employees have experienced this during the period) even though there may not necessarily be anything wrong with an employee’s absolute performance: A previously high performing employee may see his or her performance rating decrease while delivering the same results simply because he/she is part of a high performing team where some of the other team members had a better year. The results of the model hint at an unintended side-effect of this system – instead of putting up with decreasing performance ratings and training harder, the company’s employees tend to quit their jobs and try their luck elsewhere. John finds this interesting and plans to discuss this with Amelia to understand whether these effects were welcome in her department.
John looks at the remaining 13 variables to understand the other drivers better. He observes a strong influence of tenure on turnover levels (especially among mid-level employees with tenure between 5 and 9 years) as well as of not having had a promotion within the last three years. There also seem to be differences across countries, regions and demographic variables such as age or gender. The patterns that John sees in the model paint the picture that the company does indeed have a problem keeping experienced employees, especially in the sales department – and the culprit seems to be the new stack ranking performance evaluation scheme John’s company had implemented three years ago in an attempt to foster a more competitive and performance-oriented company culture. This is supported by the data from the countries – those few countries where the stack ranking system hadn’t been implemented yet have significantly lower turnover. The story that emerges is one of an experienced, well-performing employee who is confronted with the new performance evaluation scheme, sees his or her performance ratings drop with pressures on the rise and then decides to leave.
John assembles the information into a presentation for his HR top management to address the topic. After a follow-up discussion with Amelia, who confirmed his conclusions, he is convinced that the stack ranking system is not tuned to the volatile sales business and serves as a driver of turnover. In preparation for the meeting, John decides to apply his model to current data to identify those employees from the sales department who are currently at risk of leaving.
John refreshes his dataset based on the most current data. Using the model’s confusion matrix John chooses a high sensitivity level to predict potential leavers. The confusion matrix compares the model’s performance in classifying employees into leavers and non-leavers (“predicted yes” / “predicted no”) against the actual, historical data (“true yes” / “true no”). This way John can understand how well the model performs at classifying individual employees into leavers and non-leavers – every model makes mistakes but good models make fewer mistakes than bad models and the confusion matrix tells John which categories the model confuses with one another compared to the actual outcomes (hence the name “confusion matrix” – more info here).
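The four cells of a confusion matrix, and the ratios usually derived from them, can be sketched in a few lines of Python. The labels below are invented and `confusion_counts` is a hypothetical helper for this illustration, not part of SAP InfiniteInsight. Note the difference between sensitivity (the share of actual leavers that the model flags) and precision (the share of flagged employees who actually leave).

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted yes, true yes
        elif p == positive:
            fp += 1          # predicted yes, true no
        elif a == positive:
            fn += 1          # predicted no, true yes
        else:
            tn += 1          # predicted no, true no
    return tp, fp, fn, tn

# Invented example: ten employees, three actual leavers, four flagged by the model.
actual    = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No", "No"]
predicted = ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "No", "No", "No"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)   # share of actual leavers that were flagged
precision = tp / (tp + fp)     # share of flagged employees who actually left
specificity = tn / (tn + fp)   # share of stayers correctly left unflagged

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f} specificity={specificity:.2f}")
```

Lowering the score threshold flags more employees, which typically raises sensitivity at the cost of precision, so the chosen level is a trade-off between missing leavers and producing a longer, noisier list.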
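Using this model on the list of sales reps should give John a list of employees of which, statistically, 56.72% would actually leave the company within the next 12 months (the tool reports this as the model’s sensitivity score, although the share of flagged employees who actually leave is, strictly speaking, the model’s precision). John applies the model to his new dataset: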
After applying the model, John looks at the resulting list: Out of 2,120 employees, his model has identified 473 employees at risk, of whom he expects about 57% to actually leave within the next year (although he doesn’t know who exactly will be leaving). Since some of these employees perform better than others and are therefore more important to retain, John filters the list of flight risk employees to only include experienced, well performing sales reps and ends up with a shortlist of 215 employees. From these employees’ sales data in Workforce Analytics he calculates that losing 57% of them could cost the company up to $60M in lost sales. In addition, at estimated recruiting and training costs of $150,000+ for a new sales manager, acting on this analysis could save the company up to 215 x 56.72% x $150,000 ≈ $18.3M in replacement costs plus $60M in lost sales, i.e. about $78.3M.
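The savings estimate can be retraced with a short calculation. The variable names are chosen for this sketch only; the inputs are the figures quoted in the text, and using the unrounded 56.72% share reproduces the total of roughly $78.3M.

```python
# Figures quoted in the text
shortlist_size = 215            # experienced, well performing sales reps at risk
expected_leave_share = 0.5672   # share of flagged employees expected to leave
replacement_cost = 150_000      # estimated recruiting and training cost per hire (USD)
lost_sales = 60_000_000         # estimated lost sales if these employees leave (USD)

expected_leavers = shortlist_size * expected_leave_share
replacement_total = expected_leavers * replacement_cost
total_exposure = replacement_total + lost_sales

print(f"expected leavers:  {expected_leavers:.1f}")
print(f"replacement costs: ${replacement_total / 1e6:.1f}M")
print(f"total exposure:    ${total_exposure / 1e6:.1f}M")
```

This is an upper bound in the sense of the text: it assumes that every expected departure could be prevented and that each one would otherwise incur the full replacement cost.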
John discusses the list of 215 employees with Amelia and they decide to go to the HR Leadership Team meeting together to address the urgency of finding appropriate measures to retain these employees. Amelia and the HR Leadership Team are very impressed with John’s work and, faced with the huge impact of doing nothing, decide to free up some budget for appropriate retention measures while at the same time initiating a discussion whether to get rid of the stack ranking evaluation system to reverse the trend…
…and how are YOUR employees?
Employee retention is an important topic with a big impact on a company’s bottom line. Seeing how simple it is to use SAP InfiniteInsight, maybe you’d like to try out a similar analysis yourself? A trial version of SAP InfiniteInsight is available here:
Have any other great ideas around using predictive analytics with HR data? Feel free to post your ideas or questions in the comments!