
How To Predict Employee Turnover Using SAP InfiniteInsight

Many HR departments are looking at predictive analytics as a hot new approach to improve their decision making and offer exciting new services to their business. Luckily, with SAP InfiniteInsight you don’t have to be a Data Scientist to find the valuable insights hidden in your data or build powerful predictive models. In addition, SuccessFactors Workforce Analytics provides clean, validated information, bringing together disparate data from multiple systems into one place to enable decision making. Let’s look at a concrete example of how you could use this combination to better understand your workforce and make predictions in areas that really matter to your business.

The Scenario

Meet John – he’s an HR analyst working for a large insurance company and responsible for supporting line of business managers with workforce insights. He’s been monitoring a concerning trend over the last year regarding the turnover of sales managers in the company’s regional offices – his turnover reports in Workforce Analytics have shown significant deviations from the tool’s industry benchmarks. Today, he has a call with Amelia, the global head of sales, to talk about headcount planning. John takes the opportunity to inform Amelia about his findings only to learn that Amelia has been made aware of this phenomenon a few weeks ago by a few of her direct reports: “You know, John – I’m fine with people leaving, a bit of turnover is healthy and keeps our business competitive but what I’ve been hearing is that we tend to lose the wrong people, namely mid-level sales managers with a great performance record. If an experienced sales employee leaves we take an immediate hit to our numbers so we naturally try very hard to keep them. Our salary is more than competitive and we offer great benefits so I have trouble imagining what could be the drivers behind this trend. Can you please investigate and let me know what I could do to reverse this development?”

The Data

John discusses his suspicions with some of the other analysts who have observed similar trends in other lines of business. Some of his colleagues hint that a lack of promotion or a general increase in the readiness to change jobs might have an influence on employees’ propensity to leave. So John decides to extend his analysis beyond sales and include other business functions as well. He prepares a dataset with all the employees in his company as of the end of his company’s last fiscal year (09/2013) and flags employees who have left the company voluntarily within the following 12 months (until 09/2014) to have a basis for his analysis. The dataset also contains a range of variables, such as previous roles, demographics or performance, so that he can assess their influence on turnover. The 12-month tracking period will allow John to identify an employee at risk with sufficient lead time to give a manager the opportunity to react if required. Even though John already has some rough hypotheses about what could drive turnover based on his reports in Workforce Analytics, he wants to keep the analysis broad to capture unexpected relationships as well.
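The flagging step can be sketched in a few lines of Python. This is only an illustration under assumptions: the exact snapshot and window dates (30 September), the record layout, the function name and the sample employees are all hypothetical, and treating involuntary exits as "No" is one possible choice that the post does not spell out.

```python
from datetime import date

# Assumed dates: the post only says 09/2013 and 09/2014.
SNAPSHOT = date(2013, 9, 30)
WINDOW_END = date(2014, 9, 30)

def will_leave_within_12_months(exit_date, voluntary):
    """Return "Yes" if the employee left voluntarily inside the tracking window."""
    if exit_date is None or not voluntary:
        return "No"
    return "Yes" if SNAPSHOT < exit_date <= WINDOW_END else "No"

# Hypothetical records: (employee_id, exit_date, voluntary_exit)
employees = [
    ("E001", None, False),               # still employed
    ("E002", date(2014, 3, 15), True),   # resigned inside the window
    ("E003", date(2014, 11, 2), True),   # resigned after the window
    ("E004", date(2014, 1, 20), False),  # involuntary exit
]

labels = {emp_id: will_leave_within_12_months(exit_date, voluntary)
          for emp_id, exit_date, voluntary in employees}
print(labels)
```

Only E002 ends up flagged, because it is the only voluntary exit that falls inside the 12-month window.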

The Analysis

John starts up SAP InfiniteInsight and decides to build a classification model to classify the employees in his dataset into those who would leave within the next 12 months and those who would still be with the company.


John connects to the SuccessFactors Workforce Analytics database and selects his dataset as a data source:


He clicks “Next” and then instructs SAP InfiniteInsight to analyze the structure of his dataset by clicking the “Analyze” button.


John is happy with the suggested structure of the dataset – SAP InfiniteInsight has recognized all the fields in his dataset correctly and John doesn’t need to make any changes. He clicks “Next” to progress to the model definition screen:


John can use all the variables in his dataset except for the Employee ID: this field is a unique identifier, so it carries no information that generalizes to other employees and would only invite the model to memorize individual records. Therefore he excludes Employee ID from the model definition. As the target variable John uses the “Will leave within 12 months” flag from his dataset. This flag contains “Yes” for all employees who leave within 12 months and “No” for those who are still with the company. The analyst clicks “Next” to review the definition before executing the model generation:


Since John is no Data Scientist and doesn’t want to deal with manual optimization of the models, he uses SAP InfiniteInsight’s “Auto-selection” feature: When “Enable Auto-selection” is switched on (the default), SAP InfiniteInsight will generate multiple models with different combinations of the explanatory variables that John has selected in the previous screen. This way the tool optimizes the resulting model with regard to predictive power and model robustness (i.e. generalizability to unknown data). Simply put: When using this feature John will get the best model without having to deal with the details of the statistical estimation process. He now clicks “Generate” to start the model estimation process.

The Results

Eight seconds later, SAP InfiniteInsight presents John the results of the model training:


John reviews the results: His dataset had 19,115 records and 22 dimensions were selected for analysis. 9.02% of all employees inside the historical dataset (snapshot of 09/2013) left the company voluntarily between 10/2013 and 09/2014, i.e. within 12 months of the snapshot (=his target population), while 90.98% of employees were still employed. These descriptive results are in line with his turnover reports from Workforce Analytics.

John now looks at the model performance (highlighted in red) and sees that the best model that SAP InfiniteInsight has chosen has very good Predictive Power (KI = 0.8368, on a scale from 0 to 1 with 1 being a perfect model) as well as extremely high robustness (Prediction Confidence: KR = 0.9870, on a scale from 0 to 1). Also, from the 22 variables John had originally selected, the best model only needs 16 variables: The remaining six variables didn’t offer enough value and have therefore been automatically discarded. Based on the model’s KI and KR values John concludes that not only does the model perform very well on his dataset – it also can be applied to new data without losing its predictive power. He is very happy with the results and clicks “Next” to progress to the detailed model debriefing.


John decides to look at the model’s gain chart to understand how much value his model offers for classifying flight risk employees compared to picking employees at random (i.e. not using any model at all). So he selects “Model Graphs”…


The graph compares the effectiveness of John’s model (blue line) at identifying flight risk employees with picking employees at random (red line) as well as having perfect knowledge of who would be leaving (green line). Since the model’s gain (blue line) is very close to the perfect model (green line) John concludes that there is probably only very little that could be done to further improve the model since it is already very close to perfection (for more information on how to read gain charts see here). The analyst decides it’s worth looking at the individual model components to understand which variables drive employee turnover. He clicks on “Previous” and selects “Contribution by Variables” on the “Using the Model” screen.
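To make the gain chart a little more concrete, here is a minimal Python sketch of how such curves can be computed and compared. The scores and outcomes are made up, the function names are hypothetical, and the final ratio (area between model and random, divided by area between perfect and random) is only meant in the spirit of the KI indicator; whether it matches the tool's exact computation is an assumption.

```python
def gain_curve(scores, actuals):
    """Cumulative share of leavers captured when employees are ranked by score."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total_leavers = sum(actuals)
    captured, curve = 0, [0.0]
    for _, left in ranked:
        captured += left
        curve.append(captured / total_leavers)
    return curve

def area(curve):
    """Trapezoidal area under a curve sampled at equally spaced points on [0, 1]."""
    n = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(n)) / n

# Hypothetical scores and outcomes (1 = left, 0 = stayed) for ten employees.
scores = [0.91, 0.85, 0.70, 0.64, 0.52, 0.40, 0.33, 0.21, 0.15, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

model = gain_curve(scores, actuals)        # the "blue line"
perfect = gain_curve(actuals, actuals)     # the "green line": leavers ranked first
random_area = 0.5                          # area under the diagonal "red line"

ratio = (area(model) - random_area) / (area(perfect) - random_area)
print(round(ratio, 3))
```

A ratio near 1 means the model's curve hugs the perfect curve, which is the situation John sees in his chart; a ratio near 0 means the model is no better than picking employees at random.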

John looks at the variable contribution chart and can see that the top three variables contributing to voluntary turnover are “JobLevelChangeType”, “Current Functional Area” and “Change in Performance Rating”. He decides to look at them in more detail by double-clicking on the bar representing each variable.


The most important variable is “JobLevelChangeType” which describes how an employee got into his or her current position: The higher the bar, the greater the likelihood to leave within the next 12 months. John sees directly that being an external hire or having been demoted contributes significantly to turnover. He isn’t surprised to see “demotion” as a strong driver since his company had only three years before begun using this approach to make the organization more permeable in both directions and this has seen some resistance by employees. Based on the data, it seems that having been demoted drastically reduced employee retention.

Also, external hires seem more inclined to leave the company than to look for better opportunities within it, and John makes a note about this – he wants to discuss this with Amelia since he currently doesn’t see why external hires would behave this way.

Next, John looks at “Current Functional Area”:


John immediately sees his suspicions confirmed: Working in sales contributed significantly to employee turnover – and this by a wide margin! He continues to the third variable “Change in Performance Rating”:


The pattern John had observed in the first two variables continues – seeing one’s performance level decrease drove employees away while improving oneself helped the company retain employees. The company has introduced a stack ranking system where performance levels were always evaluated in relation to an employee’s peers to encourage growth and competition – especially in the sales department. However, as a consequence many employees see their performance rating decrease (12.8% of employees have experienced this during the period) while there may not necessarily be something wrong with an employee’s absolute performance: A previously high performing employee may see his or her performance rating decrease while delivering the same results simply because he/she is part of a high performing team where some of the other team members had a better year. The results of the model hint at an unintended side-effect of this system – instead of putting up with decreasing performance ratings and training harder, the company’s employees tend to quit their jobs and try their luck elsewhere. John finds this interesting and plans to discuss this with Amelia to understand whether these effects were welcome in her department.

John looks at the remaining 13 variables to understand the other drivers better. He observes a strong influence of tenure on turnover levels (especially among mid-level employees with tenure between 5 and 9 years) and of not having had a promotion within the last three years. There also seem to be differences across countries, regions and demographic variables such as age or gender. The patterns that John sees in the model paint the picture that the company does indeed have a problem keeping experienced employees, especially in the sales department – and the culprit seems to be the new stack ranking performance evaluation scheme that John’s company implemented three years ago in an attempt to foster a more competitive and performance-oriented company culture. This is supported by the data from the countries – those few countries where the stack ranking system hasn’t been implemented yet have significantly lower turnover. The story that emerges is one of an experienced, well-performing employee who is confronted with the new performance evaluation scheme, sees his or her performance ratings drop with pressures on the rise and then decides to leave.

John assembles the information into a presentation for his HR top management to address the topic. After a follow-up discussion with Amelia, who confirms his conclusions, he is convinced that the stack ranking system is not tuned to the volatile sales business and serves as a driver of turnover. In preparation for the meeting John decides to apply his model to current data to identify those employees from the sales department who are currently at risk of leaving.

The Prediction

John refreshes his dataset based on the most current data. Using the model’s confusion matrix John chooses a high sensitivity level to predict potential leavers. The confusion matrix compares the model’s performance in classifying employees into leavers and non-leavers (=”predicted yes” / “predicted no”) against the actual, historical data (=”true yes” / “true no”). This way John can understand how well the model performs at classifying individual employees into leavers and non-leavers – every model makes mistakes but good models make fewer mistakes than bad models and the confusion matrix tells John which categories the model confuses with one another compared to the actual outcomes (hence the name “confusion matrix” – more info here).
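The rates that are typically read off a confusion matrix can be sketched as follows. The four counts are hypothetical and not taken from John's model, and the function name is an illustration only. In the usual terminology, sensitivity is the share of actual leavers that the model flags, while the share of flagged employees who actually leave is called precision.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive common rates from the four cells of a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # share of actual leavers that were flagged
        "specificity": tn / (tn + fp),   # share of actual stayers that were not flagged
        "precision": tp / (tp + fp),     # share of flagged employees who actually left
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts: true yes/predicted yes, true no/predicted yes,
# true yes/predicted no, true no/predicted no.
metrics = confusion_metrics(tp=120, fp=80, fn=60, tn=1740)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Moving the decision threshold trades these rates against each other: flagging more employees usually raises sensitivity and lowers precision, which is why the choice of threshold depends on how costly a missed leaver is compared to a false alarm.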


Using this model on the list of sales reps should give John a list of employees of whom statistically 56.72% (the model’s sensitivity score) would actually leave the company within the next 12 months. John applies the model to his new dataset:


After applying the model, John looks at the resulting list: Out of 2,120 employees, his model has identified 473 employees at risk, of whom he expects about 57% to actually leave within the next year (although he doesn’t know who exactly will be leaving). Since some of these employees perform better than others and are therefore more important to retain, John filters the list of flight risk employees to only include experienced, well performing sales reps and ends up with a shortlist of 215 employees. From these employees’ sales data in Workforce Analytics he calculates that losing 57% of them could cost the company up to $60M in lost sales. Also, at estimated recruiting and training costs of $150,000+ for a new sales manager, this analysis could save the company up to 215 x 56.72% x $150,000 (about $18.3M in replacement costs) + $60M in lost sales = about $78.3M.
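The savings estimate can be retraced with a short calculation. The figures come from the post; the variable names are only for illustration, and the unrounded 56.72% is used because that is what reproduces the $78.3M total.

```python
shortlist = 215              # experienced, well-performing sales reps flagged at risk
share_leaving = 0.5672       # share of flagged employees expected to leave
replacement_cost = 150_000   # recruiting and training cost per sales manager (lower bound)
lost_sales = 60_000_000      # upper estimate of lost sales

expected_leavers = shortlist * share_leaving
replacement_total = expected_leavers * replacement_cost
total = replacement_total + lost_sales

print(f"expected leavers: {expected_leavers:.1f}")
print(f"replacement cost: ${replacement_total / 1e6:.1f}M")
print(f"total exposure:   ${total / 1e6:.1f}M")
```

Roughly 122 of the 215 shortlisted employees would be expected to leave, so the replacement costs of about $18.3M are small next to the $60M in lost sales, which dominates the total.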

John discusses the list of 215 employees with Amelia and they decide to go to the HR Leadership Team meeting together to address the urgency of finding appropriate measures to retain these employees. Amelia and the HR Leadership Team are very impressed with John’s work and, faced with the huge impact of doing nothing, decide to free up some budget for appropriate retention measures while at the same time initiating a discussion whether to get rid of the stack ranking evaluation system to reverse the trend…

…and how are YOUR employees?

Employee retention is an important topic with a big impact on a company’s bottom line. Seeing how simple it is to use SAP InfiniteInsight maybe you’d like to try out a similar analysis yourself? A trial version of SAP InfiniteInsight is available here:

Have any other great ideas around using predictive with HR data? Feel free to post your ideas or questions in the comments!

  • So ...

    it seems life can be reduced to ... let´s say 25 variables? maybe 26?

    Come on ...,

    do you really like playing the sorcerer´s apprentice?

    This cannot be serious...

    • Hi Pablo,

      Thanks for your comment. Many people would like to believe that humans are different and their actions cannot be predicted but psychologists have used statistical techniques to predict human behavior for more than 100 years. In fact, the study of statistical methods is an integral part of any psychology curriculum because these methods are the primary channel through which human behavior can be understood. And just like Edouard has pointed out correctly, predicting who wants to leave his/her job is fundamentally not much different from predicting who is going to buy a certain product, engage in fraudulent practices or even commit crimes. I can understand the discomfort but this is already happening on a large scale - if you'd like to read up on it, I recommend Eric Siegel's book "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die".

      In the case of employee turnover, scientific research has shown that variables such as job satisfaction, organizational practices (e.g. performance evaluation schemes) or access to personnel development measures can influence an employee's intention to change jobs (you can do a simple Google scholar search for "employee turnover" to literally get thousands of scientific studies dating back as far as the 1960s).

      I'm not saying that the model presented in this blog holds up to scientific scrutiny but this is also not the intent: This post is supposed to show how the analytical process behind a turnover prediction works based on an example - it shows variables that other organizations are likely to find in their own analyses but for obvious reasons I can't post the results of an actual analysis in a public forum. Please rest assured though - I worked with different companies on this very subject and, just as is supported by science, it is indeed possible to predict whether someone is going to stay in a job or leave it. In the end it is always a probabilistic statement and even when there's a 99% probability for an individual to leave, it is still not certain. But oftentimes one is not interested in making individual predictions but rather in identifying risk groups as is explained in the post - and here it doesn't matter who exactly in a group is going to leave because you can expect that on average the predicted percentage of people will actually leave, thus giving you a mathematical edge that can be used to derive all kinds of measures. And in the end - this is what it is all about: Using information to make better business decisions, and predictive analytics has been proven to provide significant advantages over traditional analytics for this purpose.

      I hope this information has been helpful to you!

      Best regards,


      • Hi David,

        Just a few comments:

        1.- human actions/behaviour can be predicted

        if this is the case, are you implying that there is no room for the individual?

        2.- human actions/behaviour can be understood using statistical techniques/methods

        What happens then with direct contact with people (straightforward conversation)

        Life is "drama", drama in the sense that we learn or change according to our own experiences, according to what happens to us. We start understanding certain things once we´ve gone through them.

        3.- does this cause discomfort

        No it is not discomfort it´s simply that life cannot be reduced to simple figures as much as you try.

        4.- Using information to make better business decisions

        In my opinion many of the decisions are already made beforehand and the model is used simply to "dress" the act.

        Best regards.

        • Hi Pablo,

          your comments touch on a couple of fundamental issues that need to be considered in general but are out of scope of a simple blog post like mine 😉

          To comment on your remarks:

          1.- human actions/behaviour can be predicted

          if this is the case, are you implying that there is no room for the individual?

          Not at all - these statistical models account for individual freedom, which is why no event (with weird exceptions) is certain. Even though the model might predict an 80% probability of a certain action, an individual might behave completely differently. But on a larger scale these models will capture human tendencies correctly, so they serve as orientation for things that are likely to happen - in the latter case about 80% of people are going to act according to your prediction. Knowing a tendency does not mean you know what a certain employee is going to do, but it allows you to form the expectation that in the long run the group will behave according to the statistical model.

          2.- human actions/behaviour can be understood using statistical techniques/methods

          What happens then with direct contact with people (straightforward conversation)

          Life is "drama", drama in the sense that we learn or change according to our own experiences, according to what happens to us. We start understanding certain things once we´ve gone through them.

          That's true. Reality changes constantly and relationships that we uncover using statistics can be stable over time but they don't necessarily need to be. This is why we retrain models frequently in order to update them based on any changes in the relationships that happened in the meantime.

          For example:

          A) Your company is doing well and people are generally happy with their jobs so few people leave, mostly because they are offered better positions elsewhere but not because there's something fundamentally wrong. A statistical model that you built for this company will capture this and predict low levels of turnover and use mostly external variables as predictors.

          B) The situation changes completely: Your company suffers a huge loss and management starts to feel huge pressure to reduce costs, cutting back on personnel development, no more travel, working hours increase, salary raises are postponed, etc. - in short: Suddenly working there is not as fun anymore. The turnover model you built in (A) will not capture the new situation and will actually predict lower levels of turnover than you now observe. Once you retrain the model, however, the new reality will be included and your predictions will become more precise as the model starts to use more internal variables related to the company's culture.

          3.- does this cause discomfort

          No it is not discomfort it´s simply that life cannot be reduced to simple figures as much as you try.

          You cannot reduce it to simple figures completely but that doesn't mean that there aren't trends, tendencies and fundamental relationships between variables that we can capture and leverage to understand what's going on and use this understanding to make (more or less) accurate predictions.

          4.- Using information to make better business decisions

          In my opinion many of the decisions are already made beforehand and the model is used simply to "dress" the act.

          This happens in a lot of organizations but is a classic case of a misuse of statistics. However, just because some people do not use these techniques correctly does not mean that they don't work.

          Best regards,


          • David,

            i have to admit that although i still disagree with you in most of the statements made here: i reckon you´re exceptional (and i need no statistics for that).

            Best regards.

  • Hallo David,

    I like this example, also because it is in the area of my customers (insurance). I would like to add this scenario to my "personal" toolbox and reuse it with customers, if you agree.

    Is it possible to get an excerpt from the database in Excel or access to the database?

    Best Regards