Applying predictive analytics to manage employee turnover
One of the top concerns for people in leadership positions is employee turnover. “Increase efforts to retain critical talent” is one of the top five goals of human capital management, based on a survey of CEOs and other senior organizational leaders [SuccessFactors Workforce Analytics & Strategic Workforce Planning]. Retaining critical talent requires addressing a range of conceptually simple but operationally complex questions, such as “Which of the top 1% high-impact-of-loss employees might leave?”, “What types of employees tend to leave voluntarily?”, “What matters to an employee when it comes to flight risk?” and, perhaps most important of all, “What will make an employee stay?”.
This article describes how SAP SuccessFactors is leveraging predictive analytics to help answer questions related to understanding turnover and identifying and managing flight risk. The article is divided into four sections. First, we discuss the data we are using to study turnover. Second, we describe the tools we are leveraging to apply predictive analytics techniques to this data. Third, we present the conceptual approach we are using to investigate turnover and flight risk. Last, we cover some of the technical analyses we have conducted and what we have learned so far. As you will see, predictive analytics is already revealing powerful insights into the factors that predict and cause turnover, but we also have much further to go.
The data we are using: Workforce Analytics
Most of the data used in this application comes from the SAP SuccessFactors Workforce Analytics (WFA) solution. Containing over two thousand workforce metrics for reporting, analytics and workforce planning, WFA is the primary reporting tool for SAP SuccessFactors’ suite of products. WFA metrics are grouped into packs such as core workforce and mobility metrics, career and development, compensation planning, performance management, succession management, and talent flow analytics, to serve the analytics needs of the product suite [SAP SuccessFactors Workforce Analytics].
Rather than starting from raw transactional employee data, the predictive application takes advantage of the curated workforce metrics contained in WFA to create flight risk predictive models. The basic predictive model draws historical data from measures and dimensions contained in the Core Workforce and Mobility Metric Pack included in the WFA solution. Additional attributes from other metrics packs will be automatically included as they become available.
The core and mobility metrics packs include dimensions such as demographics (age, gender, disability, ethnicity, etc.), compensation (salary, stock options, etc.), development (key position, performance rating, potential rating, etc.), employment (job category, employee class, employment level, grade, etc.), succession (critical job role, succession rating, successor readiness, etc.), and tenure (grade tenure, organization tenure, position tenure, time in grade, etc.).
The headcount and the termination measures from the Core Workforce and Mobility Metric Pack are used to label the termination target variable. Additional dimensions contained in WFA such as time, location, organization and employment type (e.g., full-time and part-time), and employment status (e.g., active and inactive) are also used to segment employee data to customize the analysis.
The tools we are leveraging: SAP Predictive Analytics
The predictive analytics embedded in this application use the classification and regression machine learning algorithm that is part of the SAP Predictive Analytics solution. Drawing on mathematical techniques similar to those behind logistic regression, random forests, and support vector machines, this algorithm repeatedly splits continuous values into intervals and groups categorical ones in search of an optimal model with respect to the target variable. The solution also tunes hyperparameters to reduce prediction error and increase prediction confidence. For more information on the methods used by SAP, we encourage reading the book Mining of Massive Datasets by J. Leskovec, A. Rajaraman, and J. Ullman.
Another advantage of using SAP Predictive Analytics comes from its suite of data analysis capabilities, such as missing value handling, outlier detection, and correlation analysis, and from its data mining facilities, such as the use of multiple error detection and correction formulas and regularization techniques, all of which are carried out iteratively during the model creation process. The tool has been used in many domains to solve a variety of business problems with promising results [SAP Predictive Analytics Customer Testimonials].
What we are doing: our conceptual approach to modeling employee turnover and flight risk
Employee flight risk is cast as a classification problem: workforce dimensions are used as predictor variables for predicting the target variable, which in this case represents whether an employee is at risk of leaving. The classification algorithm uses historical data characterized by these variables to create a classifier, or predictive model. The predictive model is then used to predict the likelihood that an employee will leave.
The following example depicts the relationship between salary and the likelihood of leaving, where the predictive model acts as a decision boundary that separates, or classifies, employees into those who are at risk of leaving and those who are not.
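To make the idea of a decision boundary on salary concrete, here is a minimal sketch using plain logistic regression. It is only an illustration of the classification framing, not the algorithm used by SAP Predictive Analytics; the salary figures, the labels, and the helper names (`fit_logistic`, `flight_risk`) are all hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit p(leave) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Hypothetical history: salary in thousands, 1 = left, 0 = stayed.
salaries = [40, 45, 50, 55, 60, 70, 80, 90, 100, 110]
left = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Standardize salary so gradient descent behaves well.
mean = sum(salaries) / len(salaries)
std = (sum((s - mean) ** 2 for s in salaries) / len(salaries)) ** 0.5
w, b = fit_logistic([(s - mean) / std for s in salaries], left)

def flight_risk(salary: float) -> float:
    """Predicted likelihood of leaving for a given salary."""
    return sigmoid(w * (salary - mean) / std + b)

# The decision boundary is the salary at which the likelihood equals 0.5.
boundary = mean - b * std / w

print(f"decision boundary (thousands): {boundary:.1f}")
print(f"risk at 45: {flight_risk(45):.2f}, risk at 95: {flight_risk(95):.2f}")
```

In this toy data, employees below the boundary salary are classified as at risk and those above it as not at risk; a real model would combine many workforce dimensions rather than salary alone.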
The application uses data from a fixed timespan, such as three months, to predict an outcome over an equal timespan in the future. For example, in the three-month model, the application uses the data from March, April and May 2016 to create a predictive model that predicts employee flight risk for June, July and August 2016.
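The windowing described above can be sketched as a small date calculation. This is an illustrative helper only; the function names `add_months` and `windows` are hypothetical and do not come from the application.

```python
from datetime import date

def add_months(first_of_month: date, months: int) -> date:
    """Return the first day of the month that lies `months` after the given month."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def windows(first_training_month: date, span_months: int = 3):
    """Return (training, prediction) windows as half-open [start, end) date ranges."""
    train_start = date(first_training_month.year, first_training_month.month, 1)
    train_end = add_months(train_start, span_months)
    predict_end = add_months(train_end, span_months)
    return (train_start, train_end), (train_end, predict_end)

# Three-month model from the example: train on March to May 2016,
# predict for June to August 2016.
training, prediction = windows(date(2016, 3, 1))
print("training:  ", training[0], "up to, not including,", training[1])
print("prediction:", prediction[0], "up to, not including,", prediction[1])
```

Using half-open ranges keeps the two windows adjacent without overlap, so no month is used both for training and as the outcome being predicted.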
How we are doing it: our technical approach to data analysis
The tool used in the application, SAP Predictive Analytics, carries out five major steps to transform input data into a predictive model.
The Build ADS step involves the specification of the input data, including their storage type and value type. The Prepare Data step handles missing values, outliers and correlations, and divides the data into training and test datasets. The heavy lifting happens in the Build Model step, where the classification and regression algorithm, based on Structural Risk Minimization (SRM) and Ridge Regression, generates a predictive model from the training dataset and reports the results in terms of a model formula and model indicators, such as predictive power and prediction confidence. The Apply Model step applies the model formula to the test dataset to assign a flight risk score to each employee in the dataset. It also allows calibration, using a confusion matrix and a cost matrix, to determine the score at which an employee is treated as a flight risk. Finally, the Simulation step facilitates what-if analysis using a score function to support informed decision making. (For more information, see Industrial Mining of Massive Data Sets.)
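The calibration idea in the Apply Model step can be illustrated with a short sketch: given risk scores and known outcomes, build a confusion matrix at each candidate threshold and pick the threshold with the lowest total cost. This is an assumption about how such a calibration could work, not the tool's actual procedure; the scores, outcomes, cost figures, and function names are hypothetical.

```python
def confusion_matrix(scores, labels, threshold):
    """Count (tp, fp, fn, tn) when employees scoring >= threshold are flagged as at risk."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def total_cost(counts, cost_false_positive, cost_false_negative):
    """Apply a simple cost matrix in which correct decisions cost nothing."""
    _, fp, fn, _ = counts
    return fp * cost_false_positive + fn * cost_false_negative

def best_threshold(scores, labels, cost_false_positive, cost_false_negative):
    """Try each observed score as a threshold and keep the cheapest one."""
    return min(
        sorted(set(scores)),
        key=lambda t: total_cost(
            confusion_matrix(scores, labels, t),
            cost_false_positive,
            cost_false_negative,
        ),
    )

# Hypothetical test-set scores and outcomes (1 = left, 0 = stayed).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

# Hypothetical costs: an unnecessary retention raise vs. losing an employee.
threshold = best_threshold(scores, labels, cost_false_positive=5_000, cost_false_negative=20_000)
print("threshold:", threshold)
print("confusion matrix (tp, fp, fn, tn):", confusion_matrix(scores, labels, threshold))
```

Because a missed departure is assumed to cost more than an unneeded raise, the cheapest threshold here tolerates one false positive in order to catch every employee who actually left.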
How we measure performance: predictive power and prediction confidence
The performance of the classification and regression algorithm is summarized by predictive power (aka KXEN information indicator, or KI), and prediction confidence (aka KXEN robustness indicator, or KR).
KI measures the capacity of the input variables to explain the target variable (that is, the proportion of the target’s variability they account for); in other words, how much of the employee flight risk the input variables in a model can explain. The value ranges from 0 to 1; the higher, the better. KR (the generalization capability) measures a model’s ability to show the same level of performance on new datasets. Its value also ranges from 0 to 1; the higher, the more robust. A good model should have a KR value of at least 0.95. (For a more comprehensive explanation, see How does Automated Analytics do it? The magic behind creating predictive models automatically.)
The lift curves in the gain chart below depict different kinds of model performance and help explain KI and KR. If we ignore the risk score of every employee and arrange the employees randomly in a list, then we can expect to have detected 30% of the employees at risk once we have processed 30% of the employees. This is indicated by the red line labeled Random. On the other hand, if we have a perfect model, which knows exactly which employees are at risk of leaving, then we can focus on that group and ignore all other employees in the dataset. This is indicated by the green line, which reaches 100% after about 25% of the employees, the share of employees at risk of leaving in this dataset.
During the training phase, the performance of the model is indicated by the yellow line, which estimates the percentage of the employees at risk that the model will detect. For example, at 30% of the training dataset, it detects 80% of the employees at risk, which is 167% better than the 30% detected by the random model described above. During the test phase, the performance of the model is indicated by the blue line, which validates the percentage of the employees at risk that the model detects. The gap between the training and test lines represents the robustness of the model: the smaller the gap between the yellow line and the blue line, the more robust the model. This gap is indicated by the yellow shaded area B.
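A gain chart of this kind can be reduced to two numbers in a few lines of code. The sketch below assumes a common area-ratio reading of the chart: predictive power as the area between the model curve and the random line, divided by the area between the perfect curve and the random line, and prediction confidence as one minus the training/test gap over that same denominator. These formulas are assumptions made for illustration, not the tool's published definitions, and the gap is approximated by the difference of the two areas, which only equals the area between the curves when they do not cross. All scores, outcomes, and function names are hypothetical.

```python
def gain_curve(scores, labels):
    """Cumulative share of at-risk employees found while walking down the ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_at_risk = sum(labels)
    found = 0
    curve = [0.0]
    for _, label in ranked:
        found += label
        curve.append(found / total_at_risk)
    return curve

def area_under(curve):
    """Trapezoidal area under a curve sampled at evenly spaced points on [0, 1]."""
    steps = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(steps)) / steps

RANDOM_AREA = 0.5  # area under the diagonal Random line

def predictive_power(scores, labels):
    model_area = area_under(gain_curve(scores, labels))
    perfect_area = area_under(gain_curve(labels, labels))  # ranking by the true outcome
    return (model_area - RANDOM_AREA) / (perfect_area - RANDOM_AREA)

def prediction_confidence(train, test):
    """`train` and `test` are (scores, labels) pairs."""
    train_area = area_under(gain_curve(*train))
    test_area = area_under(gain_curve(*test))
    perfect_area = area_under(gain_curve(test[1], test[1]))
    return 1 - abs(train_area - test_area) / (perfect_area - RANDOM_AREA)

def improvement_over_random(share_detected, share_processed):
    """Relative gain of the model over random selection at one point of the chart."""
    return share_detected / share_processed - 1

# Hypothetical scores and outcomes (1 = left, 0 = stayed) for ten employees each.
model_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
train_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
test_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

ki_train = predictive_power(model_scores, train_labels)
ki_test = predictive_power(model_scores, test_labels)
kr = prediction_confidence((model_scores, train_labels), (model_scores, test_labels))
print(f"KI (training): {ki_train:.2f}  KI (test): {ki_test:.2f}  KR: {kr:.2f}")

# The figure quoted in the text: 80% detected after processing 30% of the list.
print(f"improvement over random: {improvement_over_random(0.80, 0.30):.0%}")
```

In this toy data the model ranks the training set noticeably better than the test set, so the gap is large and the resulting confidence falls well short of the 0.95 level mentioned earlier.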
How we apply predictive analysis: informed decision making
The use case shown below illustrates the performance of the predictive model. Without a predictive model, we do not know the flight risk score of each employee. Consequently, we are not able to make informed decisions about employees at risk. For example, imagine there are 10 employees, six of whom are happy in their current job and four of whom are a flight risk based on salary. However, we do not know which of the 10 employees are flight risks. Suppose it will cost $5,000 in additional salary to keep each employee from leaving for a better-paying job elsewhere. Further assume that there is a fixed budget of $25,000 for preventing employee flight risk. In this case, we can spend the amount on 5 of the 10 employees (50%), selected at random, which on average prevents half of the flight risk, and hope we are spending it on the right people. This is depicted in the table below, where two of the employees at risk, #2 and #3, can be stopped, while the other two employees at risk, #7 and #8, cannot.
On the other hand, if we use the predictive model to assign a flight risk score to each employee, we can make more informed decisions to prevent employee flight risk. As shown in the table below, employees are ranked by flight risk score, with the employees at highest flight risk at the top of the list. If the same budget is applied, all four employees at risk can be stopped, a 100% improvement over random selection.
This analysis is summarized in the gain chart below. The x-axis represents the cost of stopping employee flight risk, and the y-axis represents the number of employees at risk we want to stop. As indicated by the gray line labeled Random, if we have $50,000, all four employees at risk can be stopped; however, with a $25,000 budget, only two employees at risk can be stopped. On the other hand, using a predictive model, $25,000 can stop all four employees at risk: twice as many as random selection for the same budget, and half the $50,000 that random selection needs to reach all four.
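The arithmetic of this use case can be reproduced with a short sketch. The budget, the cost per employee, and the at-risk employees (#2, #3, #7, #8) come from the example above; the individual risk scores, the choice of employees #1 to #5 as the unranked selection, and the helper names are hypothetical.

```python
COST_PER_EMPLOYEE = 5_000
BUDGET = 25_000
AT_RISK = {2, 3, 7, 8}  # employees who would actually leave

def select_within_budget(ordered_ids):
    """Spend the budget on employees in the given order until it runs out."""
    affordable = BUDGET // COST_PER_EMPLOYEE
    return ordered_ids[:affordable]

def stopped(selected_ids):
    """At-risk employees who received the retention raise."""
    return sorted(e for e in selected_ids if e in AT_RISK)

# Without a model: one arbitrary selection, here simply employees #1 to #5.
unranked = list(range(1, 11))
stopped_without_model = stopped(select_within_budget(unranked))

# With a model: hypothetical flight risk scores, highest risk first.
scores = {1: 0.10, 2: 0.90, 3: 0.85, 4: 0.20, 5: 0.15,
          6: 0.30, 7: 0.80, 8: 0.75, 9: 0.05, 10: 0.25}
ranked = sorted(scores, key=scores.get, reverse=True)
stopped_with_model = stopped(select_within_budget(ranked))

improvement = len(stopped_with_model) / len(stopped_without_model) - 1
print("stopped without model:", stopped_without_model)
print("stopped with model:   ", stopped_with_model)
print(f"improvement: {improvement:.0%}")
```

With these scores, the ranked list puts all four at-risk employees within the first five places, so the same $25,000 reaches twice as many of them as the unranked selection.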
WFA has evolved from helping workforce analytics users analyze what happened and how often, to helping them analyze why it happened and how to make informed decisions. Turnover and employee flight risk is one such workforce issue on the minds of many managers, and answers to it can be found in historical data using predictive analytics.
WFA embeds SAP Predictive Analytics in an application to automate the creation of the predictive model. It conceptualizes the flight risk issue as a classification problem and uses a widely used machine learning algorithm, classification and regression, to understand the subject and to predict flight risk at the individual employee level.
The WFA application uses predictive power to measure the explainability of the flight risk drivers and prediction confidence to measure the trustworthiness of the predictive model. The flight risk drivers can be operationalized and made available to managers and HR business partners (HRBPs) so that they can proactively mitigate turnover and employee flight risk.
The author would like to thank Steven Hunt for reviewing the draft of this article and for making thoughtful suggestions on its content.