Employee churn is one of the main problems that companies and human resources departments face. It can be very expensive: retaining an existing employee costs far less than hiring a new one.

Employee churn prevention aims to predict which employees will leave their jobs, when, and why.

Accurate methods are needed to identify which employees are most likely to move to another company. They would allow the organization to adapt the specific aspects that drive attrition and, therefore, reduce costs.

The objective is to untangle the factors that lead to employee attrition and to determine its underlying causes in order to prevent it. But analyzing multiple personal and social factors is complicated, to say the least. We need both rich employee data and complex predictive models to analyze it.

This is a classification project, since the variable to be predicted is binary (attrition or not).

The goal here is to model the probability of attrition, conditioned on the employee features.

The data set used in this study contains quantitative and qualitative information about a sample of about 1,500 employees at the company. For each employee, around 35 personal, professional and socio-economic attributes are available as variables.

More specifically, the variables of this example are:

- **age**
- **business_travel**
- **daily_rate**
- **department**
- **distance_from_home**
- **education**
- **education_field**
- **employee_count**
- **employee_number**
- **environment_satisfaction**
- **gender**
- **hourly_rate**
- **job_involvement**
- **job_level**
- **job_role**
- **job_satisfaction**
- **marital_status**
- **monthly_income**
- **monthly_rate**
- **number_companies_worked**
- **over_18**
- **over_time**
- **percent_salary_hike**
- **performance_rating**
- **relationship_satisfaction**
- **standard_hours**
- **stock_option_level**
- **total_working_years**
- **training_times_last_year**
- **work_life_balance**
- **years_at_company**
- **years_in_current_role**
- **years_since_last_promotion**
- **years_with_current_manager**
- **attrition**: satisfaction of the worker with the company (loyal or attrition).

As we can see, there are 48 inputs, which contain the characteristics of every employee, and 1 target, the variable "Attrition" mentioned before. The number of inputs is larger than the number of listed variables because each nominal variable (department, education field, job role and marital status) is represented by one binary input per category. Three variables ("EmployeeCount", "Over18" and "StandardHours") are constant and are set as unused for the analysis, since they do not provide any valuable information.
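The count of 48 inputs can be checked with a few lines of arithmetic. The sketch below assumes the category counts that can be read off the binary variables in the exported expression at the end of this article (3 departments, 6 education fields, 9 job roles, 3 marital statuses); the constant and function names are ours, chosen for illustration.

```python
# Number of categories per nominal variable, read off the binary (dummy)
# variables that appear in the exported expression. Assumed, not stated in the text.
CATEGORIES = {
    "department": 3,       # Sales, Research & Development, Human Resources
    "education_field": 6,  # Life Sciences, Other, Medical, Marketing, Technical Degree, Human Resources
    "job_role": 9,
    "marital_status": 3,   # Single, Married, Divorced
}

TOTAL_VARIABLES = 35    # variables listed above, target included
CONSTANT_VARIABLES = 3  # employee_count, over_18, standard_hours
TARGET_VARIABLES = 1    # attrition


def count_network_inputs():
    """Inputs seen by the network after dropping unused columns and expanding categories."""
    raw_inputs = TOTAL_VARIABLES - CONSTANT_VARIABLES - TARGET_VARIABLES
    non_categorical = raw_inputs - len(CATEGORIES)
    return non_categorical + sum(CATEGORIES.values())


print(count_network_inputs())  # 48
```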

Before starting the predictive analysis, it is also important to know the ratio of negative and positive instances that we have in the data set.

The chart shows that the data is unbalanced: the number of negative instances (1233) is much larger than the number of positive instances (237). We use this information later to design the predictive model properly.

The input-target correlations measure the dependency between each input variable and the target.

As we can see, the input variables most strongly correlated with attrition are "OverTime" (0.246118), "TotalWorkingYears" (0.22332) and "YearsAtCompany" (0.196728), while the least correlated are "HourlyRate" (0.00678), "PerformanceRating" (0.00289) and "Research Scientist" (0.00036).
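The article does not say which correlation coefficient produces these values. As a rough illustration only, the sketch below assumes a plain Pearson correlation between one input and the binary target; the tool used in the study may rely on a different measure. The sample values are made up for the example and are not taken from the data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy sample (NOT from the study): 1 = works overtime / 1 = left the company.
over_time = [1, 0, 1, 0, 0, 1, 0, 0]
attrition = [1, 0, 1, 0, 0, 0, 0, 1]

print(round(pearson(over_time, attrition), 3))
```

In practice the absolute value is what matters for ranking inputs, since a strongly negative correlation is as informative as a strongly positive one.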

The neural network takes all the attributes of each employee and transforms them into a probability of attrition.

For that purpose, we use a neural network with 48 inputs, one hidden layer and one output. As a first guess, we use 3 neurons in the hidden layer.

The minimum-maximum method is chosen for the scaling layer, and the continuous probabilistic method is set for the probabilistic layer.
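In the exported expression at the end of this article, every input is mapped to the range [-1, 1] with the formula 2*(x-min)/(max-min)-1. A minimal sketch of that scaling follows; the function name `scale_min_max` is ours, and the age range 18 to 60 is taken from the expression.

```python
def scale_min_max(value, minimum, maximum):
    """Map a value from [minimum, maximum] to [-1, 1]."""
    return 2.0 * (value - minimum) / (maximum - minimum) - 1.0


# Age ranges from 18 to 60 in the exported expression.
print(scale_min_max(18, 18, 60))  # -1.0
print(scale_min_max(39, 18, 60))  # 0.0
print(scale_min_max(60, 18, 60))  # 1.0
```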

The next step is to select an appropriate training strategy, which defines what the neural network will learn. A general training strategy is composed of two concepts:

- A loss index.
- An optimization algorithm.

As we said before, the data set is unbalanced. As a consequence, we set the weighted squared error as the error method, with a positives weight of 5.20 and a negatives weight of 1.
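The quoted weights are consistent with weighting the positives by the ratio of negative to positive instances (1233 / 237 ≈ 5.20). The sketch below shows that calculation together with a simple weighted squared error. It is an illustration under our own assumptions: the function names are hypothetical, and the normalization that the actual tool applies to the error is not described in the article, so it is left out.

```python
def class_weights(n_negatives, n_positives):
    """Return (positives_weight, negatives_weight) from the class counts."""
    return n_negatives / n_positives, 1.0


def weighted_squared_error(outputs, targets, positives_weight, negatives_weight):
    """Sum of squared errors, each term weighted by the class of its target.

    No normalization is applied here (an assumption of this sketch).
    """
    total = 0.0
    for output, target in zip(outputs, targets):
        weight = positives_weight if target == 1 else negatives_weight
        total += weight * (output - target) ** 2
    return total


positives_weight, negatives_weight = class_weights(1233, 237)
print(round(positives_weight, 2))  # 5.2
```

With these weights, a misclassified leaver costs about five times as much as a misclassified loyal employee, which offsets the imbalance between the two classes.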

We use the quasi-Newton method as the optimization algorithm.

Now, the model is ready to be trained. The next chart shows how the training and selection errors decrease with the epochs of the optimization algorithm.

The final training and selection errors are
**training error = 0.285 WSE** and **selection error = 0.931 WSE**, respectively.

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error less than **0.931 WSE**,
which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.

As we can see, the optimal number of neurons in the hidden layer is 1, resulting in a selection error of **0.614 WSE**,
which is clearly better than the previous value of 0.931 WSE.
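The incremental order search described above can be sketched as a loop over the number of hidden neurons. This is a simplified illustration, not the tool's implementation: it tries every order up to a maximum instead of stopping early, and `selection_error_for` is a hypothetical callback that would train a network of the given size and return its selection error. The error values in the usage example are placeholders, not results from the study.

```python
def incremental_order(selection_error_for, max_neurons):
    """Try 1..max_neurons hidden neurons and keep the order with the lowest selection error."""
    best_neurons, best_error = None, float("inf")
    for neurons in range(1, max_neurons + 1):
        error = selection_error_for(neurons)  # train and evaluate on the selection subset
        if error < best_error:
            best_neurons, best_error = neurons, error
    return best_neurons, best_error


# Placeholder selection errors standing in for real training runs.
placeholder_errors = {1: 0.6, 2: 0.7, 3: 0.9, 4: 1.1}
best = incremental_order(lambda n: placeholder_errors[n], max_neurons=4)
print(best)  # (1, 0.6)
```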

Testing analysis assesses the quality of the model to decide if it is ready to be used in the production phase, i.e., in a real-world situation.

The model is tested by comparing the outputs of the trained neural network against the real targets for a set of data that has been used neither for training nor for selection: the testing subset. For that purpose, we make use of some testing methods commonly used in binary classification problems.

The ROC curve measures the capacity of the classifier to discriminate between positive and negative instances. For a perfect classifier, the ROC curve passes through the upper left corner. The next chart shows the ROC curve for our problem.

The closer the area under the curve is to 1, the better the classifier. In this case, the area takes the value **0.804**,
which confirms what we saw in the ROC chart: the model discriminates well between employees who leave and employees who stay.
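The area under the ROC curve can also be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive-negative pair. It is a simple illustration rather than the routine used in the study, and the scores are made up.

```python
def roc_auc(scores, targets):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for s, t in zip(scores, targets) if t == 1]
    negatives = [s for s, t in zip(scores, targets) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores for three leavers (1) and three loyal employees (0).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
targets = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, targets))
```

A value of 0.5 corresponds to random guessing and 1 to a perfect ranking, so 0.804 sits well above chance.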

For classification models with a binary target variable, the confusion matrix is also a good way to test the model. It is displayed below.

| | Predicted positive | Predicted negative |
|---|---|---|
| Real positive | 35 (11.9%) | 16 (5.44%) |
| Real negative | 47 (16%) | 196 (66.7%) |

The next list depicts the binary classification tests. They are calculated from the values of the confusion matrix.

- **Classification accuracy: 78.6%** (ratio of correctly classified samples).
- **Error rate: 21.4%** (ratio of misclassified samples).
- **Sensitivity: 68.6%** (percentage of actual positive classified as positive).
- **Specificity: 80.6%** (percentage of actual negative classified as negative).

In general, these binary classification tests show a good performance of the predictive model. Nevertheless, it is important to highlight that this model has greater specificity than sensitivity, which shows that it is better at detecting negative instances than positive ones.
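The binary classification tests follow directly from the four cells of the confusion matrix. The sketch below recomputes them from the counts reported above (35, 16, 47 and 196); the function name is ours. The results agree with the quoted percentages up to rounding in the last digit.

```python
def binary_classification_tests(tp, fn, fp, tn):
    """Standard tests computed from the cells of a binary confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # correctly classified samples
        "error_rate": (fp + fn) / total,   # misclassified samples
        "sensitivity": tp / (tp + fn),     # actual positives classified as positive
        "specificity": tn / (tn + fp),     # actual negatives classified as negative
    }


tests = binary_classification_tests(tp=35, fn=16, fp=47, tn=196)
for name, value in tests.items():
    print(f"{name}: {value:.1%}")
```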

Once we know that the model predicts employee attrition reasonably well, it can be used to evaluate the satisfaction of a given employee with the company. The predictive model also tells us which factors are most significant for a given employee, which allows the company to act on those variables.

The predictive model takes the form of a function that maps the inputs to the outputs. The mathematical expression, which is listed below, can be embedded into any software.

scaled_age = 2*(age-18)/(60-18)-1;
scaled_business_travel = 2*(business_travel-0)/(2-0)-1;
scaled_daily_rate = 2*(daily_rate-102)/(1499-102)-1;
scaled_Sales = 2*(Sales-0)/(1-0)-1;
scaled_Research&Development = 2*(Research&Development-0)/(1-0)-1;
scaled_HumanResources = 2*(HumanResources-0)/(1-0)-1;
scaled_distance_from_home = 2*(distance_from_home-1)/(29-1)-1;
scaled_education = 2*(education-1)/(5-1)-1;
scaled_LifeSciences = 2*(LifeSciences-0)/(1-0)-1;
scaled_Other = 2*(Other-0)/(1-0)-1;
scaled_Medical = 2*(Medical-0)/(1-0)-1;
scaled_Marketing = 2*(Marketing-0)/(1-0)-1;
scaled_TechnicalDegree = 2*(TechnicalDegree-0)/(1-0)-1;
scaled_HumanResources_1 = 2*(HumanResources_1-0)/(1-0)-1;
scaled_employee_number = 2*(employee_number-1)/(2068-1)-1;
scaled_environment_satisfaction = 2*(environment_satisfaction-1)/(4-1)-1;
scaled_gender = 2*(gender-0)/(1-0)-1;
scaled_hourly_rate = 2*(hourly_rate-30)/(100-30)-1;
scaled_job_involvement = 2*(job_involvement-1)/(4-1)-1;
scaled_job_level = 2*(job_level-1)/(5-1)-1;
scaled_SalesExecutive = 2*(SalesExecutive-0)/(1-0)-1;
scaled_ResearchScientist = 2*(ResearchScientist-0)/(1-0)-1;
scaled_LaboratoryTechnician = 2*(LaboratoryTechnician-0)/(1-0)-1;
scaled_ManufacturingDirector = 2*(ManufacturingDirector-0)/(1-0)-1;
scaled_HealthcareRepresentative = 2*(HealthcareRepresentative-0)/(1-0)-1;
scaled_Manager = 2*(Manager-0)/(1-0)-1;
scaled_SalesRepresentative = 2*(SalesRepresentative-0)/(1-0)-1;
scaled_ResearchDirector = 2*(ResearchDirector-0)/(1-0)-1;
scaled_HumanResources_1 = 2*(HumanResources_1-0)/(1-0)-1;
scaled_job_satisfaction = 2*(job_satisfaction-1)/(4-1)-1;
scaled_Single = 2*(Single-0)/(1-0)-1;
scaled_Married = 2*(Married-0)/(1-0)-1;
scaled_Divorced = 2*(Divorced-0)/(1-0)-1;
scaled_monthly_income = 2*(monthly_income-1009)/(19999-1009)-1;
scaled_monthly_rate = 2*(monthly_rate-2094)/(26999-2094)-1;
scaled_num_companies_worked = 2*(num_companies_worked-0)/(9-0)-1;
scaled_over_time = 2*(over_time-0)/(1-0)-1;
scaled_percent_salary_hike = 2*(percent_salary_hike-11)/(25-11)-1;
scaled_performance_rating = 2*(performance_rating-3)/(4-3)-1;
scaled_relationship_satisfaction = 2*(relationship_satisfaction-1)/(4-1)-1;
scaled_stock_option_level = 2*(stock_option_level-0)/(3-0)-1;
scaled_total_working_years = 2*(total_working_years-0)/(40-0)-1;
scaled_training_times_last_year = 2*(training_times_last_year-0)/(6-0)-1;
scaled_work_life_balance = 2*(work_life_balance-1)/(4-1)-1;
scaled_years_at_company = 2*(years_at_company-0)/(40-0)-1;
scaled_years_in_current_role = 2*(years_in_current_role-0)/(18-0)-1;
scaled_years_since_last_promotion = 2*(years_since_last_promotion-0)/(15-0)-1;
scaled_years_with_curr_manager = 2*(years_with_curr_manager-0)/(17-0)-1;

y_1_1 = Logistic (-0.132196+
(scaled_age*-1.61431)+
(scaled_business_travel*1.60471)+
(scaled_daily_rate*0.246393)+
(scaled_Sales*0.907402)+
(scaled_Research&Development*-0.517518)+
(scaled_HumanResources*0.313808)+
(scaled_distance_from_home*0.945042)+
(scaled_education*-0.754642)+
(scaled_LifeSciences*-0.577821)+
(scaled_Other*-0.498823)+
(scaled_Medical*-0.224641)+
(scaled_Marketing*0.118109)+
(scaled_TechnicalDegree*1.09709)+
(scaled_HumanResources_1*1.10928)+
(scaled_employee_number*0.50999)+
(scaled_environment_satisfaction*-1.21089)+
(scaled_gender*0.608194)+
(scaled_hourly_rate*0.414471)+
(scaled_job_involvement*-1.53346)+
(scaled_job_level*-1.14007)+
(scaled_SalesExecutive*0.108099)+
(scaled_ResearchScientist*1.39005)+
(scaled_LaboratoryTechnician*2.09738)+
(scaled_ManufacturingDirector*-1.39253)+
(scaled_HealthcareRepresentative*-0.342303)+
(scaled_Manager*-0.823216)+
(scaled_SalesRepresentative*1.33255)+
(scaled_ResearchDirector*-0.626333)+
(scaled_HumanResources_1*-0.0752408)+
(scaled_job_satisfaction*-1.37811)+
(scaled_Single*1.44477)+
(scaled_Married*-0.171722)+
(scaled_Divorced*-0.227707)+
(scaled_monthly_income*-1.23993)+
(scaled_monthly_rate*-0.106072)+
(scaled_num_companies_worked*1.49945)+
(scaled_over_time*2.04229)+
(scaled_percent_salary_hike*0.58747)+
(scaled_performance_rating*-0.4962)+
(scaled_relationship_satisfaction*-1.02995)+
(scaled_stock_option_level*-1.16647)+
(scaled_total_working_years*-0.232614)+
(scaled_training_times_last_year*-0.385595)+
(scaled_work_life_balance*-1.80506)+
(scaled_years_at_company*-0.778416)+
(scaled_years_in_current_role*-0.59614)+
(scaled_years_since_last_promotion*4.21654)+
(scaled_years_with_curr_manager*-3.53303));

non_probabilistic_attrition = Logistic (-1.78806+ (y_1_1*4.59128));

attrition = probability(non_probabilistic_attrition);

logistic(x){ return 1/(1+exp(-x)) }

probability(x){ if x < 0 return 0 else if x > 1 return 1 else return x }
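To embed the expression in software, its structure can be written as a small function: scale each input, feed the weighted sum through one logistic hidden neuron, then through one logistic output neuron, and clamp the result. The sketch below is a generic Python version of that structure under our own naming (`attrition_probability`, `ranges`, `weights` are hypothetical). For brevity the usage example passes only 3 of the 48 inputs, with ranges, weights and biases copied from the expression, so the printed number only illustrates the mechanics and is not a prediction of the full model.

```python
from math import exp


def logistic(x):
    return 1.0 / (1.0 + exp(-x))


def probability(x):
    """Clamp to [0, 1], as in the exported expression."""
    return min(1.0, max(0.0, x))


def scale(value, minimum, maximum):
    """Minimum-maximum scaling to [-1, 1]."""
    return 2.0 * (value - minimum) / (maximum - minimum) - 1.0


def attrition_probability(raw, ranges, weights, hidden_bias, output_bias, output_weight):
    """One logistic hidden neuron followed by one logistic output neuron."""
    activation = hidden_bias
    for name, weight in weights.items():
        minimum, maximum = ranges[name]
        activation += weight * scale(raw[name], minimum, maximum)
    hidden = logistic(activation)
    return probability(logistic(output_bias + output_weight * hidden))


# Illustration only: 3 of the 48 inputs, values copied from the expression.
ranges = {"age": (18, 60), "over_time": (0, 1), "job_satisfaction": (1, 4)}
weights = {"age": -1.61431, "over_time": 2.04229, "job_satisfaction": -1.37811}
employee = {"age": 30, "over_time": 1, "job_satisfaction": 2}  # made-up employee

p = attrition_probability(employee, ranges, weights, -0.132196, -1.78806, 4.59128)
print(round(p, 3))
```

A full port would fill `ranges` and `weights` with all 48 entries. Note that the expression uses the name `scaled_HumanResources_1` twice, once among the education fields and once among the job roles, so those two inputs need distinct keys in any real implementation.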