This example uses customer data from a bank to build a predictive model that identifies the clients most likely to churn.

As we know, it is much more expensive to acquire a new client than to keep an existing one. It is therefore advantageous for banks to know what leads clients to leave the company.

Churn prevention allows companies to develop loyalty programs and retention campaigns to keep as many customers as possible.

Contents

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.

 

This example is solved with Neural Designer. To follow it step by step, you can use the free trial.

1. Application type

The variable to be predicted is binary (churn or loyal). Therefore, this is a binary classification project.

The goal here is to model churn probability, conditioned on the customer features.
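In mathematical terms, we want the neural network to approximate the conditional probability

churn_probability(x) = P(churn = 1 | x),

where x is the vector of customer features described in the next section.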

2. Data set

The data set contains the information used to create our model. We need to configure three things here:

  • Data source.
  • Variables.
  • Instances.

 

The data file bank_churn.csv contains 12 variables for 10,000 clients of the bank.

The features or variables are the following:

  • customer_id, unused variable.
  • credit_score, used as input.
  • country, used as input.
  • gender, used as input.
  • age, used as input.
  • tenure, used as input.
  • balance, used as input.
  • products_number, used as input.
  • credit_card, used as input.
  • active_member, used as input.
  • estimated_salary, used as input.
  • churn, used as the target. It is 1 if the client left the bank during the period studied and 0 otherwise.

 

The instances, on the other hand, are split at random into training (60%), selection (20%), and testing (20%) subsets.
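The following is a minimal sketch of this configuration outside Neural Designer, assuming Python with pandas and NumPy and the column names listed above.

import numpy as np
import pandas as pd

# Load the data set.
data = pd.read_csv("bank_churn.csv")

# Variable roles: customer_id is unused, churn is the target,
# and the remaining columns are inputs.
inputs = data.drop(columns=["customer_id", "churn"])
target = data["churn"]

# Split the instances at random: 60% training, 20% selection, 20% testing.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(data))
n_train = int(0.6 * len(data))
n_select = int(0.2 * len(data))
training = indices[:n_train]
selection = indices[n_train:n_train + n_select]
testing = indices[n_train + n_select:]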

Once the variables and instances are configured, we can perform some analytics on the data.

The data distributions tell us the percentages of churn and loyal customers.

In this data set, the percentage of churn customers is about 20%.

The input-target correlations indicate which variables may be driving customer attrition.

From the above chart, we can see that the country has a significant influence and that older customers are more likely to leave the bank.
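Continuing the sketch above, these analytics can be reproduced in a few lines; note that corrwith only covers the numeric inputs, so this is an approximation of the full correlation chart.

# Class distribution: the percentage of churn vs. loyal customers.
print(target.value_counts(normalize=True))  # churn is about 20%

# Input-target correlations for the numeric inputs.
numeric_inputs = inputs.select_dtypes("number")
print(numeric_inputs.corrwith(target).sort_values())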

3. Neural network

The second step is to choose a neural network to represent the classification function. For classification problems, it is composed of:

  • A scaling layer.
  • A perceptron layer.
  • A probabilistic layer.

For the scaling layer, the minimum-maximum scaling method is set.

We set one perceptron layer with 3 neurons as a first guess, using the logistic activation function.

The next figure is a diagram for the neural network used in this example.
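As a sketch of what this architecture computes, assuming NumPy and illustrative weight names (W1, b1, w2, and b2 are not taken from the example):

import numpy as np

def minmax_scale(x, x_min, x_max):
    # Scaling layer: map each input to the range [-1, 1].
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, w2, b2):
    # Perceptron layer: 3 logistic neurons (the first guess above).
    hidden = logistic(W1 @ x + b1)     # W1 has shape (3, n_inputs)
    # Probabilistic layer: one logistic output giving P(churn | x).
    return logistic(w2 @ hidden + b2)  # w2 has shape (3,)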

4. Training strategy

The training strategy is applied to the neural network to obtain the best possible performance. It is composed of two things:

  • A loss index.
  • An optimization algorithm.

 

The selected loss index is the weighted squared error with L2 regularization. The weighted squared error is helpful in applications where the targets are unbalanced. It gives a weight of 3.91 to churn customers and 1 to loyal customers, which is approximately the ratio of loyal to churn instances in the data set.

The selected optimization algorithm is the quasi-Newton method.
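A minimal sketch of this training strategy, assuming the forward function above, SciPy's BFGS routine as the quasi-Newton method, and hypothetical helpers (unpack_params, initial_params) for the flat parameter vector; the regularization weight is illustrative.

import numpy as np
from scipy.optimize import minimize

def loss(params, X, y, positives_weight=3.91, regularization=1.0e-3):
    # unpack_params is a hypothetical helper that reshapes the flat
    # parameter vector into the network weights and biases.
    W1, b1, w2, b2 = unpack_params(params)
    outputs = np.array([forward(x, W1, b1, w2, b2) for x in X])
    # Weighted squared error: churn instances weigh 3.91, loyal ones 1.
    weights = np.where(y == 1, positives_weight, 1.0)
    error = np.mean(weights * (outputs - y) ** 2)
    # L2 regularization term on the parameters.
    return error + regularization * np.sum(params ** 2)

# Quasi-Newton optimization: BFGS builds an approximation to the
# inverse Hessian from successive gradient evaluations.
result = minimize(loss, x0=initial_params, args=(X_train, y_train),
                  method="BFGS")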

The following chart shows how the training (blue) and selection (orange) errors decrease with the training epochs.

The final values are training error = 0.621 and selection error = 0.656 (weighted squared error). The following section improves the generalization performance by reducing the selection error.

5. Model selection

Order selection is used to find the complexity of the neural network that optimizes the generalization performance, that is, the number of neurons that minimizes the error on the selection instances.

The following chart shows the training and selection errors for each different order after performing the incremental order method.

As the chart shows, the optimal number of neurons is 6, with selection error = 0.643.
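The incremental order method amounts to the following loop, where train_and_evaluate is a hypothetical helper that trains a network with the given number of neurons and returns its selection error:

# Try increasing numbers of neurons and keep the order with the
# smallest selection error.
best_order, best_error = None, float("inf")
for neurons in range(1, 11):
    selection_error = train_and_evaluate(neurons)
    if selection_error < best_error:
        best_order, best_error = neurons, selection_error
print(best_order, best_error)  # here: 6 neurons, selection error 0.643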

Input selection (or feature selection) is used to find the set of inputs that produces the best generalization. The genetic algorithm has been applied here, but it does not reduce the selection error, so we keep all the input variables.

The following figure shows the final network architecture for this application.

6. Testing analysis

The next step is to perform an exhaustive testing analysis to validate the neural network’s predictive capabilities.

A good measure of the performance of a binary classification model is the ROC curve.

We are interested in the area under the curve (AUC). A perfect classifier would have AUC = 1, and a random one would have AUC = 0.5. Our model has AUC = 0.874, which indicates a good predictive capability.
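With scikit-learn, the curve and its area can be computed from the testing instances; here y_test and y_score stand for the true labels and the predicted churn probabilities.

from sklearn.metrics import roc_auc_score, roc_curve

auc = roc_auc_score(y_test, y_score)
fpr, tpr, thresholds = roc_curve(y_test, y_score)
print(f"AUC = {auc:.3f}")  # about 0.874 in this example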

We can also look at the confusion matrix. Next, we show the elements of this matrix for a decision threshold = 0.5.

                 Predicted positive   Predicted negative
Real positive    305 (15%)            80 (4%)
Real negative    344 (17%)            1271 (63%)

From the above confusion matrix, we can calculate the following binary classification tests (checked numerically in the sketch after the list):

  • Classification accuracy: 78.8% (ratio of correctly classified samples).
  • Error rate: 21.2% (ratio of misclassified samples).
  • Sensitivity: 79.2% (percentage of actual positive classified as positive).
  • Specificity: 78.7% (percentage of actual negative classified as negative).
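All four values follow directly from the confusion matrix entries:

tp, fn = 305, 80    # real positives: predicted positive / negative
fp, tn = 344, 1271  # real negatives: predicted positive / negative
total = tp + fn + fp + tn

accuracy = (tp + tn) / total    # 0.788
error_rate = (fp + fn) / total  # 0.212
sensitivity = tp / (tp + fn)    # 0.792
specificity = tn / (tn + fp)    # 0.787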

 

Now, we can simulate the performance of a retention campaign. For that, we use the cumulative gain chart.

The above chart tells us that if we contact 25% of the customers with the highest chance of churn, we will reach 75% of the customers leaving the bank.
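The chart can be reproduced by sorting the testing customers by predicted churn probability, again using the y_test and y_score arrays from the sketch above:

import numpy as np

# Highest predicted churn probability first.
order = np.argsort(y_score)[::-1]
sorted_churn = np.asarray(y_test)[order]
# Fraction of all churners captured within each top slice.
cumulative_gain = np.cumsum(sorted_churn) / sorted_churn.sum()

top_quarter = int(0.25 * len(sorted_churn))
print(cumulative_gain[top_quarter - 1])  # about 0.75 in this example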

7. Model deployment

Once we have tested the churn model, we can use it to evaluate the probability of churn of our customers.

For instance, consider a customer with the following features:

  • credit_score: 650
  • country: France
  • gender: Female
  • age: 39
  • tenure: 5
  • balance: 76485
  • products_number: 2
  • credit_card: Yes
  • active_member: No
  • estimated_salary: 100000

 

The probability of churn for that customer is 38%.
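Note that the neural network works on numeric inputs, so the categorical features of this customer must be encoded first. A minimal sketch of that encoding, assuming one-hot country indicators and Yes/No mapped to 1/0 (encoding Female as 0 is an assumption of this sketch):

# Numeric encoding of the example customer.
customer = {
    "credit_score": 650,
    "France": 1, "Spain": 0, "Germany": 0,  # country: France (one-hot)
    "gender": 0,                            # Female (assumed encoding)
    "age": 39,
    "tenure": 5,
    "balance": 76485,
    "products_number": 2,
    "credit_card": 1,                       # Yes
    "active_member": 0,                     # No
    "estimated_salary": 100000,
}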

We can export the mathematical expression of the neural network to any bank software to facilitate the work of the Retention Department. This expression is listed below.

scaled_credit_score = (credit_score-650.529)/96.6533;
scaled_France = 2*(France-0)/(1-0)-1;
scaled_Spain = (Spain-0.2477)/0.431698;
scaled_Germany = (Germany-0.2509)/0.433553;
scaled_gender = 2*(gender-0)/(1-0)-1;
scaled_age = (age-38.9218)/10.4878;
scaled_tenure = 2*(tenure-0)/(10-0)-1;
scaled_balance = (balance-76485.9)/62397.4;
scaled_products_number = (products_number-1.5302)/0.581654;
scaled_credit_card = 2*(credit_card-0)/(1-0)-1;
scaled_active_member = 2*(active_member-0)/(1-0)-1;
scaled_estimated_salary = 2*(estimated_salary-11.58)/(199992-11.58)-1;
y_1_1 = Logistic (0.848205+ (scaled_credit_score*-0.608944)+ (scaled_France*-0.261025)+ (scaled_Spain*0.412236)+ (scaled_Germany*-0.102466)+ (scaled_gender*-0.190523)+ (scaled_age*-5.79629)+ (scaled_tenure*-0.538913)+ (scaled_balance*-0.442531)+ (scaled_products_number*-2.72944)+ (scaled_credit_card*0.684301)+ (scaled_active_member*3.1411)+ (scaled_estimated_salary*1.5462));
y_1_2 = Logistic (-0.30529+ (scaled_credit_score*0.0542391)+ (scaled_France*-0.0197414)+ (scaled_Spain*-0.277012)+ (scaled_Germany*0.287529)+ (scaled_gender*-0.138025)+ (scaled_age*-1.67199)+ (scaled_tenure*-0.295799)+ (scaled_balance*-0.0519641)+ (scaled_products_number*-5.95291)+ (scaled_credit_card*-0.214941)+ (scaled_active_member*-1.43624)+ (scaled_estimated_salary*0.198904));
y_1_3 = Logistic (-0.0481312+ (scaled_credit_score*0.25511)+ (scaled_France*0.0844269)+ (scaled_Spain*0.108521)+ (scaled_Germany*-0.2049)+ (scaled_gender*0.125926)+ (scaled_age*0.0827378)+ (scaled_tenure*0.276278)+ (scaled_balance*-0.489973)+ (scaled_products_number*-0.776123)+ (scaled_credit_card*0.0203207)+ (scaled_active_member*0.525674)+ (scaled_estimated_salary*-0.17605));
y_1_4 = Logistic (1.52953+ (scaled_credit_score*-3.07592)+ (scaled_France*1.09842)+ (scaled_Spain*-1.4286)+ (scaled_Germany*0.153036)+ (scaled_gender*1.71313)+ (scaled_age*2.61432)+ (scaled_tenure*-3.80362)+ (scaled_balance*0.78056)+ (scaled_products_number*-1.88)+ (scaled_credit_card*-1.82242)+ (scaled_active_member*1.85776)+ (scaled_estimated_salary*1.40538));
y_1_5 = Logistic (-0.0116541+ (scaled_credit_score*0.144119)+ (scaled_France*-0.0170994)+ (scaled_Spain*0.0812705)+ (scaled_Germany*-0.0603271)+ (scaled_gender*-0.0485258)+ (scaled_age*-1.6572)+ (scaled_tenure*0.0583053)+ (scaled_balance*-0.135168)+ (scaled_products_number*-1.32794)+ (scaled_credit_card*0.0531906)+ (scaled_active_member*-1.13656)+ (scaled_estimated_salary*-0.128869));
y_1_6 = Logistic (-3.85516+ (scaled_credit_score*-0.0138554)+ (scaled_France*-0.753416)+ (scaled_Spain*-1.04647)+ (scaled_Germany*1.90095)+ (scaled_gender*0.0137635)+ (scaled_age*-0.191778)+ (scaled_tenure*0.343281)+ (scaled_balance*4.70446)+ (scaled_products_number*-6.3796)+ (scaled_credit_card*0.115022)+ (scaled_active_member*-0.153162)+ (scaled_estimated_salary*-0.0731349));
non_probabilistic_churn = Logistic (4.33579+ (y_1_1*-1.60163)+ (y_1_2*7.91345)+ (y_1_3*-6.65044)+ (y_1_4*-1.39552)+ (y_1_5*-5.56462)+ (y_1_6*-2.54043));
churn = probability(non_probabilistic_churn);

Logistic(x){
   return 1/(1+exp(-x));
}

probability(x){
   if x < 0
       return 0;
   else if x > 1
       return 1;
   else
       return x;
}
