Machine learning examples

Predict the probability of default of credit card clients

The objective of this example is to predict customers' default payments in a bank. The main outcome of this project is to reduce loan losses, but real-time scoring and limits monitoring are also achieved.

From the perspective of risk management, a predictive model of the probability of default is more valuable than a binary classification into credible and non-credible clients.

The credit risk database used here concerns consumers' default payments in Taiwan.

Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.

1. Application type

This is a classification project, since the variable to be predicted is binary (default or not).

The goal here is to model the probability of default, as a function of the customer features.

2. Data set

The data set is composed of the data source, the variables, and the instances, described below.

The data file credit_risk.csv contains the information used to create the model. It consists of 30000 rows and 25 columns: the columns represent the variables, and the rows represent the instances.

This data set uses 23 variables: the target variable default, and inputs describing the customer's credit limit, demographic attributes (gender, education, marital status), repayment status over the previous six months, and payment amounts over the previous six months.

The instances are divided at random into training, selection, and testing subsets, containing 60%, 20%, and 20% of the instances, respectively. More specifically, 18000 samples are used here for training, 6000 for selection, and 6000 for testing.
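As an illustration, the following sketch reproduces this split in Python with pandas and NumPy (the file name credit_risk.csv comes from this example; everything else is an assumption about how the split could be done):

import numpy as np
import pandas as pd

# Load the example data set (30000 rows, 25 columns).
data = pd.read_csv("credit_risk.csv")

# Shuffle the instances and split them 60% / 20% / 20%.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(data))

n_training = int(0.6 * len(data))    # 18000
n_selection = int(0.2 * len(data))   # 6000

training = data.iloc[indices[:n_training]]
selection = data.iloc[indices[n_training:n_training + n_selection]]
testing = data.iloc[indices[n_training + n_selection:]]

print(len(training), len(selection), len(testing))   # 18000 6000 6000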

Once the data set is configured, we can calculate the data distribution of the variables. The next figure depicts the numbers of customers that do and do not repay the loan.

As we can observe, the data is unbalanced; this information will later be used to configure the neural network.
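The imbalance can be checked directly from the target column; a minimal sketch (the column name default is taken from the model expression at the end of this example):

import pandas as pd

data = pd.read_csv("credit_risk.csv")

# Count non-defaulting (0) and defaulting (1) customers.
counts = data["default"].value_counts()
print(counts)

# Imbalance ratio, later used to weight the positive class in the error term.
print("negatives / positives:", counts[0] / counts[1])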

The next figure depicts the correlations of all the inputs with the target. This helps us to see the influence of each input on the default.
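Since the target is binary, these input-target correlations can be approximated by the linear correlation of each input column with the default column; a sketch, assuming all columns are numeric:

import pandas as pd

data = pd.read_csv("credit_risk.csv")

# Linear correlation of every input column with the binary target.
correlations = data.drop(columns="default").corrwith(data["default"])
print(correlations.sort_values(ascending=False))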

3. Neural network

The third step is to choose a neural network to represent the classification function. For classification problems, it is composed of a scaling layer, perceptron layers, and a probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method is set.

We set 2 perceptron layers: a hidden layer with 3 neurons (as a first guess) and an output layer with 1 neuron, both with the logistic activation function.

Finally, we set the continuous probabilistic method for the probabilistic layer.

The next figure is a diagram for the neural network used in this example.
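For reference, a minimal NumPy sketch of a forward pass through an equivalent architecture (the scaling parameters and weights are placeholders to be taken from the trained model; this is not the software's internal implementation):

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, means, stds, W1, b1, W2, b2):
    # x: raw input vector; means, stds: scaling layer parameters.
    scaled = (x - means) / stds               # scaling layer (mean/std method)
    hidden = logistic(W1 @ scaled + b1)       # hidden perceptron layer, 3 neurons
    output = logistic(W2 @ hidden + b2)       # output perceptron layer, 1 neuron
    return np.clip(output, 0.0, 1.0)          # continuous probabilistic layer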

4. Training strategy

The fourth step is to configure the training strategy, which is composed of two concepts: an error term (with a regularization term) and an optimization algorithm.

The error term is the weighted squared error, which weights the squared errors of the negative and positive instances differently. If the weighted squared error has a value of unity, the neural network is predicting the data 'in the mean', while a value of zero means perfect prediction of the data.
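A sketch of how such a weighted squared error could be computed; the exact class weights and normalization used here are not given, so this version is only an illustration (it normalizes by the error of a model that always predicts the weighted mean, so a 'mean' predictor scores close to unity):

import numpy as np

def weighted_squared_error(targets, outputs, positives_weight, negatives_weight):
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)

    # Each instance contributes with the weight of its class.
    weights = np.where(targets == 1, positives_weight, negatives_weight)
    error = np.sum(weights * (targets - outputs) ** 2)

    # Error of the trivial "predict the mean" model, used for normalization.
    mean_prediction = np.average(targets, weights=weights)
    normalization = np.sum(weights * (targets - mean_prediction) ** 2)

    return error / normalization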

In this case, the weight of the neural parameters norm regularization term is 0.01. This term keeps the model stable, avoiding oscillations.

The optimization algorithm searches for the neural network parameters that minimize the loss. The chosen algorithm here is the quasi-Newton method, and we keep its default parameters.
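To make the idea concrete, here is a sketch of quasi-Newton training with SciPy's BFGS optimizer, minimizing a weighted squared error plus a 0.01 parameters norm term; the model is simplified to a single logistic neuron and the data is synthetic, so it only illustrates the technique:

import numpy as np
from scipy.optimize import minimize

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(parameters, X, targets, weights, norm_weight=0.01):
    # Simplified model: one logistic output neuron (weights + bias).
    w, b = parameters[:-1], parameters[-1]
    outputs = logistic(X @ w + b)
    error = np.sum(weights * (targets - outputs) ** 2)
    regularization = norm_weight * np.sum(parameters ** 2)   # parameters norm term
    return error + regularization

# Synthetic stand-in for the training subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
targets = (X[:, 0] + rng.normal(size=100) > 0).astype(float)
weights = np.where(targets == 1, 3.0, 1.0)   # heavier weight on the minority class

result = minimize(loss, x0=np.zeros(X.shape[1] + 1),
                  args=(X, targets, weights), method="BFGS")
print("final loss:", result.fun)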

The following chart shows how the training and selection errors decrease with the epochs during the training process. The final results are training error = 0.755 WSE and selection error = 0.802 WSE, respectively.

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error less than 0.802 WSE, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.

The final selection error achieved is 0.801 for an optimal number of neurons of 3.
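A sketch of such an incremental order loop, here using scikit-learn's MLPClassifier as a stand-in model and synthetic data, so it only illustrates the procedure:

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import mean_squared_error

# Synthetic stand-ins for the training and selection subsets.
rng = np.random.default_rng(0)
X_train, X_selection = rng.normal(size=(300, 5)), rng.normal(size=(100, 5))
t_train = (X_train[:, 0] > 0).astype(int)
t_selection = (X_selection[:, 0] > 0).astype(int)

best = None
for neurons in range(1, 11):   # incremental order: grow the hidden layer
    model = MLPClassifier(hidden_layer_sizes=(neurons,), activation="logistic",
                          solver="lbfgs", max_iter=1000, random_state=0)
    model.fit(X_train, t_train)
    selection_error = mean_squared_error(t_selection,
                                         model.predict_proba(X_selection)[:, 1])
    if best is None or selection_error < best[1]:
        best = (neurons, selection_error)

print("optimal number of neurons:", best[0], "selection error:", best[1])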

The graph above represents the architecture of the final neural network.

6. Testing analysis

The next step is to evaluate the performance of the trained neural network with an exhaustive testing analysis. The standard way to do this is to compare the outputs of the neural network against data never seen before, the testing instances.

A common method to measure the generalization performance is the ROC curve, a graphical aid to study the discrimination capacity of the classifier. One of the parameters that can be obtained from this chart is the area under the curve (AUC): the closer the AUC is to 1, the better the classifier. The next figure shows this measure for this example.
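A sketch of how the ROC curve and the AUC could be computed for the testing subset with scikit-learn (the arrays are placeholders for the testing targets and the predicted default probabilities):

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Placeholders: true testing targets and predicted default probabilities.
targets = np.array([0, 0, 1, 0, 1, 1, 0, 1])
probabilities = np.array([0.1, 0.4, 0.35, 0.2, 0.8, 0.65, 0.3, 0.9])

false_positive_rate, true_positive_rate, thresholds = roc_curve(targets, probabilities)
print("AUC:", roc_auc_score(targets, probabilities))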

In this case, the AUC takes a high value: AUC = 0.772.

The binary classification tests and the confusion matrix give us very useful information about the performance of our predictive model. Below, both of these are displayed.

                  Predicted positive    Predicted negative
Real positive     745 (12.4%)           535 (8.92%)
Real negative     893 (14.9%)           3287 (63.8%)

The classification accuracy takes a high value (76.2%), which means that the prediction is good in a large proportion of cases.
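These figures can be reproduced from the testing predictions; a sketch with scikit-learn, assuming a decision threshold of 0.5 (the arrays are placeholders):

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Placeholders for the testing targets and predicted default probabilities.
targets = np.array([1, 0, 1, 0, 0, 1, 0, 0])
probabilities = np.array([0.7, 0.2, 0.4, 0.6, 0.1, 0.9, 0.3, 0.2])

predictions = (probabilities >= 0.5).astype(int)   # decision threshold 0.5

print(confusion_matrix(targets, predictions))      # rows: real, columns: predicted
print("classification accuracy:", accuracy_score(targets, predictions))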

7. Model deployment

Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called model deployment mode.

Below, the mathematical expression for the present model is displayed.

scaled_limit_balance = (limit_balance-167484)/129748;
scaled_gender = (gender-0.396267)/0.489129;
scaled_university = (university-0.473107)/0.499285;
scaled_graduate_school = (graduate_school-0.356938)/0.479104;
scaled_high_school = (high_school-0.165807)/0.371913;
scaled_others = (others-0.0041477)/0.06427;
scaled_married = (married-0.456121)/0.498079;
scaled_single = 2*(single-0)/(1-0)-1;
scaled_others_1 = (others_1-0.0107861)/0.103296;
scaled_repayment_status_lag_1 = (repayment_status_lag_1+0.0167)/1.1238;
scaled_repayment_status_lag_2 = (repayment_status_lag_2+0.133767)/1.19719;
scaled_repayment_status_lag_3 = (repayment_status_lag_3+0.1662)/1.19687;
scaled_repayment_status_lag_4 = (repayment_status_lag_4+0.220667)/1.16914;
scaled_repayment_status_lag_5 = (repayment_status_lag_5+0.2662)/1.13319;
scaled_repayment_status_lag_6 = (repayment_status_lag_6+0.2911)/1.14999;
scaled_payment_amount_lag_1 = (payment_amount_lag_1-5663.58)/16563.3;
scaled_payment_amount_lag_2 = (payment_amount_lag_2-5921.16)/23040.9;
scaled_payment_amount_lag_3 = (payment_amount_lag_3-5225.68)/17607;
scaled_payment_amount_lag_4 = (payment_amount_lag_4-4826.08)/15666.2;
scaled_payment_amount_lag_5 = (payment_amount_lag_5-4799.39)/15278.3;
scaled_payment_amount_lag_6 = (payment_amount_lag_6-5215.5)/17777.5;
y_1_1 = Logistic (2.17523+ (scaled_limit_balance*0.0536765)+ (scaled_gender*-0.152476)+ (scaled_university*-0.0121881)+ (scaled_graduate_school*-0.0306387)+ (scaled_high_school*0.173186)+ (scaled_others*-0.73854)+ (scaled_married*0.320037)+ (scaled_single*0.416477)+ (scaled_others_1*0.0952706)+ (scaled_repayment_status_lag_1*3.09033)+ (scaled_repayment_status_lag_2*0.680669)+ (scaled_repayment_status_lag_3*0.544463)+ (scaled_repayment_status_lag_4*0.287801)+ (scaled_repayment_status_lag_5*0.250001)+ (scaled_repayment_status_lag_6*0.642499)+ (scaled_payment_amount_lag_1*1.12996)+ (scaled_payment_amount_lag_2*0.314234)+ (scaled_payment_amount_lag_3*0.413271)+ (scaled_payment_amount_lag_4*0.0778191)+ (scaled_payment_amount_lag_5*0.219943)+ (scaled_payment_amount_lag_6*0.11853));
y_1_2 = Logistic (-0.497663+ (scaled_limit_balance*-0.335962)+ (scaled_gender*0.113687)+ (scaled_university*0.0570722)+ (scaled_graduate_school*-0.00269284)+ (scaled_high_school*0.00829173)+ (scaled_others*-0.36325)+ (scaled_married*-0.320526)+ (scaled_single*-0.409095)+ (scaled_others_1*-0.0420963)+ (scaled_repayment_status_lag_1*0.415076)+ (scaled_repayment_status_lag_2*0.706712)+ (scaled_repayment_status_lag_3*0.576033)+ (scaled_repayment_status_lag_4*-0.0251456)+ (scaled_repayment_status_lag_5*0.354822)+ (scaled_repayment_status_lag_6*0.0151421)+ (scaled_payment_amount_lag_1*0.0512893)+ (scaled_payment_amount_lag_2*0.000381215)+ (scaled_payment_amount_lag_3*0.13789)+ (scaled_payment_amount_lag_4*0.0193147)+ (scaled_payment_amount_lag_5*0.0986576)+ (scaled_payment_amount_lag_6*0.0923033));
y_1_3 = Logistic (1.98096+ (scaled_limit_balance*-0.268353)+ (scaled_gender*0.132924)+ (scaled_university*-0.158693)+ (scaled_graduate_school*-0.214737)+ (scaled_high_school*-0.273348)+ (scaled_others*0.559927)+ (scaled_married*0.0702353)+ (scaled_single*-0.0169948)+ (scaled_others_1*0.0913319)+ (scaled_repayment_status_lag_1*-1.64665)+ (scaled_repayment_status_lag_2*1.20179)+ (scaled_repayment_status_lag_3*0.865566)+ (scaled_repayment_status_lag_4*-0.733056)+ (scaled_repayment_status_lag_5*0.694502)+ (scaled_repayment_status_lag_6*-0.444317)+ (scaled_payment_amount_lag_1*0.402657)+ (scaled_payment_amount_lag_2*0.478144)+ (scaled_payment_amount_lag_3*0.998816)+ (scaled_payment_amount_lag_4*0.0466331)+ (scaled_payment_amount_lag_5*0.831399)+ (scaled_payment_amount_lag_6*0.261382));
non_probabilistic_default = Logistic (2.09934+ (y_1_1*-2.74891)+ (y_1_2*5.68963)+ (y_1_3*-3.45925));
default = probability(non_probabilistic_default);

Logistic(x) {
    return 1/(1 + exp(-x));
}

probability(x) {
    if (x < 0)
        return 0;
    else if (x > 1)
        return 1;
    else
        return x;
}

This expression can be exported elsewhere, for example, to the software that the bank uses to score new customers.
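For instance, the exported expression maps directly onto a small standalone function; a compact Python sketch of the same structure (the coefficient arrays means, stds, W1, b1, w2, b2 would hold the numbers listed in the expression above; they are not reproduced here):

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_default(x, means, stds, W1, b1, w2, b2):
    # Scaling layer: (value - mean) / standard deviation, as in the expression.
    scaled = (np.asarray(x, dtype=float) - means) / stds
    # Hidden layer: the three logistic neurons y_1_1, y_1_2, y_1_3.
    hidden = logistic(W1 @ scaled + b1)
    # Output neuron: non_probabilistic_default.
    non_probabilistic_default = logistic(np.dot(w2, hidden) + b2)
    # probability(): clamp the output to the [0, 1] interval.
    return float(np.clip(non_probabilistic_default, 0.0, 1.0))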
