# Predict the probability of default of credit card clients using Neural Designer

The objective of this example is to predict customers' default payments in a bank. The main outcome of this project is a reduction in loan losses, but it also enables real-time scoring and limit monitoring.

This example addresses the case of customers' default payments in a bank. From a risk management perspective, the predicted probability of default is more valuable than a binary classification result (credible or not credible client).

The credit risk database used here is related to consumers' default payments in Taiwan.

This example is solved with Neural Designer. To follow it step by step, you can use the free trial.

## 1. Application type

This is a classification project, since the variable to be predicted is binary (default or not).

The goal here is to model the probability of default as a function of the customer features.

## 2. Data set

The data set is composed of four concepts:

• Data source.
• Variables.
• Instances.
• Missing values.

The data file credit_risk.csv contains the information used to create the model. It consists of 30000 rows and 25 columns. The columns represent the variables, while the rows represent the instances.

This data set uses the following 24 variables (23 inputs and the target):

• limit_balance: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
• sex: Gender (1= male; 2= female).
• education_level: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
• marital_status: Marital status (1 = married; 2 = single; 3 = others).
• age: Age (year).
• repayment_status_lag_1: Repayment status 1 month ago (-1 = pay duly; 1 = payment delay for one month; ... ; 9 = payment delay for nine months and above).
• repayment_status_lag_2: Repayment status 2 months ago (-1 = pay duly; 1 = payment delay for one month; ... ; 9 = payment delay for nine months and above).
• repayment_status_lag_3: Repayment status 3 months ago (-1 = pay duly; 1 = payment delay for one month; ... ; 9 = payment delay for nine months and above).
• repayment_status_lag_4: Repayment status 4 months ago (-1 = pay duly; 1 = payment delay for one month; ... ; 9 = payment delay for nine months and above).
• repayment_status_lag_5: Repayment status 5 months ago (-1 = pay duly; 1 = payment delay for one month; ... ; 9 = payment delay for nine months and above).
• repayment_status_lag_6: Repayment status 6 months ago (-1 = pay duly; 1 = payment delay for one month; ... ; 9 = payment delay for nine months and above).
• bill_state_amount_lag_1: Amount of bill statement 1 month ago (NT dollar).
• bill_state_amount_lag_2: Amount of bill statement 2 months ago (NT dollar).
• bill_state_amount_lag_3: Amount of bill statement 3 months ago (NT dollar).
• bill_state_amount_lag_4: Amount of bill statement 4 months ago (NT dollar).
• bill_state_amount_lag_5: Amount of bill statement 5 months ago (NT dollar).
• bill_state_amount_lag_6: Amount of bill statement 6 months ago (NT dollar).
• payment_amount_lag_1: Amount paid 1 month ago (NT dollar).
• payment_amount_lag_2: Amount paid 2 months ago (NT dollar).
• payment_amount_lag_3: Amount paid 3 months ago (NT dollar).
• payment_amount_lag_4: Amount paid 4 months ago (NT dollar).
• payment_amount_lag_5: Amount paid 5 months ago (NT dollar).
• payment_amount_lag_6: Amount paid 6 months ago (NT dollar).
• default: Failure to repay the loan.

The instances are divided at random into training, selection (validation), and testing subsets, containing 60%, 20%, and 20% of the instances, respectively. More specifically, 18000 instances are used for training, 6000 for selection, and 6000 for testing.
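The random 60/20/20 split described above can be sketched in plain Python. This is only an illustration of the idea, not Neural Designer's internal procedure; the function name `split_indices` and the fixed seed are assumptions made for reproducibility.

```python
import random

def split_indices(n, train_ratio=0.6, selection_ratio=0.2, seed=0):
    """Randomly split n instance indices into training, selection and testing subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded shuffle so the split is reproducible
    n_train = round(n * train_ratio)
    n_selection = round(n * selection_ratio)
    training = indices[:n_train]
    selection = indices[n_train:n_train + n_selection]
    testing = indices[n_train + n_selection:]  # the remaining instances
    return training, selection, testing

training, selection, testing = split_indices(30000)
print(len(training), len(selection), len(testing))  # 18000 6000 6000
```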

Once the data set is configured, we can calculate the data distribution of the variables. The next figure depicts the number of customers who do and do not repay the loan. As we can observe, the data are unbalanced; this information will be used later to configure the training strategy.
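Checking the class balance amounts to counting the targets of each class. The sketch below assumes the target is coded as 1 for default and 0 for no default, and uses a hypothetical toy list rather than the real data.

```python
from collections import Counter

def class_distribution(targets):
    """Return the count and proportion of each class in a list of binary targets."""
    counts = Counter(targets)
    total = len(targets)
    return {label: (count, count / total) for label, count in sorted(counts.items())}

# Hypothetical toy targets: 1 = default, 0 = no default
targets = [0, 0, 0, 1, 0, 0, 1, 0]
print(class_distribution(targets))  # {0: (6, 0.75), 1: (2, 0.25)}
```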

The next figure depicts the correlation of each input with the target. This helps us see how strongly each input is related to default.

## 3. Neural network

The third step is to choose a neural network to represent the classification function. For classification problems, it is composed of:

• A scaling layer.
• Perceptron layers.
• A probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method is set.
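Mean and standard deviation scaling subtracts each variable's mean and divides by its standard deviation. The sketch below reuses the statistics of `limit_balance` that appear in the model deployment section; using the sample standard deviation (`statistics.stdev`) in the hypothetical helper `fit_mean_std` is an assumption, since the text does not say which estimator is used.

```python
import statistics

def fit_mean_std(values):
    """Compute the mean and (sample) standard deviation used by the scaling layer."""
    return statistics.mean(values), statistics.stdev(values)

def scale(value, mean, std):
    """Mean and standard deviation scaling: center on the mean, divide by the std."""
    return (value - mean) / std

# Statistics of limit_balance, as listed in the model deployment section
mean, std = 167484, 129748
print(scale(167484, mean, std))  # 0.0
print(scale(297232, mean, std))  # 1.0
```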

We use two perceptron layers: a hidden layer with 3 neurons, as a first guess, and an output layer with 1 neuron. Both layers use the logistic activation function.

Finally, we set the continuous probabilistic method for the probabilistic layer.
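Putting the layers together, a single prediction flows through a logistic hidden layer, a logistic output neuron, and a probabilistic layer that keeps the result in [0, 1], the same clipping performed by the `probability` function in the model deployment section. The function `forward` and its argument layout are hypothetical, and the weights in the test are toy values rather than the trained ones.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Scaled inputs -> logistic hidden layer -> logistic output -> clipped probability."""
    # Each hidden neuron: logistic(bias + weighted sum of the scaled inputs)
    hidden = [
        logistic(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Single output neuron combining the hidden activations
    output = logistic(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))
    # Continuous probabilistic layer: keep the output inside [0, 1]
    return min(max(output, 0.0), 1.0)
```

With all weights set to zero, every hidden neuron outputs 0.5 and the network returns `logistic(0) = 0.5`.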

The next figure is a diagram of the neural network used in this example.

## 4. Training strategy

The fourth step is to configure the training strategy, which is composed of two concepts:

• A loss index.
• An optimization algorithm.

The error term is the weighted squared error. It assigns different weights to the squared errors of positive and negative instances, which compensates for the class imbalance. A weighted squared error of 1 means that the neural network is predicting the data 'in the mean', while a value of 0 means a perfect prediction of the data.

In this case, the regularization term is the neural parameters norm, with a weight of 0.01. This term keeps the parameters small, which makes the model more stable and avoids oscillations.

The optimization algorithm searches for the neural network parameters that minimize the loss index. Here we choose the quasi-Newton method and keep its default parameters.
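A minimal sketch of a weighted squared error for binary targets follows. Two details are assumptions rather than documented Neural Designer behavior: positive instances are weighted by the negatives-to-positives ratio, and the weighted sum is divided by half the number of negatives. With those weights the weighted mean of the targets is 0.5, so a constant output of 0.5 (predicting 'in the mean') scores exactly 1 and a perfect prediction scores 0.

```python
def weighted_squared_error(outputs, targets):
    """Weighted squared error for binary targets (1 = default, 0 = no default)."""
    positives = sum(1 for t in targets if t == 1)
    negatives = len(targets) - positives
    positives_weight = negatives / positives  # assumption: up-weight the minority class
    negatives_weight = 1.0
    total = sum(
        (positives_weight if t == 1 else negatives_weight) * (y - t) ** 2
        for y, t in zip(outputs, targets)
    )
    # Assumed normalization: a constant output of 0.5 (the weighted target mean) scores 1
    return total / (0.5 * negatives * negatives_weight)

targets = [0, 0, 0, 1]
print(weighted_squared_error([0.5] * 4, targets))             # 1.0
print(weighted_squared_error([0.0, 0.0, 0.0, 1.0], targets))  # 0.0
```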
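The loss index adds the weighted norm term to the error term. The sketch assumes the norm term is the sum of squared parameters (an L2-style penalty); the exact norm Neural Designer uses is not stated in this example, and `regularized_loss` is a hypothetical helper.

```python
def regularized_loss(error, parameters, weight=0.01):
    """Loss index = error term + weight * parameters norm term."""
    # Assumption: the norm term is the sum of squared parameters (L2-style)
    norm_term = sum(p * p for p in parameters)
    return error + weight * norm_term

print(regularized_loss(0.8, [1.0, -2.0, 0.5]))  # approximately 0.8525
```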
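Quasi-Newton methods build an approximation of the inverse Hessian from successive gradient differences instead of computing second derivatives. The sketch below implements BFGS, one common quasi-Newton variant, on a toy quadratic; which variant, line search, and stopping criteria Neural Designer uses by default are not stated here, so the backtracking line search and the tolerances are assumptions.

```python
import numpy as np

def bfgs(f, grad, x0, max_iter=100, tol=1e-8):
    """Minimize f with the BFGS quasi-Newton method and a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    identity = np.eye(x.size)
    h = identity.copy()  # running approximation of the inverse Hessian
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        direction = -h @ g
        # Backtracking (Armijo) line search: halve the step until f decreases enough
        step = 1.0
        while step > 1e-10 and f(x + step * direction) > f(x) + 1e-4 * step * (g @ direction):
            step *= 0.5
        x_new = x + step * direction
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:  # update only when the curvature condition holds
            rho = 1.0 / sy
            h = ((identity - rho * np.outer(s, y)) @ h @ (identity - rho * np.outer(y, s))
                 + rho * np.outer(s, s))
        x, g = x_new, g_new
    return x

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose minimum solves A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = bfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(2))
print(x_min)  # close to np.linalg.solve(A, b)
```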

The following chart shows how the training and selection errors decrease with the epochs during the training process. The final values are training error = 0.755 WSE and selection error = 0.802 WSE.

## 5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error below 0.802 WSE, the value achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons. The final selection error is 0.801 WSE, achieved with an optimal number of 3 neurons. The graph above represents the architecture of the final neural network.
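The incremental order loop can be sketched as follows. `train_and_evaluate` is a hypothetical callback standing in for a full training run, and the errors in the usage example are made up; the sketch also omits any early-stopping criteria a real implementation may apply.

```python
def incremental_order_selection(train_and_evaluate, min_neurons=1, max_neurons=10, step=1):
    """Train networks of growing size and keep the one with the smallest selection error.

    `train_and_evaluate(neurons)` is a hypothetical callback that trains a network with
    the given number of hidden neurons and returns (training_error, selection_error).
    """
    best_neurons, best_selection_error = None, float("inf")
    history = []
    for neurons in range(min_neurons, max_neurons + 1, step):
        training_error, selection_error = train_and_evaluate(neurons)
        history.append((neurons, training_error, selection_error))
        if selection_error < best_selection_error:
            best_neurons, best_selection_error = neurons, selection_error
    return best_neurons, best_selection_error, history

# Made-up errors for illustration only
fake_errors = {1: (0.80, 0.83), 2: (0.77, 0.81), 3: (0.75, 0.80), 4: (0.74, 0.81), 5: (0.73, 0.82)}
best, error, _ = incremental_order_selection(lambda n: fake_errors[n], 1, 5)
print(best, error)  # 3 0.8
```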

## 6. Testing analysis

The next step is to evaluate the performance of the trained neural network with an exhaustive testing analysis. The standard way to do this is to compare the outputs of the neural network against data it has never seen before: the testing instances.

A common method to measure the generalization performance is the ROC curve, a graphical aid to study the discrimination capacity of the classifier. One of the parameters that can be obtained from this chart is the area under the curve (AUC): the closer the AUC is to 1, the better the classifier. The next figure shows this measure for this example. In this case, AUC = 0.772, well above the value of 0.5 that a random classifier would obtain.
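The AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition, counting ties as one half, on a small made-up set of scores; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve as the probability that a random positive
    scores higher than a random negative (ties count as one half)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8/9, about 0.889
```

The pairwise loop costs positives * negatives comparisons, which is still manageable for 6000 testing instances; rank-based formulas are faster for larger sets.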

The binary classification tests and the confusion matrix give us very useful information about our predictive model's performance. Below, both of these are displayed.

| | Predicted positive | Predicted negative |
|---|---|---|
| Real positive | 745 (12.4%) | 535 (8.92%) |
| Real negative | 893 (14.9%) | 3827 (63.8%) |

• Classification accuracy: 76.2% (ratio of correctly classified samples).
• Error rate: 23.8% (ratio of misclassified samples).
• Sensitivity: 58.2% (percentage of actual positive classified as positive).
• Specificity: 81.0% (percentage of actual negative classified as negative).
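The four rates above follow directly from the confusion matrix counts. The sketch recomputes them in plain Python; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Binary classification rates from the four confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # correctly classified samples
        "error_rate": (fp + fn) / total,    # misclassified samples
        "sensitivity": tp / (tp + fn),      # actual positives classified as positive
        "specificity": tn / (tn + fp),      # actual negatives classified as negative
    }

# Testing counts: 3827 true negatives = 63.8% of the 6000 testing instances
metrics = binary_metrics(tp=745, fn=535, fp=893, tn=3827)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```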

The classification accuracy is 76.2%, so roughly three out of four testing customers are classified correctly. Since the classes are unbalanced, sensitivity and specificity give a fuller picture: the model detects 58.2% of the customers who default and correctly identifies 81.0% of those who do not.

We can also perform a cumulative gain analysis, which is a visual aid that shows the advantage of using a predictive model instead of random selection.

It consists of three lines. The baseline represents the results that would be obtained without using a model. The positive cumulative gain shows, on the y-axis, the percentage of positive instances found against the percentage of the population represented on the x-axis. Similarly, the negative cumulative gain shows the percentage of negative instances found against the population percentage. In this case, by analyzing the 50% of clients with the highest probability of default, the model would reach more than 75% of the clients who will default.
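A single point of the positive cumulative gain curve can be computed by ranking instances by predicted probability and counting how many positives fall in the top fraction. The function `cumulative_gain` below is a hypothetical helper shown on made-up scores; passing inverted labels with the same ranking gives the corresponding point of the negative cumulative gain.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all positives found in the top `fraction` of instances ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k = round(fraction * len(ranked))
    found = sum(label for _, label in ranked[:top_k])
    return found / sum(labels)

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(cumulative_gain(scores, labels, 0.5))  # top 3 contain 2 of the 3 positives
```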

## 7. Model deployment

Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called model deployment mode.

Below, the mathematical expression for the present model is displayed.

```
scaled_limit_balance = (limit_balance-167484)/129748;
scaled_gender = (gender-0.396267)/0.489129;
scaled_university = (university-0.473107)/0.499285;
scaled_high_school = (high_school-0.165807)/0.371913;
scaled_others = (others-0.0041477)/0.06427;
scaled_married = (married-0.456121)/0.498079;
scaled_single = 2*(single-0)/(1-0)-1;
scaled_others_1 = (others_1-0.0107861)/0.103296;
scaled_repayment_status_lag_1 = (repayment_status_lag_1+0.0167)/1.1238;
scaled_repayment_status_lag_2 = (repayment_status_lag_2+0.133767)/1.19719;
scaled_repayment_status_lag_3 = (repayment_status_lag_3+0.1662)/1.19687;
scaled_repayment_status_lag_4 = (repayment_status_lag_4+0.220667)/1.16914;
scaled_repayment_status_lag_5 = (repayment_status_lag_5+0.2662)/1.13319;
scaled_repayment_status_lag_6 = (repayment_status_lag_6+0.2911)/1.14999;
scaled_payment_amount_lag_1 = (payment_amount_lag_1-5663.58)/16563.3;
scaled_payment_amount_lag_2 = (payment_amount_lag_2-5921.16)/23040.9;
scaled_payment_amount_lag_3 = (payment_amount_lag_3-5225.68)/17607;
scaled_payment_amount_lag_4 = (payment_amount_lag_4-4826.08)/15666.2;
scaled_payment_amount_lag_5 = (payment_amount_lag_5-4799.39)/15278.3;
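scaled_payment_amount_lag_6 = (payment_amount_lag_6-5215.5)/17777.5;
// Note: scaled_graduate_school is used in the hidden layer below, but its scaling line does not appear in this expression.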
y_1_1 = Logistic (2.17523+ (scaled_limit_balance*0.0536765)+ (scaled_gender*-0.152476)+ (scaled_university*-0.0121881)+ (scaled_graduate_school*-0.0306387)+ (scaled_high_school*0.173186)+ (scaled_others*-0.73854)+ (scaled_married*0.320037)+ (scaled_single*0.416477)+ (scaled_others_1*0.0952706)+ (scaled_repayment_status_lag_1*3.09033)+ (scaled_repayment_status_lag_2*0.680669)+ (scaled_repayment_status_lag_3*0.544463)+ (scaled_repayment_status_lag_4*0.287801)+ (scaled_repayment_status_lag_5*0.250001)+ (scaled_repayment_status_lag_6*0.642499)+ (scaled_payment_amount_lag_1*1.12996)+ (scaled_payment_amount_lag_2*0.314234)+ (scaled_payment_amount_lag_3*0.413271)+ (scaled_payment_amount_lag_4*0.0778191)+ (scaled_payment_amount_lag_5*0.219943)+ (scaled_payment_amount_lag_6*0.11853));
y_1_2 = Logistic (-0.497663+ (scaled_limit_balance*-0.335962)+ (scaled_gender*0.113687)+ (scaled_university*0.0570722)+ (scaled_graduate_school*-0.00269284)+ (scaled_high_school*0.00829173)+ (scaled_others*-0.36325)+ (scaled_married*-0.320526)+ (scaled_single*-0.409095)+ (scaled_others_1*-0.0420963)+ (scaled_repayment_status_lag_1*0.415076)+ (scaled_repayment_status_lag_2*0.706712)+ (scaled_repayment_status_lag_3*0.576033)+ (scaled_repayment_status_lag_4*-0.0251456)+ (scaled_repayment_status_lag_5*0.354822)+ (scaled_repayment_status_lag_6*0.0151421)+ (scaled_payment_amount_lag_1*0.0512893)+ (scaled_payment_amount_lag_2*0.000381215)+ (scaled_payment_amount_lag_3*0.13789)+ (scaled_payment_amount_lag_4*0.0193147)+ (scaled_payment_amount_lag_5*0.0986576)+ (scaled_payment_amount_lag_6*0.0923033));
y_1_3 = Logistic (1.98096+ (scaled_limit_balance*-0.268353)+ (scaled_gender*0.132924)+ (scaled_university*-0.158693)+ (scaled_graduate_school*-0.214737)+ (scaled_high_school*-0.273348)+ (scaled_others*0.559927)+ (scaled_married*0.0702353)+ (scaled_single*-0.0169948)+ (scaled_others_1*0.0913319)+ (scaled_repayment_status_lag_1*-1.64665)+ (scaled_repayment_status_lag_2*1.20179)+ (scaled_repayment_status_lag_3*0.865566)+ (scaled_repayment_status_lag_4*-0.733056)+ (scaled_repayment_status_lag_5*0.694502)+ (scaled_repayment_status_lag_6*-0.444317)+ (scaled_payment_amount_lag_1*0.402657)+ (scaled_payment_amount_lag_2*0.478144)+ (scaled_payment_amount_lag_3*0.998816)+ (scaled_payment_amount_lag_4*0.0466331)+ (scaled_payment_amount_lag_5*0.831399)+ (scaled_payment_amount_lag_6*0.261382));
non_probabilistic_default = Logistic (2.09934+ (y_1_1*-2.74891)+ (y_1_2*5.68963)+ (y_1_3*-3.45925));
default = probability(non_probabilistic_default);

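// Logistic activation function used by the hidden and output neurons
Logistic(x){
    return 1/(1+exp(-x))
}

// Continuous probabilistic layer: clips the output to the [0, 1] interval
probability(x){
    if (x < 0)
        return 0
    else if (x > 1)
        return 1
    else
        return x
}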
```

This expression can be exported to other software, for example a bank's own systems, to compute the probability of default for new customers.
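As an example of such a port, the last two statements of the expression can be written in Python as below. This is a partial sketch: the hypothetical function `default_probability` takes the three hidden-neuron activations `y_1_1`, `y_1_2`, `y_1_3` as inputs, because computing them requires the scaling and hidden-layer lines of the expression, which are not repeated here.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def probability(x):
    """Clip the output to the [0, 1] interval, as in the exported expression."""
    return min(max(x, 0.0), 1.0)

def default_probability(y_1_1, y_1_2, y_1_3):
    """Output layer of the exported model, given the three hidden-neuron activations."""
    non_probabilistic_default = logistic(
        2.09934 + y_1_1 * -2.74891 + y_1_2 * 5.68963 + y_1_3 * -3.45925
    )
    return probability(non_probabilistic_default)

print(default_probability(0.5, 0.5, 0.5))
```

Since the logistic function always lies between 0 and 1, the `probability` clip never changes the result here; it is kept only to mirror the exported expression.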

## References:

• UCI Machine Learning Repository. Default of credit card clients data set.
• Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.
