Predict the probability of default of credit card clients


The objective of this example is to predict customers' default payments in a bank. The main benefit is a reduction in loan losses, but the model also supports real-time scoring and credit limit monitoring.

From the perspective of risk management, a predicted probability of default is more valuable than a binary classification of clients as credible or not credible.

The credit risk database used here relates to consumers' default payments in Taiwan.


  1. Application type
  2. Data set
  3. Neural network
  4. Training strategy
  5. Model selection
  6. Testing analysis
  7. Model deployment

1. Application type

This is a classification project, since the variable to be predicted is binary (default or not).

The goal here is to model the probability of default, as a function of the customer features.

2. Data set

The data set is composed of four concepts:

The data file credit_risk.csv contains the information used to create the model. It consists of 30000 rows and 25 columns. The columns represent the variables, and the rows represent the instances.

This data set uses the following 23 variables:

The instances are divided at random into training, selection and testing subsets, containing 60%, 20% and 20% of the instances, respectively. More specifically, 18000 instances are used here for training, 6000 for selection and 6000 for testing.
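As a minimal sketch of this random 60/20/20 split, the following Python snippet shuffles the instance indices and cuts them into three subsets. The function name `split_indices` and the fixed seed are illustrative assumptions, not part of the original example.

```python
import random

def split_indices(n_instances, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle instance indices and cut them into training, selection and testing."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility (assumption)
    n_training = round(fractions[0] * n_instances)
    n_selection = round(fractions[1] * n_instances)
    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]  # whatever remains
    return training, selection, testing

training, selection, testing = split_indices(30000)
print(len(training), len(selection), len(testing))  # 18000 6000 6000
```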

Once the data set is configured, we can calculate the data distribution of the variables. The next figure depicts the numbers of customers that do and do not repay the loan.

The next figure depicts the correlations of all the inputs with the target. This helps us to see the influence of the different inputs on the default.
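The text does not say which correlation measure is used. As one possible illustration, the sketch below computes the Pearson correlation between each input and the binary target. The variable names and the tiny data set are made up for demonstration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: names and values are NOT taken from credit_risk.csv.
inputs = {
    "limit_balance": [20, 120, 90, 50, 500, 30],
    "payment_delay": [2, 0, 0, 1, 0, 3],
}
default = [1, 0, 0, 1, 0, 1]  # 1 = default, 0 = no default

correlations = {name: pearson(values, default) for name, values in inputs.items()}

# List the inputs from the strongest to the weakest absolute correlation.
for name, r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {r:+.3f}")
```

A positive value means that the input tends to increase with the default indicator, and a negative value means the opposite.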

3. Neural network

The next step is to configure the neural network. For pattern recognition problems, it is usually composed of:

The number of inputs, in this case, is 28 and the number of outputs is 1. The number of perceptron layers is 2.
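To make the architecture concrete, here is a minimal sketch of the forward pass of a network with 28 inputs, two perceptron layers and 1 output. The number of hidden neurons (3), the tanh and logistic activations and the random initialization are assumptions chosen for illustration, since the text does not specify them.

```python
import math
import random

def make_layer(n_inputs, n_neurons, rng):
    """Create a perceptron layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def layer_forward(inputs, layer, activation):
    """Apply one perceptron layer: activation(weights * inputs + bias)."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 28, 3, 1  # N_HIDDEN = 3 is an assumption

hidden_layer = make_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_OUTPUTS, rng)

def predict_default_probability(scaled_inputs):
    """Forward pass: hidden tanh layer, then a logistic output in (0, 1)."""
    hidden = layer_forward(scaled_inputs, hidden_layer, math.tanh)
    return layer_forward(hidden, output_layer, logistic)[0]

# With all-zero inputs and zero biases the untrained output is exactly 0.5.
print(predict_default_probability([0.0] * N_INPUTS))  # 0.5
```

The logistic output keeps the prediction between 0 and 1, so it can be read as a probability of default.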

4. Training strategy

The fourth step is to configure the training strategy, which is composed of two concepts:

The error term is the weighted squared error. It weights the squared errors of the negative and positive instances differently. If the weighted squared error has a value of unity, then the neural network is predicting the data 'in the mean', while a value of zero means perfect prediction of the data.

In this case, the weight of the neural parameters norm term is 0.01. This regularization term makes the model more stable, avoiding oscillations.
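The exact weighting scheme is not given in the text. The sketch below assumes one common choice: negatives get weight 1, positives get weight (number of negatives) / (number of positives), and the sum is normalized so that a constant output of 0.5 yields an error of exactly 1. Under these assumed weights, 0.5 is the weighted mean of the targets, which matches the 'in the mean' interpretation.

```python
def weighted_squared_error(outputs, targets):
    """Weighted squared error for binary targets (weights are an assumption)."""
    n_positives = sum(1 for t in targets if t == 1)
    n_negatives = len(targets) - n_positives
    negatives_weight = 1.0
    positives_weight = n_negatives / n_positives  # balances the two classes
    weighted_sum = sum(
        (positives_weight if t == 1 else negatives_weight) * (o - t) ** 2
        for o, t in zip(outputs, targets)
    )
    # Assumed normalization: makes a constant 0.5 prediction score exactly 1.
    normalization = 0.5 * n_negatives * negatives_weight
    return weighted_sum / normalization

targets = [1, 0, 0, 0]  # toy data: one default, three non-defaults
print(weighted_squared_error([0.5, 0.5, 0.5, 0.5], targets))  # 1.0
print(weighted_squared_error([1.0, 0.0, 0.0, 0.0], targets))  # 0.0
```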
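One way to read this regularization is that the quantity being minimized is the error plus 0.01 times the norm of the parameter vector. The sketch below assumes the Euclidean (L2) norm. The function names and the example values are illustrative only.

```python
import math

REGULARIZATION_WEIGHT = 0.01  # value quoted in the text

def parameters_norm(parameters):
    """Euclidean (L2) norm of the parameter vector (choice of norm is an assumption)."""
    return math.sqrt(sum(p * p for p in parameters))

def regularized_loss(error, parameters, weight=REGULARIZATION_WEIGHT):
    """Loss = error term + weight * norm of the parameters."""
    return error + weight * parameters_norm(parameters)

# Hypothetical values: an error of 1.0 and two parameters with norm 5.
loss = regularized_loss(1.0, [3.0, 4.0])
print(round(loss, 6))  # 1.05
```

Large parameter values raise the loss, so the optimizer is pushed towards smaller, smoother solutions.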

The optimization algorithm is applied to the neural network to get the best performance. The chosen algorithm here is the quasi-Newton method, and we keep its default parameters.
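To illustrate the idea behind quasi-Newton training, the following sketch implements a basic BFGS iteration with a backtracking line search and applies it to a toy quadratic function instead of a neural network loss. BFGS is one well-known quasi-Newton variant. The text does not say which update formula or line search the tool uses, so both are assumptions here.

```python
def bfgs(f, grad, x0, max_iter=100, tol=1e-8):
    """Minimize f with BFGS, keeping an approximation H of the inverse Hessian."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        # Search direction d = -H g.
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease test.
        step, fx = 1.0, f(x)
        slope = sum(gi * di for gi, di in zip(g, d))
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # update only when the curvature condition holds
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((1.0 + yHy / sy) * s[i] * s[j]
                                - Hy[i] * s[j] - s[i] * Hy[j]) / sy
        x, g = x_new, g_new
    return x

# Toy objective with its minimum at (1, -2); it stands in for the training loss.
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

minimum = bfgs(f, grad, [0.0, 0.0])
print([round(v, 4) for v in minimum])
```

The method uses only gradients, yet it builds up curvature information across iterations, which usually makes it faster than plain gradient descent on problems of moderate size.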

The following chart shows how the training and selection errors decrease with the epochs during the training process. The final results are training error = 0.755 WSE and selection error = 0.802 WSE, respectively.

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error less than 0.802 WSE, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.
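The incremental order loop can be sketched as follows. The callback `train_and_evaluate` is a hypothetical stand-in for training a network with a given number of hidden neurons and returning its training and selection errors. The toy error curves are made up to show the typical shape, not taken from this example.

```python
def incremental_order(train_and_evaluate, min_neurons=1, max_neurons=10, step=1):
    """Try growing hidden layer sizes and keep the one with lowest selection error."""
    history = []
    best = None
    for n_neurons in range(min_neurons, max_neurons + 1, step):
        training_error, selection_error = train_and_evaluate(n_neurons)
        history.append((n_neurons, training_error, selection_error))
        if best is None or selection_error < best[2]:
            best = (n_neurons, training_error, selection_error)
    return best, history

def toy_train_and_evaluate(n_neurons):
    """Made-up curves: training error keeps falling, selection error turns up."""
    training_error = 1.0 / (1.0 + n_neurons)
    selection_error = training_error + 0.01 * (n_neurons - 4) ** 2
    return training_error, selection_error

best, history = incremental_order(toy_train_and_evaluate)
print("best number of neurons:", best[0])  # 5 for these toy curves
```

The training error alone would always favor the largest network, which is why the choice is based on the selection error.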

6. Testing analysis

The next step is to evaluate the performance of the trained neural network by an exhaustive testing analysis. The standard way to do this is to compare the outputs of the neural network against data never seen before, the testing instances.

A common method to measure the generalization performance is the ROC curve. This is a graphical aid to study the discrimination capacity of the classifier. One of the parameters that can be obtained from this chart is the area under the curve (AUC). The closer the AUC is to 1, the better the classifier. The next figure shows this measure for this example.

In this case, AUC = 0.772, which is well above the value of 0.5 that a random classifier would obtain.
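The AUC can also be computed without drawing the curve: it equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score, counting ties as one half. The sketch below uses this pairwise definition on made-up scores. It is an illustration and does not reproduce the 0.772 figure.

```python
def roc_auc(scores, targets):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, t in zip(scores, targets) if t == 1]
    negatives = [s for s, t in zip(scores, targets) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities and true labels (1 = default).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
targets = [1, 0, 1, 0, 1, 0]
print(round(roc_auc(scores, targets), 3))  # 0.667
```

The double loop is quadratic in the number of instances, which is fine for a small illustration. For thousands of testing instances, a rank-based computation would be preferable.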

The binary classification tests give us very useful information about the performance of our predictive model:

The classification accuracy is 76.2%, which means that the model classifies roughly three out of four cases correctly.
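As a sketch of how such binary classification tests are derived, the snippet below thresholds the predicted probabilities, counts the four confusion matrix cells and computes a few standard ratios. The 0.5 threshold and the toy data are assumptions, so the numbers do not correspond to the 76.2% reported here.

```python
def confusion_counts(probabilities, targets, threshold=0.5):
    """Count true/false positives and negatives at a decision threshold."""
    tp = fp = tn = fn = 0
    for p, t in zip(probabilities, targets):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and t == 1:
            tp += 1
        elif predicted == 1 and t == 0:
            fp += 1
        elif predicted == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def binary_classification_tests(tp, fp, tn, fn):
    """Standard ratios computed from the confusion matrix cells."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of correct predictions
        "error_rate": (fp + fn) / total,    # share of wrong predictions
        "sensitivity": tp / (tp + fn),      # defaults correctly detected
        "specificity": tn / (tn + fp),      # non-defaults correctly detected
    }

# Hypothetical predicted probabilities and true labels (1 = default).
probabilities = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
targets = [1, 0, 1, 0, 1, 0]
counts = confusion_counts(probabilities, targets)
tests = binary_classification_tests(*counts)
print(counts)  # (2, 1, 2, 1)
```

With unbalanced classes, accuracy alone can be misleading, so sensitivity and specificity are worth reading alongside it.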

7. Model deployment

Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called model deployment mode.

We can predict the probability of default of a new client by calculating the neural network outputs. To do so, we set the values of the input variables for that customer.

