
Credit risk management

By Fernando Gomez, Artelnics.
The objective of this example is to predict customers' default payments in a bank.
The main outcome of this project is to reduce loan losses, but real-time scoring and limits monitoring are also achieved.
This example addresses customers' default payments in a bank. From the perspective of risk management, the predicted probability of default is more valuable than a binary classification into credible or non-credible clients.
1. Data set
The credit risk database used here relates to consumers' default payments in Taiwan. The data for this problem have been taken from the UCI Machine Learning Repository.
The data set, obtained from the data file creditriskmanagement.dat, contains the information used to create the model. It contains 30000 instances and 25 columns. The columns represent the variables, the first row contains the names of the variables and the rest of the rows represent the instances. The values in the rows are separated by commas. The following listing is a preview of the data file. Following the original study, a binary variable, default payment (Yes = 1, No = 0), is used as the response variable, and the following 23 variables, selected after a review of the literature, are used as explanatory variables:
- LIMIT_BAL: Amount of the given credit (NT dollar); it includes both the individual consumer credit and his/her family (supplementary) credit.
- SEX: Gender (1 = male; 2 = female).
- EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
- MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
- AGE: Age (years).
- PAY_1: Repayment status 1 month ago (-1 = pay duly; 1 = payment delay for one month; ...; 9 = payment delay for nine months and above).
- PAY_2: Repayment status 2 months ago (same coding as PAY_1).
- PAY_3: Repayment status 3 months ago (same coding as PAY_1).
- PAY_4: Repayment status 4 months ago (same coding as PAY_1).
- PAY_5: Repayment status 5 months ago (same coding as PAY_1).
- PAY_6: Repayment status 6 months ago (same coding as PAY_1).
- BILL_AMT1: Amount of bill statement 1 month ago (NT dollar).
- BILL_AMT2: Amount of bill statement 2 months ago (NT dollar).
- BILL_AMT3: Amount of bill statement 3 months ago (NT dollar).
- BILL_AMT4: Amount of bill statement 4 months ago (NT dollar).
- BILL_AMT5: Amount of bill statement 5 months ago (NT dollar).
- BILL_AMT6: Amount of bill statement 6 months ago (NT dollar).
- PAY_AMT1: Amount paid 1 month ago (NT dollar).
- PAY_AMT2: Amount paid 2 months ago (NT dollar).
- PAY_AMT3: Amount paid 3 months ago (NT dollar).
- PAY_AMT4: Amount paid 4 months ago (NT dollar).
- PAY_AMT5: Amount paid 5 months ago (NT dollar).
- PAY_AMT6: Amount paid 6 months ago (NT dollar).
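As a minimal sketch of how a comma-separated file with a header row can be split into explanatory variables and a response variable, the Python snippet below uses only the standard library. The target column name `default` and the ignored identifier column `ID` are hypothetical placeholders; check the actual header of creditriskmanagement.dat before relying on them.

```python
import csv
import io


def load_credit_data(text, target="default", ignore=("ID",)):
    """Split comma-separated text into input names, input rows and binary targets.

    `target` and `ignore` are hypothetical column names used for illustration.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row: variable names
    target_index = header.index(target)
    # Keep every column except the response variable and any ignored identifiers.
    keep = [i for i, name in enumerate(header)
            if i != target_index and name not in ignore]
    input_names = [header[i] for i in keep]

    inputs, targets = [], []
    for row in reader:  # remaining rows: one instance each
        if not row:
            continue  # skip blank lines
        targets.append(int(float(row[target_index])))
        inputs.append([float(row[i]) for i in keep])
    return input_names, inputs, targets
```

Reading the whole file into memory is fine for 30000 instances; a larger file would call for streaming or a dedicated data-frame library.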
The next figure shows the data set tab in Neural Designer. It contains four sections:
- Data source.
- Variables.
- Instances.
- Missing values.
The "Calculate box plots" task draws, for every variable in the data set, a box plot showing its minimum, first quartile, median (second quartile), third quartile and maximum. The next figure shows the output of this task.
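The five numbers behind each box can be sketched in Python as follows. This is only an illustration: `box_plot_summary` is a hypothetical helper, and Neural Designer may interpolate quartiles differently from the `inclusive` method of `statistics.quantiles` used here.

```python
import statistics


def box_plot_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # statistics.quantiles sorts the data and returns the three cut points for n=4.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)
```

For example, `box_plot_summary([1, 2, 3, 4, 5])` gives `(1, 2.0, 3.0, 4.0, 5)`.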
As we can see, for these variables the maximum lies far away from the rest of the data. This could be due to variability in the measurement or may indicate experimental errors.
Such values are called outliers. Neural Designer can automatically find outliers and set them as unused with the task "Clean univariate outliers". Its output is a table with the unused instances for each variable.
This table shows that 6964 instances have values far away from the rest of the cases.
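The exact rule behind "Clean univariate outliers" is not described here. The sketch below assumes a Tukey-style rule that flags values lying more than 1.5 interquartile ranges beyond the first or third quartile; the helpers `tukey_outliers` and `unused_instances` are hypothetical, and an instance is counted as unused if any of its variables is flagged.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def unused_instances(columns, k=1.5):
    """Flag outliers per variable and return them with the union over all variables.

    `columns` maps each variable name to its list of values.
    """
    per_variable = {name: tukey_outliers(col, k) for name, col in columns.items()}
    flagged = sorted(set().union(*per_variable.values()))
    return per_variable, flagged
```

The length of `flagged` plays the role of the unused-instance count reported in the table.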
2. Neural network
The second step is to configure the neural network. For pattern recognition problems, it is composed of:
- Inputs.
- Scaling layer.
- Learning layers.
- Probabilistic layer.
- Outputs.
The following figure shows the neural network page in Neural Designer.
In this case, the number of inputs is 28 and the number of outputs is 1. The number of perceptrons in the hidden layer (the complexity) is 7, so this neural network can be denoted as 28:7:1.
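To make the 28:7:1 notation concrete, the sketch below computes one forward pass with NumPy and counts the trainable parameters. The mean/standard-deviation scaling, the hyperbolic tangent hidden activation and the logistic probabilistic output are assumptions for illustration, not settings confirmed by this example.

```python
import numpy as np


def forward(x, mean, std, W1, b1, w2, b2):
    """Probability of default for one client in an assumed 28:7:1 network."""
    z = (x - mean) / std              # scaling layer (mean/std scaling assumed)
    h = np.tanh(W1 @ z + b1)          # 7 hidden perceptrons (tanh assumed)
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # probabilistic layer (logistic)


def n_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a network with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


rng = np.random.default_rng(0)
p = forward(rng.normal(size=28), np.zeros(28), np.ones(28),
            rng.normal(size=(7, 28)), rng.normal(size=7), rng.normal(size=7), 0.1)
```

With these shapes, `n_parameters(28, 7, 1)` is (28 + 1) * 7 + (7 + 1) * 1 = 211, and `p` always lies between 0 and 1 because of the logistic output.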
3. Loss index
UPDATE: The latest version of the program includes this section in the Training Strategy. The third step is to configure the loss index, which is composed of two terms:
- An error term.
- A regularization term.
The error term is the weighted squared error, which applies different weights to the squared errors of negative and positive instances. This is useful when one class is much less frequent than the other. A weighted squared error of 1 means that the neural network is predicting the data 'in the mean', while a value of 0 means a perfect prediction of the data.
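A minimal sketch of a weighted squared error for binary targets follows. The weighting (negatives weigh 1, positives weigh the ratio of negatives to positives) and the normalization (chosen so that a constant output of 0.5, which is the weighted mean of the targets under these weights, scores exactly 1) are assumptions consistent with the description above, not Neural Designer's documented formula.

```python
import numpy as np


def weighted_squared_error(targets, outputs):
    """Weighted squared error for binary targets (0 = negative, 1 = positive).

    Assumed weights: negatives 1, positives n_negatives / n_positives.
    Assumed normalization: a constant output of 0.5 gives exactly 1.
    """
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    n_pos = t.sum()
    n_neg = t.size - n_pos
    positives_weight = n_neg / n_pos
    negatives_weight = 1.0
    weights = np.where(t == 1, positives_weight, negatives_weight)
    normalization = 0.5 * n_neg * negatives_weight
    return float(np.sum(weights * (y - t) ** 2) / normalization)
```

With targets `[1, 0, 0, 0]`, predicting 0.5 for every instance gives 1, and predicting the targets exactly gives 0, matching the interpretation in the text.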
In this case, the regularization term is the neural parameters norm, with a weight of 0.01. This term penalizes large parameter values, which makes the model more stable and avoids oscillations.
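The complete loss index adds the weighted regularization term to the error term. The sketch assumes that the "neural parameters norm" is the Euclidean (L2) norm of the parameter vector; `loss_index` is a hypothetical helper.

```python
import numpy as np


def loss_index(error, parameters, regularization_weight=0.01):
    """Error term plus the weighted norm of the neural parameters (L2 norm assumed)."""
    return error + regularization_weight * float(np.linalg.norm(parameters))
```

For instance, an error of 0.4 with parameters `[3.0, 4.0]` (norm 5) gives 0.4 + 0.01 * 5 = 0.45.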
4. Training strategy
The fourth step is to set the training strategy, that is, the learning process applied to the neural network in order to obtain the best performance. The next figure shows the training strategy page in Neural Designer.
The chosen algorithm here is the quasi-Newton method, and we leave the default training parameters, stopping criteria and training history settings.
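Quasi-Newton methods build an approximation of the inverse Hessian from successive gradients instead of computing second derivatives. The sketch below implements the BFGS update, one common quasi-Newton variant, with a simple backtracking line search, and minimizes a toy quadratic instead of the loss index. Neural Designer's default settings (line search method, tolerances, stopping criteria) may differ.

```python
import numpy as np


def bfgs_minimize(f, grad, x0, max_iter=200, tol=1e-8):
    """Minimize f with BFGS inverse-Hessian updates and Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)  # inverse Hessian approximation, starts as the identity
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:  # stopping criterion on the gradient norm
            break
        d = -H @ g  # quasi-Newton search direction
        # Backtracking line search until the Armijo sufficient-decrease condition holds.
        step, fx, slope = 1.0, f(x), g @ d
        while f(x + step * d) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
        s = step * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:  # update only if the curvature condition holds
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x


# Toy problem: f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = bfgs_minimize(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, [0.0, 0.0])
```

Here `x_min` approaches `[0.2, 0.4]`, the solution of A x = b. Skipping the update when the curvature condition fails keeps the approximation positive definite, so every search direction remains a descent direction.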
The following chart shows how the loss decreases with the iterations during the training process. The initial value is 0.72117, and the final value after 172 iterations is 0.4027.
The next table shows the training results of the quasi-Newton method. They include some final states of the neural network, the loss index and the training algorithm.
5. Testing analysis
The last step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data never seen before, the testing instances.
The task "Calculate binary classification tests" provides useful information for testing the performance of a pattern recognition problem with two classes. The next figure shows the output of this task.
The classification accuracy is 74%, which means that the neural network correctly classifies about three out of four testing instances.
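The core of binary classification tests can be sketched as follows: the outputs are thresholded, the confusion matrix is counted, and accuracy, sensitivity and specificity are derived from it. The decision threshold of 0.5 and the function name are assumptions for illustration.

```python
def binary_classification_tests(targets, probabilities, threshold=0.5):
    """Confusion matrix and derived rates for binary targets (threshold assumed)."""
    tp = fp = tn = fn = 0
    for target, probability in zip(targets, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and target == 1:
            tp += 1
        elif predicted == 1 and target == 0:
            fp += 1
        elif predicted == 0 and target == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        # Rows: actual positive, actual negative; columns: predicted positive, negative.
        "confusion_matrix": [[tp, fn], [fp, tn]],
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

Accuracy alone can hide a weak detection of defaulters when they are a minority, which is why sensitivity and specificity are usually reported next to it.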
Other commonly used tasks to measure the performance are "Calculate ROC curve" and "Calculate cumulative gain". The ROC curve is a graphical aid to study the discrimination capacity of the classifier. One of the parameters that can be obtained from this chart is the area under the curve (AUC): the closer the AUC is to 1, the better the classifier. The next figure shows this measure for this example.
In this case, the AUC takes a high value: 0.772.
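The ROC curve and its area can be computed by sweeping the decision threshold from the highest to the lowest output and integrating with the trapezoidal rule. This is a generic sketch, not Neural Designer's implementation, and the toy targets and outputs mentioned below are made up.

```python
def roc_curve(targets, probabilities):
    """Return ROC points (false positive rate, true positive rate), threshold high to low."""
    pairs = sorted(zip(probabilities, targets), reverse=True)
    positives = sum(targets)
    negatives = len(targets) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Instances with tied outputs cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For targets `[0, 0, 1, 1]` and outputs `[0.1, 0.4, 0.35, 0.8]`, the area is 0.75; a classifier that gives every instance the same output scores 0.5.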
6. Model deployment
Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called production mode.
We can predict whether a client is going to default by running the "Calculate outputs" task. For that, we need to edit the input variables through the corresponding dialog.
The prediction is then written in the viewer.