Predict bankruptcy using machine learning

Bankruptcy is a legal proceeding involving a person or business that is unable to repay its outstanding debts. The aim of this example is to predict bankruptcy from qualitative parameters assessed by experts.

Risk assessment consists of identifying the hazards and risk factors that could harm a business.

Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.

This example is solved with Neural Designer. To follow it step by step, you can use the free trial.

1. Application type

This is a classification project, since the variable to predict is binary (bankruptcy or non-bankruptcy).

The goal here is to model the probability that a business goes bankrupt from different features.

2. Data set

The data set contains the information used to create the model. We need to configure three things: the data source, the variables and the instances.

The data file used for this example is bankruptcy-prevention.csv, which contains 7 features about 250 companies.

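The data set includes the following variables: industrial_risk, management_risk, financial_flexibility, credibility, competitiveness and operating_risk as inputs, and class as the target.

The instances are divided randomly into training, selection and testing subsets, containing 60%, 20% and 20% of the instances, respectively (150, 50 and 50 companies).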
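The random 60/20/20 split can be sketched in a few lines of Python. This is only an illustration of the idea, not how Neural Designer implements it; the function name `split_indices` and the fixed seed are assumptions made for the example.

```python
import random

def split_indices(n_instances, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle instance indices and cut them into training, selection and testing subsets."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # fixed seed (an assumption) for reproducibility
    n_training = round(fractions[0] * n_instances)
    n_selection = round(fractions[1] * n_instances)
    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]  # whatever remains
    return training, selection, testing

training, selection, testing = split_indices(250)
print(len(training), len(selection), len(testing))  # 150 50 50
```

With 250 companies this gives 150 training, 50 selection and 50 testing instances.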

Our target variable is class. We can calculate the data distributions and plot a pie chart with the percentage of instances for each class.

As we can see, the target variable is fairly well balanced: 42.8% of the companies are bankrupt and 57.2% are not.

The inputs-targets correlations might indicate which factors have the greatest influence on going into bankruptcy.

In this example, all the variables have a negative correlation with the target variable, and competitiveness is the variable with the strongest correlation.
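As a rough illustration of what an input-target correlation measures, the sketch below computes a plain Pearson correlation between one input and a binary target. Pearson correlation is used here as a simple stand-in and may differ from the correlation measure the tool reports. The six values are hypothetical toy data, not taken from bankruptcy-prevention.csv.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical toy data: higher input values go with target 0
toy_input = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
toy_target = [1, 1, 1, 0, 0, 0]
print(pearson_correlation(toy_input, toy_target))  # negative value
```

A negative coefficient means that larger values of the input tend to appear with a target of 0.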

3. Neural network

The next step is to set the neural network parameters. For classification problems, the neural network is composed of a scaling layer, perceptron layers and a probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method has been set.

We set one perceptron layer with 3 neurons and the logistic activation function. The neural network has 6 inputs and, since the target variable is binary, only one output.
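To get a feel for the size of this model, the following sketch counts the weights and biases of a fully connected network, assuming the architecture described above: 6 inputs, 3 hidden neurons and 1 output neuron. The helper `count_parameters` is a hypothetical name used only for this example.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of inputs followed by the neurons in each layer.
    Each neuron has one weight per incoming value plus one bias.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_parameters([6, 3, 1]))  # 25
```

The 3 hidden neurons contribute 21 parameters and the output neuron 4, for a total of 25.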

The neural network for this example can be represented with the following diagram:

4. Training strategy

The fourth step is to set the training strategy, which defines what the neural network will learn. A general training strategy for classification is composed of two terms: a loss index and an optimization algorithm.

The loss index chosen for this problem is the normalized squared error between the outputs from the neural network and the targets in the data set with L1 regularization.
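A minimal sketch of such a loss index follows. It assumes the normalized squared error is the sum of squared errors divided by the sum of squared deviations of the targets from their mean, and that the L1 term is a weighted sum of absolute parameter values. The regularization weight of 0.01 is a placeholder, not a value from this example.

```python
def normalized_squared_error(outputs, targets):
    """Sum of squared errors divided by the targets' squared deviation from their mean."""
    mean_target = sum(targets) / len(targets)
    squared_error = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    normalization = sum((t - mean_target) ** 2 for t in targets)
    return squared_error / normalization

def l1_regularization(parameters, weight=0.01):
    """L1 penalty: weighted sum of absolute parameter values (weight is a placeholder)."""
    return weight * sum(abs(p) for p in parameters)

def loss_index(outputs, targets, parameters, weight=0.01):
    """Error term plus regularization term."""
    return normalized_squared_error(outputs, targets) + l1_regularization(parameters, weight)

targets = [0, 1, 1, 0]
print(normalized_squared_error([0.5, 0.5, 0.5, 0.5], targets))  # 1.0
```

Under this definition, a model that always predicts the mean of the targets has an error of 1, and a perfect model has an error of 0.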

The selected optimization algorithm is the Quasi-Newton method.

The following chart shows how the training and selection errors evolve with the epochs during the training process. The final values are a training error of 0.0104 NSE and a selection error of 0.045 NSE.

5. Model selection

The objective of model selection is to improve the neural network's generalization capabilities or, in other words, to reduce the selection error.

Since the selection error achieved so far is already very small (0.045 NSE), there is no need to apply an order selection or an input selection algorithm.

6. Testing analysis

The objective of the testing analysis is to validate the generalization performance of the trained neural network. To validate a classification technique, we need to compare the values provided by this technique to the observed values. We can use the ROC curve as it is the standard testing method for binary classification projects.

The AUC value for this example is 1.
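The AUC can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it by comparing every positive-negative pair. It is an illustrative implementation, and the scores in the usage line are hypothetical.

```python
def area_under_curve(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores: every positive is scored above every negative
print(area_under_curve([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```

An AUC of 1 therefore means that the model scores every positive instance above every negative instance.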

The following table contains the elements of the confusion matrix. This matrix contains the true positives, false positives, false negatives and true negatives for the variable class.

                 Predicted positive   Predicted negative
Real positive    19 (38%)             0 (0%)
Real negative    0 (0%)               31 (62%)

The total number of testing samples is 50, and all of them are correctly classified.
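The usual binary classification metrics follow directly from the four entries of the confusion matrix. The sketch below computes them for the counts in the table above; the function name `binary_metrics` is an assumption made for the example.

```python
def binary_metrics(true_positives, false_negatives, false_positives, true_negatives):
    """Accuracy, error rate, sensitivity and specificity from confusion matrix counts."""
    total = true_positives + false_negatives + false_positives + true_negatives
    accuracy = (true_positives + true_negatives) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": true_positives / (true_positives + false_negatives),
        "specificity": true_negatives / (true_negatives + false_positives),
    }

metrics = binary_metrics(true_positives=19, false_negatives=0,
                         false_positives=0, true_negatives=31)
print(metrics["accuracy"])  # 1.0
```

With no false positives and no false negatives, accuracy, sensitivity and specificity are all 1 and the error rate is 0.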

We can also perform a cumulative gain analysis, a visual aid that shows the advantage of using a predictive model as opposed to random selection.

It consists of three lines. The baseline represents the results that would be obtained without using a model. The positive cumulative gain shows in the y-axis the percentage of positive instances found against the percentage of the population represented in the x-axis. Similarly, the negative cumulative gain shows the percentage of the negative instances found against the population percentage.

In this case, the model shows that by analyzing the 40% of businesses with the highest predicted probability of going bankrupt, we would reach 100% of the companies that will go bankrupt.
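The positive cumulative gain at a given population fraction can be sketched as follows. The labels reproduce the 19 positive and 31 negative testing instances from the confusion matrix. The scores are synthetic and simply rank all positives first, which is an idealization consistent with an AUC of 1, not the model's real outputs.

```python
def positive_cumulative_gain(scores, labels, fraction):
    """Share of all positives found in the top `fraction` of instances ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = round(fraction * len(ranked))
    found = sum(label for _, label in ranked[:n_top])
    return found / sum(labels)

# 19 positives and 31 negatives, with synthetic scores that rank positives first
labels = [1] * 19 + [0] * 31
scores = [1.0 - i / 50 for i in range(50)]
print(positive_cumulative_gain(scores, labels, 0.4))  # 1.0
```

Because the positives make up 38% of the testing instances, a perfectly ranking model reaches all of them within the top 40% of the population.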

7. Model deployment

The next listing shows the mathematical expression of the predictive model.

scaled_industrial_risk = industrial_risk*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_management_risk = management_risk*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_financial_flexibility = financial_flexibility*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_credibility = credibility*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_competitiveness = competitiveness*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_operating_risk = operating_risk*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;

perceptron_layer_0_output_0 = sigma[ -0.824954 + (scaled_industrial_risk*-0.234849) + (scaled_management_risk*-0.228279) + (scaled_financial_flexibility*-0.679484) + (scaled_credibility*-1.28412) + (scaled_competitiveness*-1.933) + (scaled_operating_risk*0.174527) ];
perceptron_layer_0_output_1 = sigma[ 0.552724 + (scaled_industrial_risk*0.15642) + (scaled_management_risk*0.173794) + (scaled_financial_flexibility*0.493365) + (scaled_credibility*1.03215) + (scaled_competitiveness*1.46273) + (scaled_operating_risk*-0.0842097) ];
perceptron_layer_0_output_2 = sigma[ 0.793627 + (scaled_industrial_risk*0.190869) + (scaled_management_risk*0.153011) + (scaled_financial_flexibility*0.618457) + (scaled_credibility*1.33334) + (scaled_competitiveness*1.90228) + (scaled_operating_risk*-0.17836) ];

probabilistic_layer_combinations_0 = 0.531175 + 3.35846*perceptron_layer_0_output_0 - 2.2378*perceptron_layer_0_output_1 - 3.10438*perceptron_layer_0_output_2;

class = 1.0/(1.0 + exp(-probabilistic_layer_combinations_0));

This formula can also be exported to the software tool required by the company.
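As an illustration of such an export, the expression above can be transcribed into Python by hand. The sketch takes sigma to be the logistic function, as set for the perceptron layer, and simplifies each scaling line to 2*x - 1, which is what the listed scaling expression reduces to. The function and constant names are assumptions made for the example. The inputs must be given in the order industrial_risk, management_risk, financial_flexibility, credibility, competitiveness, operating_risk.

```python
import math

def logistic(x):
    """Logistic activation, written as sigma in the expression."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients copied from the mathematical expression of the model
HIDDEN_BIASES = [-0.824954, 0.552724, 0.793627]
HIDDEN_WEIGHTS = [
    [-0.234849, -0.228279, -0.679484, -1.28412, -1.933, 0.174527],
    [0.15642, 0.173794, 0.493365, 1.03215, 1.46273, -0.0842097],
    [0.190869, 0.153011, 0.618457, 1.33334, 1.90228, -0.17836],
]
OUTPUT_BIAS = 0.531175
OUTPUT_WEIGHTS = [3.35846, -2.2378, -3.10438]

def predict_class(inputs):
    """Evaluate the model for six inputs; the scaling assumes each lies between 0 and 1."""
    scaled = [2.0 * x - 1.0 for x in inputs]  # maps [0, 1] to [-1, 1]
    hidden = [
        logistic(bias + sum(w * s for w, s in zip(weights, scaled)))
        for bias, weights in zip(HIDDEN_BIASES, HIDDEN_WEIGHTS)
    ]
    combination = OUTPUT_BIAS + sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden))
    return logistic(combination)

print(predict_class([0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))
```

The function returns a value between 0 and 1 for the class variable. With these coefficients, setting all inputs to 0 gives an output close to 1, and setting all inputs to 1 gives an output close to 0.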
