This example aims to detect fraudulent notes accurately.

For that, a set of images taken from genuine and forged banknote-like specimens is created. Features such as wavelet variance, wavelet skewness, wavelet kurtosis, and image entropy are extracted from the images.

The final accuracy obtained by this method is 100% on an independent testing set.

- Application type.
- Data set.
- Neural network.
- Training strategy.
- Model selection.
- Testing analysis.
- Model deployment.

This example is solved with Neural Designer. To follow it step by step, you can use the free trial.

This is a classification project, since the variable to be predicted is binary (fraudulent or legal).

The goal here is to model the probability that a banknote is fraudulent, as a function of its features.

The data file banknote_authentication.csv is the source of information for the classification problem. The number of instances (rows) in the data set is 1372, and the number of variables (columns) is 5.

This problem has the following variables:

- **variance_of_wavelet_transformed**, used as input.
- **skewness_of_wavelet_transformed**, used as input.
- **curtosis_of_wavelet_transformed**, used as input.
- **entropy_of_image**, used as input.
- **counterfeit**, used as the target. It can only take two values: 0 (non-counterfeit) or 1 (counterfeit).

The instances are divided into training, selection, and testing subsets. There are 824 instances for training (60%), 274 instances for selection (20%), and 274 instances for testing (20%).
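The split above can be sketched in plain Python. The 60/20/20 proportions and the row count of 1372 come from the data set described above; the shuffling seed is an arbitrary choice for illustration.

```python
import random

def split_instances(n_instances, selection_ratio=0.2, testing_ratio=0.2, seed=0):
    """Shuffle instance indices and split them into training, selection, and testing subsets."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_selection = int(selection_ratio * n_instances)
    n_testing = int(testing_ratio * n_instances)
    n_training = n_instances - n_selection - n_testing
    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]
    return training, selection, testing

training, selection, testing = split_instances(1372)
print(len(training), len(selection), len(testing))  # 824 274 274
```

Because the 20% fractions are truncated to whole instances, the training subset absorbs the remainder, which reproduces the 824/274/274 counts above.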

We can calculate the data distributions and plot a pie chart with the percentage of instances for each class.

As we can see, the numbers of authentic and forged banknotes are similar.

Next, we plot a scatter chart of the counterfeit variable against the wavelet transformed variance.

In general, the greater the wavelet transformed variance, the lower the probability of counterfeit.

The inputs-targets correlations might indicate which factors better discriminate between authentic and false banknotes.

From the above chart, we can see that the wavelet transformed variance might be the most influential variable for this application.
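These correlations can be approximated with a plain Pearson correlation between each input column and the binary target. A minimal sketch follows; the data values below are illustrative toy numbers, not rows from the actual data set.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

# Toy illustration: variance tends to be high for genuine notes (target 0),
# so the correlation with the counterfeit target comes out strongly negative.
variance = [3.1, 4.2, 2.8, -2.5, -3.7, -1.9]
counterfeit = [0, 0, 0, 1, 1, 1]
print(round(pearson_correlation(variance, counterfeit), 3))
```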

The third step is to configure a neural network to represent the classification function.

The next picture shows the neural network that defines the model.

The fourth step is to set the training strategy, which is composed of:

- Loss index.
- Optimization algorithm.

The loss index that we use is the weighted squared error with L2 regularization.

The learning problem can be stated as finding a neural network that minimizes the loss index. That is, we want a neural network that fits the data set (error term) and does not oscillate (regularization term).
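As a sketch, the weighted squared error with L2 regularization can be written as below. The class weights and regularization factor are illustrative defaults, not the values Neural Designer computes internally.

```python
def weighted_squared_error_l2(targets, outputs, parameters,
                              positives_weight=1.0, negatives_weight=1.0,
                              regularization=0.01):
    """Weighted squared error term plus an L2 penalty on the network parameters."""
    error = 0.0
    for t, y in zip(targets, outputs):
        # Each class gets its own weight, so an unbalanced data set
        # does not bias the error toward the majority class.
        weight = positives_weight if t == 1 else negatives_weight
        error += weight * (t - y) ** 2
    # Regularization term: penalizes large parameters to keep the model smooth.
    penalty = regularization * sum(p ** 2 for p in parameters)
    return error + penalty
```

With perfect outputs and zero parameters the loss is exactly zero; any nonzero parameter contributes through the regularization term even when the error term vanishes.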

We use here the quasi-Newton method as the optimization algorithm, and keep the default training parameters, stopping criteria, and training history settings.
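In one dimension the quasi-Newton idea reduces to the secant method applied to the gradient: the second derivative is approximated from two successive gradient evaluations instead of being computed exactly. This is only a toy sketch of the principle, not the multidimensional algorithm Neural Designer runs.

```python
def quasi_newton_1d(gradient, x0, x1, iterations=50, tolerance=1e-10):
    """Minimize a 1-D function by applying the secant update to its gradient,
    which approximates the Newton step without an exact second derivative."""
    g0, g1 = gradient(x0), gradient(x1)
    for _ in range(iterations):
        if abs(g1 - g0) < tolerance:
            break
        # Secant step: finite-difference estimate of the second derivative.
        x0, x1 = x1, x1 - g1 * (x1 - x0) / (g1 - g0)
        g0, g1 = g1, gradient(x1)
    return x1

# Toy loss f(x) = (x - 3)^2 with gradient 2 (x - 3); minimum at x = 3.
minimum = quasi_newton_1d(lambda x: 2 * (x - 3), x0=0.0, x1=1.0)
print(round(minimum, 6))  # 3.0
```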

The next figure shows the loss history with the quasi-Newton method. As we can see, the loss decreases until it reaches a stationary value, which is a sign of convergence.

The final training and selection errors are almost zero, which means that the neural network fits the data very well.
More specifically, **training error = 0.014 WSE** and **selection error = 0.011 WSE**.

The objective of model selection is to improve the neural network's generalization capabilities or, in other words, to reduce the selection error.

Since the selection error we have achieved so far is minimal (0.011 WSE), there is no need to apply an order selection or an input selection.

The testing analysis aims to validate the generalization performance of the trained neural network.

A good measure for the precision of a binary classification model is the ROC curve.

The area under the curve of the model is **AUC = 1**, which means that the classifier correctly classifies all the testing instances.
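The AUC can be computed directly from scores as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. A minimal sketch, with illustrative scores:

```python
def roc_auc(targets, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for t, s in zip(targets, scores) if t == 1]
    negatives = [s for t, s in zip(targets, scores) if t == 0]
    pairs = len(positives) * len(negatives)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / pairs

# Perfectly separated scores give AUC = 1, as in this example's testing set.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
```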

In the confusion matrix, the rows represent the target classes and the columns the output classes for the testing target data set. The diagonal cells in each table show the number of cases that were correctly classified, and the off-diagonal cells show the misclassified cases. The following table contains the elements of the confusion matrix.

| | Predicted positive | Predicted negative |
|---|---|---|
| Real positive | 103 | 0 |
| Real negative | 0 | 171 |

The number of correctly classified instances is 274, and the number of misclassified instances is 0. As there are no misclassified patterns, the model predicts this testing data very well.
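The confusion matrix entries above can be recomputed from target-output pairs. The threshold of 0.5 below is the usual default for a probabilistic binary classifier, assumed here for illustration.

```python
def confusion_matrix(targets, outputs, threshold=0.5):
    """Return (true_positives, false_negatives, false_positives, true_negatives)."""
    tp = fn = fp = tn = 0
    for t, y in zip(targets, outputs):
        predicted = 1 if y >= threshold else 0
        if t == 1 and predicted == 1:
            tp += 1
        elif t == 1 and predicted == 0:
            fn += 1
        elif t == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn
```

With 103 positives all scored above the threshold and 171 negatives all below it, the function reproduces the table: (103, 0, 0, 171).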

In the model deployment phase, the neural network is used to predict outputs for inputs that it has never seen.

For that, we can embed the mathematical expression represented by the neural network in the banknote authentication system. This expression is written below.

    scaled_wavelet_transformed_variance = (wavelet_transformed_variance-0.433735)/2.84276;
    scaled_wavelet_transformed_skewness = (wavelet_transformed_skewness-1.92235)/5.86905;
    scaled_wavelet_transformed_curtosis = (wavelet_transformed_curtosis-1.39763)/4.31003;
    scaled_image_entropy = (image_entropy+1.19166)/2.10101;

    y_1_1 = Logistic(-2.95122 + (scaled_wavelet_transformed_variance*-3.20568) + (scaled_wavelet_transformed_skewness*-4.57895) + (scaled_wavelet_transformed_curtosis*-5.83131) + (scaled_image_entropy*0.125717));
    y_1_2 = Logistic(3.23366 + (scaled_wavelet_transformed_variance*3.5863) + (scaled_wavelet_transformed_skewness*2.36407) + (scaled_wavelet_transformed_curtosis*1.0865) + (scaled_image_entropy*-1.0501));

    non_probabilistic_counterfeit = Logistic(3.48838 + (y_1_1*9.72432) + (y_1_2*-8.93277));
    counterfeit = Probability(non_probabilistic_counterfeit);

    Logistic(x) { return 1/(1+exp(-x)) }
    Probability(x) { if x < 0 return 0; else if x > 1 return 1; else return x }
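For instance, the expression translates line by line into the following Python function; the coefficients are copied verbatim from the expression above, and the shorter parameter names are just a readability choice.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def probability(x):
    # Clamp the output to the [0, 1] interval.
    return min(max(x, 0.0), 1.0)

def counterfeit_probability(variance, skewness, curtosis, entropy):
    """Predicted probability that a banknote is counterfeit."""
    scaled_variance = (variance - 0.433735) / 2.84276
    scaled_skewness = (skewness - 1.92235) / 5.86905
    scaled_curtosis = (curtosis - 1.39763) / 4.31003
    scaled_entropy = (entropy + 1.19166) / 2.10101
    y_1_1 = logistic(-2.95122 + scaled_variance * -3.20568 + scaled_skewness * -4.57895
                     + scaled_curtosis * -5.83131 + scaled_entropy * 0.125717)
    y_1_2 = logistic(3.23366 + scaled_variance * 3.5863 + scaled_skewness * 2.36407
                     + scaled_curtosis * 1.0865 + scaled_entropy * -1.0501)
    return probability(logistic(3.48838 + y_1_1 * 9.72432 + y_1_2 * -8.93277))
```

Consistent with the scatter chart discussed earlier, a note with high wavelet transformed variance scores close to 0 (genuine), while a strongly negative variance pushes the probability toward 1 (counterfeit).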

- Banknote authentication data set, UCI Machine Learning Repository.