
Banknote authentication

By Sergio Sanchez, Artelnics.
Every day, millions of people use banknotes to make transactions. The security of these banknotes is therefore an essential concern for governments and banks in the fight against fraud.
Nowadays, it can be very hard to tell counterfeit notes from genuine ones. The aim of this study is to use advanced analytics to build a support system that helps organizations classify fraudulent notes accurately.
Contents:
1. Data set.
2. Neural network.
3. Loss index.
4. Training strategy.
5. Testing analysis.
6. Model deployment.
1. Data set
The first step is to prepare the data file, which is the source of information for the classification problem.
The format of this file is a set of rows with values separated by tabs. In binary classification, the target variable can only take two values: 0 (false) or 1 (true).
The following listing is a preview of the data file. The number of instances (rows) in the data set is 1372, and the number of variables (columns) is 5.
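As a sketch, the tab-separated rows can be parsed with standard Python. The two sample rows below are illustrative values in the same five-column layout (four numeric inputs followed by the 0/1 target), not the full data file.

```python
import csv
import io

# Two illustrative rows in the data file's tab-separated layout:
# variance, skewness, curtosis, entropy, class.
sample = (
    "3.6216\t8.6661\t-2.8073\t-0.44699\t0\n"
    "-1.3971\t3.3191\t-1.3927\t-1.9948\t1\n"
)

rows = []
for record in csv.reader(io.StringIO(sample), delimiter="\t"):
    # Split each row into the four inputs and the binary target.
    *inputs, target = map(float, record)
    rows.append((inputs, int(target)))
```

For the real file, the `io.StringIO` wrapper would simply be replaced by `open(...)` on the data file.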
The next figure shows the data set tab in Neural Designer.
This problem therefore has the following variables:
 variance_of_wavelet_transformed, used as input.
 skewness_of_wavelet_transformed, used as input.
 curtosis_of_wavelet_transformed, used as input.
 entropy_of_image, used as input.
 class, used as target.
The data set is divided into training, selection and testing subsets. There are 824 instances for training (60.1%), 274 instances for selection (20%), 274 instances for testing (20%) and 0 unused instances (0%).
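The split above can be sketched with Python's random module. Selection and testing each take 20% of the instances and training gets the remainder; the shuffle seed is an arbitrary choice here.

```python
import random

def split_dataset(n_instances, seed=0):
    # Shuffle the instance indices, then cut them into training,
    # selection, and testing subsets: 20% each for selection and
    # testing, and the rest (about 60%) for training.
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_sel = int(0.2 * n_instances)
    n_train = n_instances - 2 * n_sel
    return (indices[:n_train],
            indices[n_train:n_train + n_sel],
            indices[n_train + n_sel:])

train, selection, testing = split_dataset(1372)
print(len(train), len(selection), len(testing))  # 824 274 274
```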
2. Neural network
The second step is to choose a neural network architecture to represent the classification function. Depending on the number of inputs, neurons in the hidden layers, and outputs, the architecture and the result of the neural network will be different. The next picture shows the neural network that defines the model mentioned below.
The scaling layer section contains information about the method for scaling the input variables and the statistic values to be used by that method. In this example, we will use the minimum and maximum method for scaling the inputs. The mean and standard deviation would also be appropriate here.
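The minimum-and-maximum method maps each input into the range [-1, 1]. A minimal sketch, using the variance minimum (-7.0421) and maximum (6.8248) that appear in the deployment expression of section 6:

```python
def minmax_scale(x, x_min, x_max):
    # Map x linearly from [x_min, x_max] to [-1, 1], the range
    # produced by the minimum-and-maximum scaling method.
    return 2 * (x - x_min) / (x_max - x_min) - 1

# The variance input, scaled with its minimum and maximum statistics.
scaled = minmax_scale(0.0, -7.0421, 6.8248)
```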
3. Loss index
UPDATE: The latest version of the program includes this section in the Training Strategy. The third step is to set the loss index, which is composed of:
 Error term.
 Regularization term.
The error (objective) term is the weighted squared error.
On the other hand, the regularization term is the norm of the neural parameters. The weight for this term is 0.001. Regularization has two effects here:
 It makes the model stable, without oscillations.
 It avoids saturation of the logistic activation functions.
The learning problem can then be stated as finding a neural network that minimizes the loss index, i.e., a neural network that fits the data set (error term) and does not oscillate (regularization term).
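The loss index can be sketched in Python. The error term is written here as a plain mean squared error (the weighted variant used above differs only in applying per-class weights), and the regularization term is the L2 norm of the neural parameters with weight 0.001:

```python
def loss_index(errors, parameters, regularization_weight=0.001):
    # Error term: mean of the squared output errors (a simplification
    # of the weighted squared error used in the study).
    error_term = sum(e * e for e in errors) / len(errors)
    # Regularization term: L2 norm of the neural parameters.
    regularization_term = sum(p * p for p in parameters) ** 0.5
    return error_term + regularization_weight * regularization_term
```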
4. Training strategy
The fourth step in solving this problem is to assign the training strategy. A general training strategy is composed of two algorithms:
 Initialization algorithm.
 Main algorithm.
The next figure shows the training strategy page in Neural Designer.
We will not use any initialization algorithm here, and we use the quasi-Newton method as the main training algorithm. We leave the default training parameters, stopping criteria, and training history settings.
The next figure shows the loss history with the quasi-Newton method. As we can see, the loss decreases until it reaches a stationary value. This is a sign of convergence.
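The stopping behavior can be sketched with a generic training loop that halts once the loss decrease becomes negligible (the stationary value described above). The gradient step below is only a stand-in for one quasi-Newton iteration, and the quadratic objective and learning rate are illustrative choices:

```python
def train_until_stationary(loss_fn, step_fn, x0, tol=1e-9, max_iters=1000):
    # Apply one optimization step at a time and stop when the loss
    # decrease falls below tol, i.e., when the loss history flattens.
    x, loss = x0, loss_fn(x0)
    history = [loss]
    for _ in range(max_iters):
        x = step_fn(x)
        new_loss = loss_fn(x)
        history.append(new_loss)
        if loss - new_loss < tol:
            break
        loss = new_loss
    return x, history

# Illustrative example: minimize f(x) = (x - 2)^2 with a plain gradient
# step (learning rate 0.1) standing in for the quasi-Newton update.
x, history = train_until_stationary(lambda x: (x - 2) ** 2,
                                    lambda x: x - 0.1 * 2 * (x - 2),
                                    x0=0.0)
```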
The neural network is trained in order to achieve the best possible loss and good generalization properties. The next table shows the final states of the neural network, the loss index, and the training algorithm.
The final loss is almost zero, which means that the neural network fits the data very well. The selection loss is also very small, which confirms that no overfitting has occurred.
5. Testing analysis
The last step is to test the generalization performance of the trained neural network. In the confusion matrix, the rows represent the target classes and the columns the output classes for the testing data set. The diagonal cells show the number of cases that were correctly classified, and the off-diagonal cells show the misclassified cases.
The following table contains the elements of the confusion matrix.
The number of correctly classified instances is 274, and the number of misclassified instances is 0. As there are no misclassified instances, the model predicts this testing data very well.
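A 2x2 confusion matrix with this row/column convention can be computed as follows; the target and output vectors here are illustrative, not the actual testing data:

```python
def confusion_matrix(targets, outputs):
    # matrix[t][o] counts instances whose target class is t and whose
    # output class is o: rows are targets, columns are outputs.
    matrix = [[0, 0], [0, 0]]
    for t, o in zip(targets, outputs):
        matrix[t][o] += 1
    return matrix

targets = [0, 0, 1, 1, 1]
outputs = [0, 1, 1, 1, 0]
m = confusion_matrix(targets, outputs)
correct = m[0][0] + m[1][1]  # diagonal cells: correctly classified
```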
6. Model deployment
The neural network is now ready to predict outputs for inputs that it has never seen. The "Calculate output" task calculates the output value for a given input value. This task opens a dialog to set the input values, see the next figure.
The mathematical expression represented by the neural network is written below.
scaled_variance = 2*(variance+7.0421)/(6.8248+7.0421) - 1;
scaled_skewness = 2*(skewness+13.7731)/(12.9516+13.7731) - 1;
scaled_kurtosis = 2*(kurtosis+5.2861)/(17.9274+5.2861) - 1;
scaled_entropy = 2*(entropy+8.5482)/(2.4495+8.5482) - 1;
y_1_1 = Logistic(3.28052 + 5.03143*scaled_variance - 0.0363754*scaled_skewness - 3.85087*scaled_kurtosis + 0.00528292*scaled_entropy);
y_1_2 = Logistic(0.743815 - 0.699382*scaled_variance - 2.24925*scaled_skewness - 0.859183*scaled_kurtosis - 0.127586*scaled_entropy);
y_1_3 = Logistic(1.97779 + 6.38392*scaled_variance + 3.23236*scaled_skewness + 3.52077*scaled_kurtosis - 1.19071*scaled_entropy);
y_1_4 = Logistic(1.38079 - 2.13277*scaled_variance - 1.99074*scaled_skewness + 0.790205*scaled_kurtosis - 1.00866*scaled_entropy);
y_1_5 = Logistic(2.53669 + 5.82326*scaled_variance + 5.22523*scaled_skewness + 6.12734*scaled_kurtosis - 0.686447*scaled_entropy);
non_probabilistic_class = Logistic(4.87466 + 8.67167*y_1_1 + 2.83798*y_1_2 - 7.84766*y_1_3 + 3.72746*y_1_4 - 11.2728*y_1_5);
(class) = Probability(non_probabilistic_class);
Logistic(x){ return 1/(1+exp(-x)) }
Probability(x){ if x < 0 return 0 else if x > 1 return 1 else return x }
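The expression can be transcribed directly into Python. This is a sketch: the coefficients are copied from the listing, where adjacent terms with no operator between them are read as subtractions, so the signs should be checked against the original Neural Designer output before use. The sample input values below are illustrative.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def probability(x):
    # Clamp the non-probabilistic output to [0, 1].
    return min(max(x, 0.0), 1.0)

def predict(variance, skewness, kurtosis, entropy):
    # Scaling layer: minimum-and-maximum method onto [-1, 1].
    sv = 2 * (variance + 7.0421) / (6.8248 + 7.0421) - 1
    ss = 2 * (skewness + 13.7731) / (12.9516 + 13.7731) - 1
    sk = 2 * (kurtosis + 5.2861) / (17.9274 + 5.2861) - 1
    se = 2 * (entropy + 8.5482) / (2.4495 + 8.5482) - 1
    # Hidden layer of five logistic neurons.
    y1 = logistic(3.28052 + 5.03143*sv - 0.0363754*ss - 3.85087*sk + 0.00528292*se)
    y2 = logistic(0.743815 - 0.699382*sv - 2.24925*ss - 0.859183*sk - 0.127586*se)
    y3 = logistic(1.97779 + 6.38392*sv + 3.23236*ss + 3.52077*sk - 1.19071*se)
    y4 = logistic(1.38079 - 2.13277*sv - 1.99074*ss + 0.790205*sk - 1.00866*se)
    y5 = logistic(2.53669 + 5.82326*sv + 5.22523*ss + 6.12734*sk - 0.686447*se)
    # Probabilistic output layer.
    out = logistic(4.87466 + 8.67167*y1 + 2.83798*y2 - 7.84766*y3 + 3.72746*y4 - 11.2728*y5)
    return probability(out)

p = predict(3.6216, 8.6661, -2.8073, -0.44699)
```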