Blood donation campaign
By Roberto Lopez, Artelnics.
The aim of this study is to predict whether a person is going to donate blood by using a recency, frequency, monetary and time (RFMT) marketing model.
In this tutorial, a classification problem in marketing is solved by means of a neural network. The data for this problem have been taken from the UCI Machine Learning Repository.
The data used for this study were taken from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan.
The data set, obtained from the data file blooddonation.dat, contains the information used to create the model. It consists of 748 rows and 5 columns: the columns represent the variables and the rows represent the instances. The values in each row are separated by commas. The following listing is a preview of the data file.
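A minimal Python sketch for reading a file in this format follows. The column names in `COLUMNS` and their order (the four RFMT inputs followed by the target) are assumptions, as is whether the file starts with a header line; the two sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical column names and order: the RFMT inputs, then the target.
COLUMNS = ["recency", "frequency", "monetary", "time", "donation"]


def load_rows(text, has_header=False):
    """Parse comma-separated text into a list of rows of floats."""
    reader = csv.reader(io.StringIO(text))
    rows = [row for row in reader if row]  # skip blank lines
    if has_header:
        rows = rows[1:]
    # float() tolerates surrounding spaces such as "2 ".
    return [[float(value) for value in row] for row in rows]


sample = "3,10,2500,40,0\n1,4,1000,12,1\n"  # made-up rows
rows = load_rows(sample)
print(len(rows), len(rows[0]))  # 2 5
```

If the real file matches these assumptions, `load_rows(open("blooddonation.dat").read())` should return 748 rows of 5 values each.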
The task "Report data set" shows a table with the name, units, description and use of all the variables in the data set.
The "Calculate target class distribution" task plots a pie chart with the percentage of instances in each class. In this data set, the number of negative responses is much greater than the number of positive responses.
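The percentages in that pie chart can be reproduced with a few lines of Python. The sketch assumes the target is coded as 0 (no donation) and 1 (donation); that coding is an assumption.

```python
from collections import Counter


def class_percentages(targets):
    """Return the percentage of instances in each target class."""
    counts = Counter(targets)
    total = len(targets)
    return {label: 100.0 * n / total for label, n in sorted(counts.items())}


print(class_percentages([0, 0, 0, 1]))  # {0: 75.0, 1: 25.0}
```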
To obtain a better predictive model, the target distribution should be more uniform. The task "Balance targets distribution" balances it by setting as unused those instances whose variables belong to the most populated bins. After this task, the distribution is more uniform, which should improve the quality of the model. There are 392 instances set as unused, 214 instances for training, 71 instances for selection and 71 instances for testing.
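The exact balancing rule used by Neural Designer is not described here, so the sketch below swaps in plain random undersampling of the majority class, followed by a 60/20/20 split of the remaining instances into training, selection and testing. Both choices are assumptions. They reproduce the counts above if the file holds 570 negative and 178 positive instances, an assumed class split that is consistent with 748 - 392 = 356 = 2 × 178.

```python
import random


def undersample_and_split(targets, seed=0, fractions=(0.6, 0.2, 0.2)):
    """Random undersampling (a stand-in, not Neural Designer's algorithm):
    mark majority-class instances as unused until every class has the
    minority count, then split the used instances into subsets."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(targets):
        by_class.setdefault(label, []).append(index)
    minority = min(len(indices) for indices in by_class.values())

    used, unused = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        used.extend(shuffled[:minority])
        unused.extend(shuffled[minority:])

    rng.shuffle(used)
    n_train = round(fractions[0] * len(used))
    n_select = round(fractions[1] * len(used))
    training = used[:n_train]
    selection = used[n_train:n_train + n_select]
    testing = used[n_train + n_select:]  # whatever remains
    return unused, training, selection, testing


# Assumed class split: 570 negatives and 178 positives.
targets = [0] * 570 + [1] * 178
unused, training, selection, testing = undersample_and_split(targets)
print(len(unused), len(training), len(selection), len(testing))  # 392 214 71 71
```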
The second step is to configure the neural network. For classification problems, it is composed of:
- Scaling layer.
- Learning layers.
- Probabilistic layer.
The following figure shows the neural network page in Neural Designer.
In this case, the number of inputs is 4 and the number of outputs is 1. The number of hidden perceptrons (the complexity) is 3, so this neural network can be denoted as 4:3:1.
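As a quick check on the size of a 4:3:1 network, the number of trainable parameters is (4 + 1)·3 + (3 + 1)·1 = 19, assuming one bias per neuron and no trainable parameters in the scaling and probabilistic layers. The same count in Python:

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected perceptron.

    Assumes one bias per neuron; scaling and probabilistic layers are
    treated as having no trainable parameters.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(count_parameters([4, 3, 1]))  # 19
```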
The next figure is a graphical representation of the neural network used for this problem.
The third step is to configure the loss index, which is composed of two terms:
- An error term.
- A regularization term.
The following figure shows the loss index tab in Neural Designer.
The error term chosen here is the normalized squared error. It divides the squared error between the outputs of the neural network and the targets in the data set by a normalization coefficient. A normalized squared error of 1 means that the neural network is predicting the data 'in the mean', while a value of 0 means perfect prediction of the data.
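A minimal sketch of the loss index follows. It assumes the normalization coefficient is the sum of squared deviations of the targets from their mean, which gives exactly 1 for a model that always predicts the mean. It also assumes an L2 regularization term with a hypothetical weight of 0.01; the actual regularization method and weight are not stated above.

```python
def normalized_squared_error(outputs, targets):
    """Squared error divided by the spread of the targets about their mean."""
    mean = sum(targets) / len(targets)
    squared_error = sum((y - t) ** 2 for y, t in zip(outputs, targets))
    normalization = sum((t - mean) ** 2 for t in targets)
    return squared_error / normalization


def loss_index(outputs, targets, parameters, regularization_weight=0.01):
    """Error term plus an L2 regularization term (hypothetical weight)."""
    regularization = sum(p ** 2 for p in parameters)
    return (normalized_squared_error(outputs, targets)
            + regularization_weight * regularization)


targets = [0, 1, 0, 1]
print(normalized_squared_error([0.5] * 4, targets))  # 1.0 (predicting the mean)
print(normalized_squared_error(targets, targets))    # 0.0 (perfect prediction)
```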
The fourth step is to set the training strategy, which is the learning process applied to the neural network in order to get the best performance. The training strategy is composed of two algorithms:
- Initialization algorithm.
- Main algorithm.
We will not use any initialization algorithm here. The chosen main algorithm is the quasi-Newton method, and we will leave the default training parameters, stopping criteria and training history settings.
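A quasi-Newton method approximates the inverse Hessian from successive gradient differences instead of computing second derivatives. The sketch below uses the BFGS update with a backtracking (Armijo) line search and a gradient-norm stopping criterion. The document does not say which variant, line search or stopping criteria Neural Designer applies by default, so all of these are assumptions, and the sketch minimizes a toy quadratic rather than the actual loss index.

```python
import math


def bfgs(f, grad, x0, max_iterations=100, gradient_tolerance=1e-8):
    """Minimize f with BFGS; return the final point and the loss history."""
    n = len(x0)
    x = list(x0)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # inverse Hessian estimate
    g = grad(x)
    history = [f(x)]
    for _ in range(max_iterations):
        if math.sqrt(sum(gi * gi for gi in g)) < gradient_tolerance:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]  # d = -H g
        slope = sum(gi * di for gi, di in zip(g, d))

        def trial(t):
            return [xi + t * di for xi, di in zip(x, d)]

        # Backtracking: halve the step until the Armijo condition holds.
        step = 1.0
        while f(trial(step)) > history[-1] + 1e-4 * step * slope:
            step *= 0.5
            if step < 1e-12:
                return x, history  # line search failed; keep the last point
        x_new = trial(step)
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        ys = sum(a * b for a, b in zip(y, s))
        if ys > 1e-12:  # update only when the curvature condition holds
            rho = 1.0 / ys
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(y, Hy))
            H = [[H[i][j] + rho * (1 + rho * yHy) * s[i] * s[j]
                  - rho * (Hy[i] * s[j] + s[i] * Hy[j]) for j in range(n)]
                 for i in range(n)]
        x, g = x_new, g_new
        history.append(f(x))
    return x, history


# Toy problem with minimum at (1, -2).
def f(p):
    return (p[0] - 1) ** 2 + 10 * (p[1] + 2) ** 2


def grad(p):
    return [2 * (p[0] - 1), 20 * (p[1] + 2)]


x, history = bfgs(f, grad, [0.0, 0.0])
```

The returned `history` is a loss-versus-iteration sequence of the same kind plotted in the training chart; because every accepted step satisfies the Armijo condition, it never increases.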
The following chart shows how the performance decreases with the iterations during the training process. The initial value is 1.3559, and the final value after 22 iterations is 0.44294.
The next table shows the results of training with the quasi-Newton method, including some final states of the neural network. As we can see, the final performance and the final generalization performance are similar, and the gradient norm is almost zero.
The last step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data never seen before: the testing instances.
The task "Calculate binary classification tests" provides useful information for testing the performance of a classifier with two classes. The next figure shows the output of this task.
The classification accuracy takes a value of 0.861, which means that the model predicts the correct class for most of the testing instances.
The task "Calculate confusion" depicts a table containing the confusion matrix. The element (0,0) contains the true positives, the element (0,1) the false positives, the element (1,0) the false negatives, and the element (1,1) the true negatives for the target variable, donation. The number of correctly classified instances is 62, and the number of misclassified instances is 10.
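The confusion matrix and the accuracy can be computed as below, using the same layout as the text: (0,0) true positives, (0,1) false positives, (1,0) false negatives and (1,1) true negatives. The 0.5 decision threshold and the 0/1 target coding are assumptions. With 62 correctly classified and 10 misclassified instances, the accuracy is 62 / 72 ≈ 0.861, which matches the value reported by the binary classification tests.

```python
def confusion_matrix(outputs, targets, threshold=0.5):
    """Return [[TP, FP], [FN, TN]]; the 0.5 threshold is an assumption."""
    tp = fp = fn = tn = 0
    for y, t in zip(outputs, targets):
        predicted_positive = y >= threshold
        if predicted_positive and t == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    return [[tp, fp], [fn, tn]]


def accuracy(matrix):
    """Correctly classified instances (diagonal) over all instances."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


print(round(62 / (62 + 10), 3))  # 0.861
```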
To improve the quality of the model, it is useful to know which instances are classified as negative when they are positive, and which are classified as positive when they are negative. The task "Calculate misclassified instances" provides this information.
The next table shows the instances which are positive and are predicted as negative.
The next table shows the instances which are negative and are predicted as positive.
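Listing the misclassified instances amounts to filtering the testing instances by target and thresholded output. A minimal sketch, again assuming a 0.5 threshold and 0/1 targets:

```python
def misclassified_instances(outputs, targets, threshold=0.5):
    """Return the indices of positives predicted as negative (false
    negatives) and of negatives predicted as positive (false positives)."""
    pairs = list(enumerate(zip(outputs, targets)))
    false_negatives = [i for i, (y, t) in pairs if t == 1 and y < threshold]
    false_positives = [i for i, (y, t) in pairs if t == 0 and y >= threshold]
    return false_negatives, false_positives


print(misclassified_instances([0.9, 0.2, 0.7, 0.4], [1, 1, 0, 0]))  # ([1], [2])
```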
Finally, the task "Calculate ROC curve" plots a ROC chart, which is a graphical illustration of how well the classifier discriminates between the two classes. The output is shown in the next figure.
A random classifier has an area under the curve of 0.5, while a perfect classifier has an area under the curve of 1. In practice, this measure should take a value between 0.5 and 1; the closer the area under the curve is to 1, the better the classifier. In this example, the area under the curve is 0.804, which indicates good performance.
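The ROC curve is obtained by sweeping the decision threshold from the highest output to the lowest and recording the false positive rate and true positive rate at each value; the area under the curve then follows from the trapezoidal rule. This is a generic sketch, not necessarily how Neural Designer computes it, and it assumes 0/1 targets with at least one instance of each class.

```python
def roc_curve(outputs, targets):
    """Return (false positive rate, true positive rate) points."""
    positives = sum(1 for t in targets if t == 1)
    negatives = len(targets) - positives
    ranked = sorted(zip(outputs, targets), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Instances with tied outputs cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal rule over consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(area_under_curve(roc_curve([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])))  # 1.0
```

Perfectly ranked outputs give an area of 1, fully reversed outputs give 0, and all-equal outputs give 0.5, the random-classifier baseline.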
Once the generalization performance of the multilayer perceptron has been tested, the neural network can be saved for future use in the so-called production mode.
We can predict whether a person is going to donate blood by running the "Calculate outputs" task. For that, we need to edit the input variables through the corresponding dialog.
Then the prediction is written in the viewer.
The mathematical expression represented by the neural network is written below. It takes the inputs recency, frequency, monetary and time to produce the output prediction about donation. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer. This expression can be exported anywhere.
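The sketch below writes that feed-forward expression as plain Python for a 4:3:1 network. It assumes mean and standard deviation scaling, a hyperbolic tangent activation in the 3 hidden perceptrons and a logistic output acting as the probabilistic layer. Those choices and every entry of `params` are hypothetical, since the trained expression itself is not reproduced here.

```python
import math


def scale(inputs, means, stds):
    """Scaling layer: mean and standard deviation scaling (an assumption)."""
    return [(x - m) / s for x, m, s in zip(inputs, means, stds)]


def perceptron_layer(inputs, weights, biases, activation):
    """Each row of `weights` holds the synaptic weights of one neuron."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def donation_probability(inputs, params):
    """inputs = [recency, frequency, monetary, time]; params are hypothetical."""
    scaled = scale(inputs, params["means"], params["stds"])
    hidden = perceptron_layer(scaled, params["w1"], params["b1"], math.tanh)
    (output,) = perceptron_layer(hidden, params["w2"], params["b2"], logistic)
    return output


# Hypothetical parameters: all weights and biases set to zero.
zero_params = {
    "means": [0.0] * 4, "stds": [1.0] * 4,
    "w1": [[0.0] * 4 for _ in range(3)], "b1": [0.0] * 3,
    "w2": [[0.0] * 3], "b2": [0.0],
}
print(donation_probability([2, 50, 12500, 98], zero_params))  # 0.5
```

With all weights and biases at zero the output is exactly 0.5, a convenient sanity check; real values would come from the trained network.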