The aim of this study is to predict whether a person is going to donate blood, using a recency, frequency, monetary and time (RFMT) marketing model.
The data used for this study was taken from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan.
This is a classification project, since the variable to be predicted is binary (donate or not).
The goal here is to model the probability that a person donates blood, conditioned on his/her features.
The data file blood_donation.csv contains the information used to create the model. It consists of 748 rows and 5 columns. The columns represent the variables and the rows represent the instances.
The next list describes the variables in the data set:
The total number of instances is 748. Of these, we set aside 60% for training, 20% for selection and 20% for testing.
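As an illustration of the 60/20/20 split, the following is a minimal sketch that shuffles the row indices and cuts them into three disjoint subsets. The function name `split_indices`, the fixed seed and the rounding rule are assumptions made for this example; the text does not say how the instances are actually assigned.

```python
import random

def split_indices(n_samples, train_frac=0.6, selection_frac=0.2, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into three disjoint subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = round(train_frac * n_samples)
    n_selection = round(selection_frac * n_samples)
    train = indices[:n_train]
    selection = indices[n_train:n_train + n_selection]
    testing = indices[n_train + n_selection:]  # the remainder goes to testing
    return train, selection, testing

train, selection, testing = split_indices(748)
print(len(train), len(selection), len(testing))
```

Since 748 is not divisible by 5, the subset sizes can only approximate the stated percentages.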
We can calculate the data distributions and plot a pie chart with the percentage of instances for each class.
As we can see, the number of negative responses is much greater than the number of positive responses.
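The percentages behind such a pie chart can be computed directly from the target column. The sketch below uses a small made-up list of labels (1 = donated, 0 = did not donate), not the real data, and the helper name `class_percentages` is hypothetical.

```python
from collections import Counter

def class_percentages(labels):
    """Return the percentage of instances that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Hypothetical toy target column, used only to illustrate the calculation.
toy_labels = [0, 0, 0, 1, 0, 1, 0, 0]
print(class_percentages(toy_labels))  # {0: 75.0, 1: 25.0}
```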
The second step is to configure the neural network. For classification problems, it is composed of:
The number of inputs, in this case, is 4 and the number of outputs is 1. The number of hidden neurons (the complexity) is 3, so this neural network can be denoted as 4:3:1.
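To make the 4:3:1 notation concrete, the sketch below counts the trainable parameters of such an architecture. It assumes fully connected layers with one bias per neuron, which the text does not state explicitly; `count_parameters` is a name introduced for this example.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for consecutive fully connected layers."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# (4 + 1) * 3 parameters in the hidden layer, (3 + 1) * 1 in the output layer.
print(count_parameters([4, 3, 1]))  # 19
```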
The next figure is a graphical representation of the neural network used for this problem.
The fourth step is to configure the training strategy, which is composed of two terms:
The loss index chosen is the weighted squared error with L2 regularization.
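One possible form of a weighted squared error with L2 regularization is sketched below. The weighting scheme (a larger weight on the rarer positive class), the absence of a normalization factor and the regularization weight are all assumptions made for illustration; the text does not give the exact definition used to obtain the reported WSE values.

```python
def weighted_squared_error(outputs, targets, parameters,
                           positives_weight, negatives_weight=1.0,
                           regularization_weight=0.01):
    """Class-weighted sum of squared errors plus an L2 penalty on the parameters."""
    error = 0.0
    for output, target in zip(outputs, targets):
        # Errors on positive instances count more, to offset class imbalance.
        weight = positives_weight if target == 1 else negatives_weight
        error += weight * (output - target) ** 2
    l2_penalty = sum(p * p for p in parameters)
    return error + regularization_weight * l2_penalty

# Hypothetical toy values: one positive and three negatives, so the positive
# class is given weight 3 (negatives / positives).
loss = weighted_squared_error(
    outputs=[0.5, 0.5, 0.0, 0.0],
    targets=[1, 0, 0, 0],
    parameters=[1.0, -1.0],
    positives_weight=3.0,
)
print(loss)
```

Here the error term is 3 * 0.25 + 1 * 0.25 = 1.0 and the penalty is 0.01 * 2 = 0.02.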
The chosen optimization algorithm is the quasi-Newton method. We leave the default training parameters, stopping criteria and training history settings.
The following chart shows how the training and selection errors decrease with the epochs during the training process. The final values are training error = 0.695 WSE and selection error = 0.907 WSE.
The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.
More specifically, we want to find a neural network with a selection error less than 0.907 WSE, which is the value that we have achieved so far.
Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.
The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.
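The incremental order idea can be sketched as a simple loop over the number of hidden neurons. In the code below, `train_and_evaluate` stands for whatever routine trains a network of the given size and returns its training and selection errors; it is a hypothetical placeholder, and the toy error curves are invented purely to exercise the loop.

```python
def incremental_order(train_and_evaluate, max_neurons=10):
    """Try 1..max_neurons hidden neurons and keep the lowest selection error."""
    best_neurons, best_error = None, float("inf")
    history = []
    for neurons in range(1, max_neurons + 1):
        training_error, selection_error = train_and_evaluate(neurons)
        history.append((neurons, training_error, selection_error))
        if selection_error < best_error:
            best_neurons, best_error = neurons, selection_error
    return best_neurons, best_error, history

def fake_train_and_evaluate(neurons):
    """Hypothetical stand-in: training error falls, selection error is U-shaped."""
    training_error = 1.0 / neurons
    selection_error = 1.0 / neurons + 0.04 * neurons
    return training_error, selection_error

best_neurons, best_error, history = incremental_order(fake_train_and_evaluate)
print(best_neurons, round(best_error, 3))
```

The toy curves mimic the usual pattern: the training error keeps falling as neurons are added, while the selection error eventually rises again.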
The next step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data it has never seen before, the testing instances.
A standard testing method is to plot a ROC curve, which is a graphical illustration of how well the classifier discriminates between the two classes. The output is shown in the next figure.
A random classifier has an area under the curve of 0.5, while a perfect classifier has an area under the curve of 1. In practice, this measure should take a value between 0.5 and 1. The closer the area under the curve is to 1, the better the classifier. In this example, AUC = 0.804, which indicates good performance.
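The area under the ROC curve can also be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing all positive-negative pairs; the scores and labels are made-up toy values, not outputs of the model described here.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Hypothetical toy scores: three of the four positive-negative pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.35, 0.3], [1, 0, 1, 0]))  # 0.75
```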
The binary classification tests provide us with useful information about the performance of a binary classification model:
The classification accuracy is 0.861, which means that about 86% of the testing instances are classified correctly.
The confusion matrix contains the true positives, false positives, false negatives and true negatives for the donation variable:
|Predicted positive||Predicted negative|
The number of correctly classified instances is 62, and the number of misclassified instances is 10.
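The reported accuracy follows directly from these two counts, as this short check shows:

```python
correct = 62
misclassified = 10

# Accuracy is the share of correctly classified instances among all tested ones.
accuracy = correct / (correct + misclassified)
print(round(accuracy, 3))  # 0.861
```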
Cumulative gain analysis is a visual aid that shows the advantage of using a predictive model over random selection. It consists of three lines. The baseline represents the results that would be obtained without using a model. The positive cumulative gain shows, on the y-axis, the percentage of positive instances found against the percentage of the population, represented on the x-axis. Similarly, the negative cumulative gain shows the percentage of negative instances found against the percentage of the population.
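A minimal sketch of how the positive cumulative gain curve could be computed is given below: instances are ranked by predicted score, and at each step we record the fraction of the population examined and the fraction of positives found so far. The function name and the toy data are assumptions made for illustration.

```python
def cumulative_gain(scores, labels, positive_class=1):
    """Return (population fraction, positives-found fraction) points."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, y in ranked if y == positive_class)
    if total_positives == 0:
        raise ValueError("no positive instances")
    curve = [(0.0, 0.0)]
    found = 0
    for i, (_, y) in enumerate(ranked, start=1):
        if y == positive_class:
            found += 1
        curve.append((i / len(ranked), found / total_positives))
    return curve

# Hypothetical toy scores and labels, ranked from most to least likely donor.
gain = cumulative_gain([0.9, 0.8, 0.35, 0.3], [1, 0, 1, 0])
print(gain)
```

The baseline corresponds to the diagonal where both fractions are equal. A negative cumulative gain curve could be obtained in the same way by ranking in ascending score order and counting negatives.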
Once the generalization performance of the neural network has been tested, it can be saved for future use in the so-called model deployment mode.
We can predict whether a person is going to donate blood by calculating the neural network outputs. For that, we need to set the input variables.
The mathematical expression represented by the neural network is written below. It takes the inputs recency, frequency, monetary and time to produce the output prediction about donation. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer. This expression can be exported anywhere.
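The feed-forward propagation described above can be sketched for the 4:3:1 architecture as follows. Everything numeric here is a placeholder: the minimum-maximum scaling, the hyperbolic tangent and logistic activations, and all weights, biases and input ranges are assumptions made for illustration, not the trained model's actual expression.

```python
import math

def predict_donation(inputs, minimums, maximums,
                     hidden_weights, hidden_biases,
                     output_weights, output_bias):
    """Scaling layer, one perceptron layer, then a logistic probabilistic output."""
    # Scaling layer: map each input to [-1, 1] (assumed min-max scaling).
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0
              for x, lo, hi in zip(inputs, minimums, maximums)]
    # Perceptron layer: 3 hidden neurons with tanh activation (assumed).
    hidden = [math.tanh(b + sum(w * s for w, s in zip(ws, scaled)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    # Probabilistic layer: logistic function yields a probability of donation.
    combination = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-combination))

# Hypothetical parameters and input ranges for recency, frequency, monetary, time.
probability = predict_donation(
    inputs=[2.0, 10.0, 2500.0, 30.0],
    minimums=[0.0, 1.0, 250.0, 2.0],
    maximums=[74.0, 50.0, 12500.0, 98.0],
    hidden_weights=[[-0.5, 0.8, 0.1, -0.3],
                    [0.2, -0.4, 0.6, 0.1],
                    [0.7, 0.3, -0.2, -0.6]],
    hidden_biases=[0.1, -0.2, 0.05],
    output_weights=[1.2, -0.7, 0.4],
    output_bias=-0.3,
)
print(probability)
```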