The aim of this study is to predict whether a person will donate blood, using a recency, frequency, monetary and time (RFMT) marketing model.

The data used for this study was taken from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan.

This is a classification project, since the variable to be predicted is binary (donate or not).

The goal here is to model the probability that a person donates blood, conditioned on his/her features.

The data file blood_donation.csv contains the information used to create the model. It consists of 748 rows and 5 columns. The columns represent the variables and the rows represent the instances.

The next list describes the variables in the data set:

- **recency**: Months since last donation.
- **frequency**: Total number of donations.
- **quantity**: Total blood donated.
- **time**: Months since first donation.
- **donation**: True if the person donated in the last campaign, false otherwise.

The 748 instances are divided into three subsets: 60% for training, 20% for selection and 20% for testing.
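The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own procedure: the function name `split_indices`, the random shuffle, the fixed seed, and the rounding of the subset sizes are all assumptions.

```python
import random

def split_indices(n_instances, train_frac=0.6, selection_frac=0.2, seed=0):
    """Shuffle instance indices and split them into training, selection and testing subsets."""
    indices = list(range(n_instances))
    # Assumption: instances are assigned at random; a fixed seed keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_instances)
    n_selection = int(selection_frac * n_instances)
    training = indices[:n_train]
    selection = indices[n_train:n_train + n_selection]
    # Whatever remains after training and selection goes to testing.
    testing = indices[n_train + n_selection:]
    return training, selection, testing

training, selection, testing = split_indices(748)
print(len(training), len(selection), len(testing))
```

Because 748 is not divisible by 5, the three subsets cannot be exactly 60%, 20% and 20%; in this sketch the testing subset absorbs the remainder.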

We can calculate the data distributions and plot a pie chart with the percentage of instances for each class.

As we can see, the number of negative responses is much greater than the number of positive responses.

Another relevant piece of information is the correlation of each input with the target variable. The chart below displays this information.

The next step is to choose a neural network to represent the classification function. For classification problems, it is composed of:

- A scaling layer.
- Two perceptron layers.
- A probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method is set.
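As a rough sketch of what mean and standard deviation scaling does to one input variable, consider the following. The helper names `fit_scaler` and `scale` and the sample values are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics

def fit_scaler(values):
    """Return the mean and standard deviation of one input variable."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation; the tool may use another convention.
    std = statistics.stdev(values)
    return mean, std

def scale(value, mean, std):
    """Center a value on the mean and express it in units of standard deviation."""
    return (value - mean) / std

# Hypothetical values for illustration only, not taken from blood_donation.csv.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = fit_scaler(values)
scaled = [scale(v, mean, std) for v in values]
print(mean, std)
```

After scaling, the variable has mean 0 and standard deviation 1, so inputs measured on different scales contribute comparably to the perceptron layers.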

We use two perceptron layers: a hidden layer with 3 neurons as a first guess, and an output layer with 1 neuron. Both layers use the logistic activation function.

Finally, we set the continuous probabilistic method for the probabilistic layer.

The next figure is a diagram for the neural network used in this example.

The next step is to configure the training strategy, which is composed of two terms:

- Loss index.
- Optimization algorithm.

The loss index chosen is the weighted squared error with L2 regularization.
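To illustrate the idea behind this loss, here is a hedged sketch of a weighted squared error with an L2 penalty. The weighting scheme (positives weighted by the ratio of negatives to positives, negatives weighted by 1), the absence of any normalization, and the names `weighted_squared_error` and `regularization_weight` are assumptions; the tool's exact definition is not given in this text.

```python
def weighted_squared_error(outputs, targets, parameters=(), regularization_weight=0.0):
    """Squared error with class weights plus an L2 penalty on the parameters."""
    n_positives = sum(1 for t in targets if t == 1)
    n_negatives = len(targets) - n_positives
    if n_positives == 0:
        raise ValueError("at least one positive instance is required")
    # Assumption: the minority (positive) class is up-weighted so both classes
    # contribute equally to the error when the data set is unbalanced.
    positives_weight = n_negatives / n_positives
    negatives_weight = 1.0
    error = 0.0
    for output, target in zip(outputs, targets):
        weight = positives_weight if target == 1 else negatives_weight
        error += weight * (output - target) ** 2
    # L2 regularization penalizes large parameter values.
    penalty = regularization_weight * sum(p * p for p in parameters)
    return error + penalty

# One positive and three negatives, all predicted as 0.5.
print(weighted_squared_error([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0]))  # 1.5
```

Weighting matters here because negative responses far outnumber positive ones; an unweighted error would reward a model that always predicts "no donation".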

The chosen optimization algorithm is the quasi-Newton method. We leave the default training parameters, stopping criteria and training history settings.

The following chart shows how the training and selection errors decrease with the epochs during the training process. The final values are **training error = 0.695 WSE** and **selection error = 0.907 WSE**.

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error lower than **0.907 WSE**, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.

The lowest selection error achieved is **0.902**, obtained with 2 neurons.
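The incremental order search described above can be sketched as a simple loop. The callable `train_and_evaluate` is hypothetical: it stands in for training a network with a given number of hidden neurons and returning its training and selection errors. The `toy_errors` function is a made-up stand-in used only to exercise the loop; it does not reproduce the results reported here.

```python
def incremental_order(train_and_evaluate, min_neurons=1, max_neurons=10):
    """Try increasing numbers of hidden neurons and keep the lowest selection error."""
    best_neurons, best_error = None, float("inf")
    history = []
    for neurons in range(min_neurons, max_neurons + 1):
        training_error, selection_error = train_and_evaluate(neurons)
        history.append((neurons, training_error, selection_error))
        # Model selection is driven by the selection error, not the training error.
        if selection_error < best_error:
            best_neurons, best_error = neurons, selection_error
    return best_neurons, best_error, history

def toy_errors(neurons):
    """Toy stand-in: training error keeps falling, selection error has a minimum."""
    training_error = 1.0 / neurons
    selection_error = abs(neurons - 3) + 1.0
    return training_error, selection_error

best_neurons, best_error, history = incremental_order(toy_errors)
print(best_neurons, best_error)
```

The toy function mimics the typical pattern: the training error keeps decreasing as neurons are added, while the selection error starts to rise once the network overfits.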

The graph above represents the architecture of the final neural network.

The next step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data it has never seen before: the testing instances.

A standard testing method is to plot a ROC curve, which is a graphical illustration of how well the classifier discriminates between the two classes. The output is shown in the next figure.

A random classifier has an area under the curve (AUC) of 0.5, while a perfect classifier has an AUC of 1.
In practice, this measure should take a value between 0.5 and 1.
The closer the AUC is to 1, the better the classifier.
In this example, **AUC = 0.804**, which indicates good performance.
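One way to make the AUC concrete is its rank interpretation: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name `area_under_roc` and the example scores are illustrative assumptions, and this is not necessarily how the tool computes the value.

```python
def area_under_roc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Illustrative scores only: three of the four positive/negative pairs are ordered correctly.
print(area_under_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in the number of instances, which is fine for a testing subset of this size.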

The binary classification tests provide us with useful information about the performance of a binary classification model:

- **Classification accuracy: 86.1%** (ratio of correctly classified samples).
- **Error rate: 13.9%** (ratio of misclassified samples).
- **Sensitivity: 84.2%** (percentage of actual positives classified as positive).
- **Specificity: 88.2%** (percentage of actual negatives classified as negative).

The classification accuracy is 0.861, which means that the model classifies most of the testing instances correctly.

The confusion matrix contains the true positives, false positives, false negatives and true negatives for the classifier:

| | Predicted positive | Predicted negative |
|---|---|---|
| Real positive | 32 | 6 |
| Real negative | 4 | 30 |

The number of correctly classified instances is 62, and the number of misclassified instances is 10.
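The binary classification tests follow directly from the four cells of the confusion matrix. The short sketch below recomputes them from the counts given above; the variable names are illustrative.

```python
# Counts from the confusion matrix above.
true_positives, false_negatives = 32, 6
false_positives, true_negatives = 4, 30

total = true_positives + false_negatives + false_positives + true_negatives

accuracy = (true_positives + true_negatives) / total
error_rate = (false_positives + false_negatives) / total
# Sensitivity: share of actual positives that were predicted positive.
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: share of actual negatives that were predicted negative.
specificity = true_negatives / (true_negatives + false_positives)

print(f"accuracy    {accuracy:.1%}")     # 86.1%
print(f"error rate  {error_rate:.1%}")   # 13.9%
print(f"sensitivity {sensitivity:.1%}")  # 84.2%
print(f"specificity {specificity:.1%}")  # 88.2%
```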

The cumulative gain analysis is a visual aid that shows the advantage of using a predictive model as opposed to random selection. It consists of three lines:

- The baseline, which represents the results that would be obtained without using a model.
- The positive cumulative gain, which shows on the y-axis the percentage of positive instances found against the percentage of the population, represented on the x-axis.
- The negative cumulative gain, which similarly shows the percentage of negative instances found against the percentage of the population.
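The positive cumulative gain curve can be sketched as follows: sort the instances by predicted score, from highest to lowest, and track what fraction of all positives has been found after each instance. The function name `cumulative_gain` and the example scores are assumptions for illustration.

```python
def cumulative_gain(scores, labels, target_label=1):
    """Return (population fraction, fraction of target class found) points."""
    # Contact the instances the model is most confident about first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_targets = sum(1 for y in labels if y == target_label)
    if total_targets == 0:
        raise ValueError("no instances of the target class")
    found = 0
    points = [(0.0, 0.0)]
    for rank, index in enumerate(order, start=1):
        if labels[index] == target_label:
            found += 1
        points.append((rank / len(scores), found / total_targets))
    return points

# Illustrative scores only.
points = cumulative_gain([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(points)
```

The baseline is simply the diagonal where both fractions are equal. In this sketch, the negative gain could be obtained by passing `1 - score` as the score and `target_label=0`.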

Once the generalization performance of the neural network has been tested, it can be saved for future use in the so-called model deployment mode.

We can predict whether a person is going to donate blood by calculating the neural network outputs. For that, we need to set the input variables.

- **recency**: 9 months since last donation.
- **frequency**: 5 donations.
- **time**: 34 months since first donation.
- **donation**: 30% probability.

The mathematical expression represented by the neural network is written below. It takes the inputs recency, frequency and time to produce the output prediction about donation. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer.

    scaled_recency = (recency-9.50668)/8.0954;
    scaled_frequency = (frequency-5.51471)/5.83931;
    scaled_time = (time-34.2821)/24.3767;

    y_1_1 = Logistic (-3.2852+ (scaled_recency*-3.22375)+ (scaled_frequency*3.67502)+ (scaled_time*-2.45661));
    y_1_2 = Logistic (-4.08721+ (scaled_recency*-2.96105)+ (scaled_frequency*2.76006)+ (scaled_time*-3.40265));

    non_probabilistic_donation = Logistic (-1.089+ (y_1_1*5.14874)+ (y_1_2*-2.1466));
    donation = probability(non_probabilistic_donation);

    logistic(x){
        return 1/(1+exp(-x))
    }

    probability(x){
        if x < 0
            return 0
        else if x > 1
            return 1
        else
            return x
    }
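As an illustration of how the expression above might be embedded in other software, here is a direct Python transcription. The coefficients are copied from the expression; the function name `predict_donation` is an assumption introduced for this sketch.

```python
import math

def logistic(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def probability(x):
    """Clamp the network output to the interval [0, 1]."""
    return min(1.0, max(0.0, x))

def predict_donation(recency, frequency, time):
    """Feed-forward pass: scaling layer, two perceptron layers, probabilistic layer."""
    # Scaling layer (mean and standard deviation).
    scaled_recency = (recency - 9.50668) / 8.0954
    scaled_frequency = (frequency - 5.51471) / 5.83931
    scaled_time = (time - 34.2821) / 24.3767

    # Hidden perceptron layer with 2 neurons.
    y_1_1 = logistic(-3.2852 + scaled_recency * -3.22375
                     + scaled_frequency * 3.67502 + scaled_time * -2.45661)
    y_1_2 = logistic(-4.08721 + scaled_recency * -2.96105
                     + scaled_frequency * 2.76006 + scaled_time * -3.40265)

    # Output perceptron layer followed by the probabilistic layer.
    non_probabilistic_donation = logistic(-1.089 + y_1_1 * 5.14874 + y_1_2 * -2.1466)
    return probability(non_probabilistic_donation)

p = predict_donation(recency=9, frequency=5, time=34)
print(f"{p:.3f}")
```

Because the output neuron already uses a logistic activation, its value lies strictly between 0 and 1, so the clamp in `probability` never changes it here; it is kept only to mirror the exported expression.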

The expression above uses only basic arithmetic and the exponential function, so it can be exported to any software that supports them.

- The data for this problem has been taken from the UCI Machine Learning Repository.
- Yeh, I-Cheng, Yang, King-Jang, and Ting, Tao-Ming, "Knowledge discovery on RFM model using Bernoulli sequence," Expert Systems with Applications, 2008, https://dl.acm.org/citation.cfm?id=1498365