Machine learning examples

Target donors in blood donation campaigns

The aim of this study is to predict whether a person will donate blood, using a recency, frequency, monetary and time (RFMT) marketing model.

The data set used for this study comes from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan.



Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.

1. Application type

This is a classification project, since the variable to be predicted is binary (donate or not).

The goal here is to model the probability that a person donates blood, conditioned on their features.

2. Data set

The data file blood_donation.csv contains the information used to create the model. It consists of 748 rows and 5 columns; the columns represent the variables and the rows represent the instances.

The next list describes the variables in the data set:

  - recency: months since the last donation.
  - frequency: total number of donations.
  - monetary: total blood donated, in c.c.
  - time: months since the first donation.
  - donation: whether the person donated blood in the reference period (binary target).

The total number of instances is 748. Of these, we use 60% for training, 20% for selection and 20% for testing.
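Such a split can be reproduced with standard tooling. A minimal sketch, assuming Python with pandas and scikit-learn and the file name used above:

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("blood_donation.csv")

# Carve out 60% of the instances for training, then split the remaining
# 40% in half: 20% for selection and 20% for testing.
training, rest = train_test_split(data, train_size=0.6, random_state=0)
selection, testing = train_test_split(rest, train_size=0.5, random_state=0)

print(len(training), len(selection), len(testing))  # 448, 150, 150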

We can calculate the data distributions and plot a pie chart with the percentage of instances for each class.

As we can see, the number of negative responses is much greater than the number of positive responses.
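The class distribution behind that pie chart can be computed directly from the data. A minimal sketch, assuming pandas and matplotlib and a target column named donation:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("blood_donation.csv")

# Percentage of instances in each class of the target variable.
percentages = data["donation"].value_counts(normalize=True) * 100
print(percentages)

percentages.plot.pie(autopct="%.1f%%")
plt.ylabel("")
plt.title("donation class distribution")
plt.show()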

Another relevant piece of information to keep in mind is the correlation of each input with the target variable. The chart below displays this information.
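If the raw numbers are needed, the Pearson correlation of each input with the binary target (equivalent to the point-biserial correlation for a 0/1 target) can be computed directly; a sketch, again assuming pandas:

import pandas as pd

data = pd.read_csv("blood_donation.csv")

# Correlation of every input column with the target column.
correlations = data.corr()["donation"].drop("donation")
print(correlations.sort_values(key=abs, ascending=False))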

3. Neural network

The second step is to choose a neural network to represent the classification function. For classification problems, it is composed of:

  - A scaling layer.
  - Perceptron layers.
  - A probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method is set.

We set 2 perceptron layers: a hidden layer with 3 neurons as a first guess and an output layer with 1 neuron, both with the logistic activation function.

Finally, we set the continuous probabilistic method for the probabilistic layer.

The next figure is a diagram for the neural network used in this example.
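For comparison, an architecture of this shape can be written in a few lines with scikit-learn; this is an assumption for illustration, not necessarily the tool behind this example:

from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Mean and standard deviation scaling, one hidden layer with 3 logistic
# neurons, and a logistic output neuron for the binary target.
network = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(3,), activation="logistic", random_state=0),
)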

4. Training strategy

The fourth step is to configure the training strategy, which is composed of two terms:

  - A loss index.
  - An optimization algorithm.

The loss index chosen is the weighted squared error with L2 regularization.
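The weighted squared error penalizes errors on the minority (positive) class more heavily, which compensates for the class imbalance seen above. A minimal sketch of the idea; the exact weighting used by the training tool may differ:

import numpy as np

def weighted_squared_error(y_true, y_pred, positives_weight, l2, parameters):
    # Errors on positive instances carry a larger weight; a common
    # choice for positives_weight is the ratio of negatives to positives.
    instance_weights = np.where(y_true == 1, positives_weight, 1.0)
    error = np.mean(instance_weights * (y_true - y_pred) ** 2)
    # L2 regularization penalizes large network parameters.
    return error + l2 * np.sum(parameters ** 2)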

The chosen optimization algorithm is the quasi-Newton method. We leave the default training parameters, stopping criteria and training history settings.
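Quasi-Newton methods build an approximation to the Hessian from successive gradient evaluations instead of computing it exactly. The BFGS implementation in SciPy illustrates the interface on a toy loss:

import numpy as np
from scipy.optimize import minimize

# Toy loss over a parameter vector; in practice this would be the
# weighted squared error of the network as a function of its parameters.
def loss(parameters):
    return np.sum((parameters - 1.0) ** 2)

result = minimize(loss, x0=np.zeros(5), method="BFGS")
print(result.x)  # converges to the minimizer [1, 1, 1, 1, 1]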

The following chart shows how the training and selection error decrease with the epochs during the training process. The final values are training error = 0.695 WSE and selection error = 0.907 WSE, respectively.

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error less than 0.907 WSE, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.

The final selection error achieved is 0.902, with an optimal number of 2 neurons.

The graph above represents the architecture of the final neural network.
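The loop behind the incremental order method is straightforward to sketch; train_and_evaluate is a hypothetical helper that trains a network with the given number of hidden neurons and returns its selection error:

# Hypothetical sketch of incremental order selection.
best_neurons, best_error = None, float("inf")
for neurons in range(1, 11):
    selection_error = train_and_evaluate(hidden_neurons=neurons)
    if selection_error < best_error:
        best_neurons, best_error = neurons, selection_error
print(best_neurons, best_error)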

6. Testing analysis

The next step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data it has never seen before: the testing instances.

A standard testing method is to plot a ROC curve, which is a graphical illustration of how well the classifier discriminates between the two classes. The output is shown in the next figure.

A random classifier has an area under the curve (AUC) of 0.5, while a perfect classifier has an AUC of 1. In practice, this measure takes a value between 0.5 and 1; the closer to 1, the better the classifier. In this example, AUC = 0.804, which indicates good performance.
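The curve and its area can be computed, for instance, with scikit-learn; the labels and probabilities below are placeholders for the testing instances and the corresponding network outputs:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholders: replace with the true testing labels and the
# probabilities the neural network assigns to those instances.
y_test = np.array([0, 0, 1, 1, 0, 1])
probabilities = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

fpr, tpr, _ = roc_curve(y_test, probabilities)
print("AUC =", roc_auc_score(y_test, probabilities))

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")  # random classifier, AUC = 0.5
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.show()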

The binary classification tests provide further useful information about the performance of the model.

The classification accuracy takes a value of 0.861, which means that the prediction is good for most cases.

The confusion matrix contains the true positives, false positives, false negatives and true negatives of the predictions:

                    Predicted positive    Predicted negative
  Real positive     32                    6
  Real negative     4                     30

The number of correctly classified instances is 62, and the number of misclassified instances is 10.
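These counts determine the accuracy reported above directly:

# Entries of the confusion matrix above.
true_positives, false_negatives = 32, 6
false_positives, true_negatives = 4, 30

correct = true_positives + true_negatives            # 62
total = correct + false_positives + false_negatives  # 72
print(correct / total)  # 0.861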

The cumulative gain analysis is a visual aid that shows the advantage of using a predictive model as opposed to random selection. It consists of three lines:

  - The baseline, which represents the results that would be obtained without using a model.
  - The positive cumulative gain, which shows on the y-axis the percentage of positive instances found against the percentage of the population on the x-axis.
  - The negative cumulative gain, which similarly shows the percentage of negative instances found against the percentage of the population.
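Both gain lines can be computed by sorting the instances by predicted probability and accumulating the instances found; a sketch, with the same placeholder arrays as in the ROC example:

import numpy as np

# Placeholder testing labels and network outputs, as in the ROC sketch.
y_test = np.array([0, 0, 1, 1, 0, 1])
probabilities = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# Sort by decreasing predicted probability; the positive cumulative gain
# is the fraction of all positives found in each fraction of the population.
order = np.argsort(-probabilities)
population = np.arange(1, len(y_test) + 1) / len(y_test)
positive_gain = np.cumsum(y_test[order]) / y_test.sum()
# The baseline without a model is simply the population fraction itself.
print(population, positive_gain)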

7. Model deployment

Once the generalization performance of the neural network has been tested, it can be saved for future use in the so-called model deployment mode.

We can predict whether a person is going to donate blood by calculating the neural network outputs. For that, we need to set the input variables.

The mathematical expression represented by the neural network is written below. It takes the inputs recency, frequency and time (the monetary variable does not appear in the final expression) to produce the output prediction about donation. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer.

// Scaling layer: standardize each input with its mean and standard deviation.
scaled_recency = (recency - 9.50668)/8.0954;
scaled_frequency = (frequency - 5.51471)/5.83931;
scaled_time = (time - 34.2821)/24.3767;

// Hidden perceptron layer: two logistic neurons.
y_1_1 = logistic(-3.2852 + (scaled_recency*-3.22375) + (scaled_frequency*3.67502) + (scaled_time*-2.45661));
y_1_2 = logistic(-4.08721 + (scaled_recency*-2.96105) + (scaled_frequency*2.76006) + (scaled_time*-3.40265));

// Output perceptron layer and probabilistic layer.
non_probabilistic_donation = logistic(-1.089 + (y_1_1*5.14874) + (y_1_2*-2.1466));
donation = probability(non_probabilistic_donation);

logistic(x) {
    return 1/(1 + exp(-x));
}

// Clamp the output to the [0, 1] interval so that it can be
// interpreted as a probability.
probability(x) {
    if (x < 0)
        return 0;
    else if (x > 1)
        return 1;
    else
        return x;
}

The above expression can be exported to and embedded in any software or programming language.
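For instance, a direct Python translation of the expression, with the coefficients copied verbatim from above:

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def donation_probability(recency, frequency, time):
    # Scaling layer: mean and standard deviation of each input.
    scaled_recency = (recency - 9.50668) / 8.0954
    scaled_frequency = (frequency - 5.51471) / 5.83931
    scaled_time = (time - 34.2821) / 24.3767
    # Hidden perceptron layer: two logistic neurons.
    y_1_1 = logistic(-3.2852 + scaled_recency * -3.22375
                     + scaled_frequency * 3.67502 + scaled_time * -2.45661)
    y_1_2 = logistic(-4.08721 + scaled_recency * -2.96105
                     + scaled_frequency * 2.76006 + scaled_time * -3.40265)
    # Output perceptron layer and probabilistic clamp to [0, 1].
    output = logistic(-1.089 + y_1_1 * 5.14874 + y_1_2 * -2.1466)
    return min(max(output, 0.0), 1.0)

print(donation_probability(recency=2, frequency=10, time=50))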
