In this tutorial a classification application in marketing is solved by means of a neural network.
Telemarketing is an interactive direct-marketing technique, carried out over the telephone, which is widely used by banks to sell long-term deposits. Although direct marketing can be extremely powerful at generating sales, the sheer number of marketing campaigns has reduced its effect on the general public. The aim of this study is to predict whether or not a client is going to subscribe to a long-term deposit.
The bank telemarketing database used here relates to the direct marketing campaigns of a Portuguese banking institution. The data for this problem has been taken from the UCI Machine Learning Repository.
The data set, obtained from the data file bankmarketing.dat, contains the information used to create the model. It consists of 4522 rows and 19 columns. The columns represent the variables; the first row contains the variable names, and the remaining rows represent the instances. The values within each row are separated by commas. The following listing is a preview of the data file.
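Outside of Neural Designer, a comma-separated file with this layout can be read with Python's standard csv module. The following is a minimal sketch using a tiny synthetic excerpt; the numeric codings and the target column name "conversion" are illustrative assumptions, not taken from the actual file.

```python
import csv
from io import StringIO

# Synthetic two-instance excerpt with the same 19-column layout described above
# (values and the "conversion" target name are illustrative assumptions).
sample = StringIO(
    "age,job,married,single,divorced,education,default,balance,housing,loan,"
    "contact,day,month,duration,campaign,pdays,previous,poutcome,conversion\n"
    "30,1,1,0,0,2,0,1787,0,0,1,19,10,79,1,-1,0,0,0\n"
    "33,2,1,0,0,1,0,4789,1,1,1,11,5,220,1,339,4,1,0\n"
)

rows = list(csv.reader(sample))
header, instances = rows[0], rows[1:]   # first row holds the variable names
print(len(header))      # 19 variables
print(len(instances))   # 2 instances in this excerpt
```

Reading the real bankmarketing.dat works the same way, with `open("bankmarketing.dat")` in place of the StringIO excerpt.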
The next figure shows the data set tab in Neural Designer. It contains four sections:
The variables are:
The "Calculate target class distribution" task plots a pie chart with the percentage of instances for each class. In this data set, the number of negative responses is much greater than the number of positive responses.
To obtain a better predictive model, the targets distribution needs to be more uniform. The "Balance targets distribution" task balances the targets distribution by setting as unused those instances whose variables belong to the most populated bins. After performing this task, the distribution is more uniform and, consequently, the model is of better quality. Here, 3479 instances are set as unused, 634 instances are used for training, 196 for generalization and 212 for testing.
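The effect of this task can be illustrated by undersampling the majority class so that both classes end up with the same number of used instances. This is a simplified numpy sketch of the idea, not Neural Designer's actual implementation, and the 900/100 class counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative imbalanced targets: far more negative than positive responses.
targets = np.array([0] * 900 + [1] * 100)

# Keep all positives and a random subset of negatives of the same size;
# the surplus negatives are marked as unused.
pos = np.flatnonzero(targets == 1)
neg = np.flatnonzero(targets == 0)
keep_neg = rng.choice(neg, size=len(pos), replace=False)

used = np.sort(np.concatenate([pos, keep_neg]))
unused = np.setdiff1d(np.arange(len(targets)), used)

print(len(used), len(unused))   # 200 used, 800 unused
```

After balancing, the used instances contain 50% positive responses, so the model no longer learns to simply predict the majority class.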
The second step is to configure the model. For classification problems, it is composed of:
The following figure shows the neural network page in Neural Designer.
The number of inputs, in this case, is 18 and the number of outputs is 1. The number of hidden perceptrons, or complexity, is 3, so this neural network can be denoted as 18:3:1.
The next figure is a graphical representation of the neural network used for this problem.
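A feed-forward pass through an 18:3:1 network can be sketched in a few lines of numpy. This is an illustrative stand-in, with random weights and activation choices (tanh hidden layer, logistic output) assumed rather than taken from Neural Designer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes for an 18:3:1 network: 18 inputs, 3 hidden perceptrons, 1 output.
W1, b1 = rng.normal(size=(18, 3)), np.zeros(3)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)    # hidden layer -> output layer

def forward(x):
    """One feed-forward pass: tanh hidden layer, logistic output
    (activation choices are assumptions for this sketch)."""
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

x = rng.normal(size=(5, 18))   # five scaled instances
p = forward(x)
print(p.shape)                 # (5, 1): one subscription probability per instance
```

The logistic output plays the role of the probabilistic layer, squashing the network output into a probability between 0 and 1.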
The third step is to configure the loss index, which is composed of two terms:
The error term is the weighted squared error, which weights the squared errors of negative and positive instances separately. A weighted squared error of 1 means the neural network is predicting the data 'in the mean', while a value of 0 means a perfect prediction of the data.
In this case, the neural parameters norm weight is 0.01. This regularization term keeps the model stable, avoiding oscillations.
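The two terms of the loss index can be sketched as follows. This is a simplified illustration of the idea, with unit class weights and example values assumed, not Neural Designer's exact formulation:

```python
import numpy as np

def weighted_squared_error(outputs, targets, positives_weight=1.0, negatives_weight=1.0):
    """Squared error with separate weights for positive and negative instances
    (unit weights here; the actual weights depend on the class proportions)."""
    weights = np.where(targets == 1, positives_weight, negatives_weight)
    return np.sum(weights * (outputs - targets) ** 2)

def loss_index(outputs, targets, parameters, norm_weight=0.01):
    """Loss index = error term + neural parameters norm term."""
    regularization = norm_weight * np.sum(parameters ** 2)
    return weighted_squared_error(outputs, targets) + regularization

outputs = np.array([0.8, 0.2])       # illustrative network outputs
targets = np.array([1, 0])           # illustrative targets
parameters = np.array([1.0, 2.0])    # illustrative neural parameters
print(loss_index(outputs, targets, parameters))   # 0.08 + 0.01*5 = 0.13
```

The norm term penalizes large parameter values, which is what keeps the model's outputs smooth and stable.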
The fourth step is to set the training strategy. This learning process is applied to the neural network in order to obtain the best possible loss. The next figure shows the training strategy page in Neural Designer.
The chosen algorithm here is the quasi-Newton method and we will leave the default training parameters, stopping criteria and training history settings.
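Quasi-Newton methods approximate the Hessian from gradient history instead of computing second derivatives directly. As a toy illustration (unrelated to Neural Designer's internals), scipy's BFGS implementation, a standard quasi-Newton method, can minimize a simple quadratic loss:

```python
import numpy as np
from scipy.optimize import minimize

# Toy quadratic loss with minimum at [1, -2]; BFGS builds up a Hessian
# approximation from successive gradients, as quasi-Newton methods do.
def loss(params):
    return np.sum((params - np.array([1.0, -2.0])) ** 2)

result = minimize(loss, x0=np.zeros(2), method="BFGS")
print(result.x)   # converges near [1, -2]
```

In practice the loss being minimized is the loss index above, and the parameters are the neural network's weights and biases.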
The following chart shows how the loss decreases with the iterations during the training process. The initial value is 0.70539, and the final value after 205 iterations is 0.33569.
The next table shows the training results obtained by the quasi-Newton method. They include some final states of the neural network, the loss index and the training algorithm. The training time was 2 seconds.
The last step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data it has never seen before: the testing instances.
The task "Calculate binary classification tests" provides useful information for testing the performance of a classification problem with two classes. The next figure shows the output of this task.
The classification accuracy takes a high value, 78.8%, which means that the prediction is good for a large number of cases.
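The binary classification tests all derive from the confusion matrix. The sketch below uses hypothetical counts, chosen only to sum to the 212 testing instances; they are not the tutorial's actual results:

```python
# Hypothetical confusion-matrix counts for 212 testing instances.
tp, fp, tn, fn = 84, 23, 83, 22

accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of correct predictions
error_rate = 1 - accuracy
sensitivity = tp / (tp + fn)                 # true positive rate
specificity = tn / (tn + fp)                 # true negative rate

print(f"accuracy={accuracy:.3f}")            # accuracy=0.788 for these counts
```

A high accuracy alone can be misleading on imbalanced data, which is why sensitivity and specificity are reported alongside it.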
Other commonly used tasks for measuring the performance are "Calculate ROC curve" and "Calculate cumulative gain". The first is a graphical aid for studying the discrimination capacity of the classifier. One parameter that can be obtained from this chart is the area under the curve (AUC): the closer the AUC is to 1, the better the classifier. The next figure shows this measure for this example.
In this case, the AUC takes a high value: 0.801.
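The AUC has an equivalent rank-based interpretation: it is the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one. A small numpy sketch with illustrative scores:

```python
import numpy as np

def auc(scores, targets):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    random positive instance outscores a random negative instance."""
    pos = scores[targets == 1]
    neg = scores[targets == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.4, 0.35, 0.2])   # illustrative outputs
targets = np.array([1, 1, 0, 1, 0])             # illustrative targets
print(auc(scores, targets))                      # 5/6: 5 of 6 pairs ranked correctly
```

An AUC of 0.5 would mean the classifier ranks positives no better than chance, so this example's 0.801 indicates substantial discrimination capacity.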
The second is another graphical aid, which shows the advantage of using a predictive model over random selection. The next picture depicts the cumulative gain for the current example.
As we can see, this chart shows that by calling only half of the clients, we can achieve more than 80% of the positive responses.
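The cumulative gain is computed by ranking clients by their predicted score and asking what fraction of all positive responses falls in the top portion of that ranking. A sketch with synthetic scores (the data and noise level are illustrative assumptions):

```python
import numpy as np

def cumulative_gain(scores, targets, fraction):
    """Fraction of all positive responses captured by contacting the top
    `fraction` of clients ranked by predicted score."""
    order = np.argsort(scores)[::-1]              # best-scored clients first
    k = int(round(fraction * len(scores)))
    return targets[order][:k].sum() / targets.sum()

rng = np.random.default_rng(1)
targets = (rng.random(1000) < 0.3).astype(int)    # ~30% positive responses
# A useful model scores positives higher than negatives, with some noise:
scores = targets + rng.normal(scale=0.5, size=1000)

gain_at_half = cumulative_gain(scores, targets, 0.5)
print(gain_at_half)
```

A random selection would capture only 50% of the positives in the top half; the gap between the model's curve and that diagonal is the advantage the chart depicts.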
The "Calculate conversion rates" task shows the following chart.
Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called production mode.
We can predict whether a client is going to buy the product by running the "Calculate outputs" task. To do that, we edit the input variables through the corresponding dialog.
Then the prediction is written in the viewer.
The mathematical expression represented by the neural network is written below. It takes the inputs age, job, married, single, divorced, education, default, balance, housing, loan, contact, day, month, duration, campaign, pdays, previous and poutcome to produce the output prediction. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer. This expression can be exported anywhere.
scaled_age=2*(age-19)/(87-19)-1;
scaled_job=2*(job-0)/(2-0)-1;
scaled_married=2*(married-0)/(1-0)-1;
scaled_single=2*(single-0)/(1-0)-1;
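Each of these lines is a min-max scaling of one input to the range [-1, 1], using that variable's minimum and maximum from the training data. The expression can be checked directly in Python:

```python
def scale(x, x_min, x_max):
    """Min-max scaling to [-1, 1], matching the exported expression above."""
    return 2 * (x - x_min) / (x_max - x_min) - 1

# Check against the age variable, whose range in the data is [19, 87]:
print(scale(19, 19, 87))   # -1.0: minimum age maps to -1
print(scale(87, 19, 87))   #  1.0: maximum age maps to +1
print(scale(53, 19, 87))   #  0.0: midpoint maps to 0
```

Scaling all inputs to a common range keeps any single variable, such as balance with its large raw values, from dominating the training.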