Increase the conversion rate of telemarketing campaigns in banks


Telemarketing is an interactive direct marketing technique conducted by phone, and banks use it widely to sell long-term deposits.

Although direct marketing can be extremely powerful at generating sales, the vast number of marketing campaigns has reduced their effect on the general public.

The aim of this study is to predict whether a client will subscribe to a long-term deposit.

The bank telemarketing database used here comes from the direct marketing campaigns of a Portuguese banking institution.

Contents:

  1. Application type
  2. Data set
  3. Neural network
  4. Training strategy
  5. Model selection
  6. Testing analysis
  7. Model deployment

1. Application type

This is a classification project, since the variable to be predicted is binary (buy or not buy).

The goal here is to model the probability of buying, as a function of the customer features.

2. Data set

In general, a data set is described by its data source, its variables and its instances.

The data file bank_marketing.csv contains the information used to create the model. It consists of 1522 rows and 19 columns. Each row represents a different customer, and each column a different feature for each customer.

The variables are:

There are 1522 instances in the data set. 60% of them are used for training, 20% for selection and 20% for testing.
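As a minimal sketch of this 60/20/20 split, we can shuffle the instance indices and cut them into three groups. The helper name `split_instances`, the random assignment and the seed are assumptions made for illustration; the text does not say how the instances are assigned.

```python
import random

def split_instances(n_instances, train_frac=0.6, selection_frac=0.2, seed=0):
    """Return shuffled index lists for training, selection and testing."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    n_train = round(n_instances * train_frac)
    n_selection = round(n_instances * selection_frac)
    training = indices[:n_train]
    selection = indices[n_train:n_train + n_selection]
    testing = indices[n_train + n_selection:]  # the remainder goes to testing
    return training, selection, testing

training, selection, testing = split_instances(1522)
print(len(training), len(selection), len(testing))  # 913 304 305
```

Since 1522 is not divisible by 5, the three groups cannot be exactly 60%, 20% and 20%; here the leftover instance goes to the testing group.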

We can calculate the data distribution to see the percentage of instances for each class.

Here, the number of calls without conversion is much greater than the number of calls with conversion, as expected.
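The data distribution can be computed by counting the instances of each class. The sketch below uses a small, made-up target column (1 for conversion, 0 for no conversion), not the real data, and a hypothetical helper `class_distribution`.

```python
from collections import Counter

def class_distribution(targets):
    """Return the percentage of instances that fall in each class."""
    counts = Counter(targets)
    total = len(targets)
    return {label: 100.0 * count / total for label, count in counts.items()}

# Made-up target column for illustration: 1 = conversion, 0 = no conversion.
example_targets = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
distribution = class_distribution(example_targets)
print(distribution)  # {0: 80.0, 1: 20.0}
```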

We can also calculate the input-target correlations between the conversion rate and all the customer features to see which variables might have more influence on the buying process.
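One simple way to sketch such a ranking is the Pearson correlation between each input column and the binary target. This is an assumption: the text does not state which correlation measure is used, and the column names and values below are invented for illustration.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return covariance / math.sqrt(var_x * var_y)

def rank_inputs(columns, target):
    """Sort input names by the absolute value of their correlation."""
    scores = {name: pearson_correlation(values, target)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Invented toy columns; these are not the variables of the real data set.
columns = {
    "call_duration": [30, 45, 200, 310, 60, 280],
    "client_age": [25, 52, 40, 33, 47, 29],
}
target = [0, 0, 1, 1, 0, 1]
ranking = rank_inputs(columns, target)
for name, correlation in ranking:
    print(f"{name}: {correlation:+.3f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship with the target, while a value close to 0 suggests a weak one.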

3. Neural network

The third step is to configure the neural network. For classification problems, it is composed of a scaling layer, perceptron layers and a probabilistic layer.

The next figure is a graphical representation of the neural network used for this problem.

4. Training strategy

The fourth step is to configure the training strategy, which is composed of two concepts: a loss index and an optimization algorithm.

The loss index chosen is the weighted squared error with L2 regularization.
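To make the idea concrete, the following sketch computes a weighted squared error plus an L2 penalty. The exact formula is an assumption: here the positive instances are weighted by the ratio of negatives to positives, the error is divided by the number of instances, and the regularization weight of 0.01 is arbitrary. The software's own normalization may differ.

```python
def weighted_squared_error(outputs, targets, positives_weight, negatives_weight=1.0):
    """Mean of the squared errors, with a different weight per class."""
    total = 0.0
    for output, target in zip(outputs, targets):
        weight = positives_weight if target == 1 else negatives_weight
        total += weight * (output - target) ** 2
    return total / len(targets)

def l2_penalty(parameters, regularization_weight):
    """L2 regularization term: weight times the sum of squared parameters."""
    return regularization_weight * sum(p * p for p in parameters)

def loss_index(outputs, targets, parameters, regularization_weight=0.01):
    """Weighted squared error plus L2 regularization (assumed form)."""
    negatives = sum(1 for t in targets if t == 0)
    positives = sum(1 for t in targets if t == 1)
    # Assumed balancing rule; requires at least one positive instance.
    positives_weight = negatives / positives
    error = weighted_squared_error(outputs, targets, positives_weight)
    return error + l2_penalty(parameters, regularization_weight)

# Toy values for illustration only.
loss = loss_index(outputs=[0.1, 0.2, 0.0, 0.5],
                  targets=[0, 0, 0, 1],
                  parameters=[1.0, -2.0])
print(round(loss, 4))  # 0.25
```

Weighting the positives more heavily compensates for the imbalance between calls with and without conversion, and the L2 term discourages large parameter values.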

The optimization algorithm is applied to the neural network to get the minimum loss.

The chosen algorithm here is the quasi-Newton method. We leave the default training parameters, stopping criteria and training history settings.
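Quasi-Newton methods build an approximation of the inverse Hessian from successive gradients instead of computing second derivatives. The sketch below implements BFGS, one common quasi-Newton variant, with a backtracking line search. The text does not say which variant or line search the software uses, so both are assumptions, and the toy quadratic function stands in for the real loss index.

```python
def bfgs_minimize(f, grad, x0, iterations=50, tolerance=1e-8):
    """Minimize f with the BFGS quasi-Newton update of the inverse Hessian."""
    n = len(x0)
    x = list(x0)
    # The inverse Hessian approximation starts as the identity matrix.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(iterations):
        if sum(gi * gi for gi in g) ** 0.5 < tolerance:
            break
        # Search direction d = -H g.
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease rule.
        step, fx = 1.0, f(x)
        slope = sum(gi * di for gi, di in zip(g, d))
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((1.0 + rho * yHy) * rho * s[i] * s[j]
                                - rho * (Hy[i] * s[j] + s[i] * Hy[j]))
        x, g = x_new, g_new
    return x

# Toy quadratic with its minimum at (1, -3); it stands in for the loss index.
def toy_loss(p):
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 3.0) ** 2

def toy_gradient(p):
    return [2.0 * (p[0] - 1.0), 4.0 * (p[1] + 3.0)]

minimum = bfgs_minimize(toy_loss, toy_gradient, [0.0, 0.0])
print([round(value, 4) for value in minimum])
```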

The following chart shows how the training and selection errors decrease with the epochs during the training process. The final values are training error = 0.821 WSE and selection error = 0.889 WSE, respectively.

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error less than 0.889 WSE, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.
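The incremental order loop can be sketched as follows. The function `train_and_evaluate` is a hypothetical callback that would train a network with the given number of neurons and return its training and selection errors. The stand-in used here returns synthetic errors, not results from this study.

```python
def incremental_order(train_and_evaluate, minimum_neurons=1, maximum_neurons=10):
    """Try increasing numbers of neurons and keep the best selection error."""
    best_neurons = None
    best_selection_error = float("inf")
    history = []
    for neurons in range(minimum_neurons, maximum_neurons + 1):
        training_error, selection_error = train_and_evaluate(neurons)
        history.append((neurons, training_error, selection_error))
        if selection_error < best_selection_error:
            best_neurons = neurons
            best_selection_error = selection_error
    return best_neurons, best_selection_error, history

def synthetic_train_and_evaluate(neurons):
    """Stand-in for real training: the errors are synthetic, for illustration."""
    training_error = 1.0 / neurons                     # keeps decreasing
    selection_error = 1.0 / neurons + 0.04 * neurons   # U-shaped curve
    return training_error, selection_error

best_neurons, best_error, history = incremental_order(synthetic_train_and_evaluate)
print(best_neurons, round(best_error, 3))  # 5 0.4
```

In this synthetic case the training error keeps falling as neurons are added, while the selection error reaches a minimum and then grows again, which is the pattern that order selection looks for.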

6. Testing analysis

The objective of testing analysis is to evaluate the generalization performance of the neural network. The standard way to do this is to compare the outputs of the neural network against data that it has never seen before, the testing instances.

A commonly used method to test a neural network is the ROC curve.

One of the parameters that can be obtained from this chart is the area under the curve (AUC). The closer the area under the curve is to 1, the better the classifier. In this case, the area under the curve takes a high value: AUC = 0.80.
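The AUC can also be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way, by comparing every positive-negative pair. The scores and targets are invented, so the result is unrelated to the 0.80 reported above.

```python
def roc_auc(scores, targets):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for s, t in zip(scores, targets) if t == 1]
    negatives = [s for s, t in zip(scores, targets) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                wins += 1.0
            elif positive_score == negative_score:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Invented network outputs and true classes, for illustration only.
scores = [0.9, 0.4, 0.35, 0.6, 0.2, 0.5]
targets = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, targets)
print(round(auc, 3))  # 0.889
```

A value of 0.5 corresponds to random guessing and a value of 1 to a perfect ranking.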

The binary classification tests provide us with useful information for testing the performance of a binary classifier:

The classification accuracy takes a high value, which means that the prediction is good for a large number of cases.
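These tests are derived from the confusion matrix. The sketch below computes four standard ones: accuracy, error rate, sensitivity and specificity. This choice of tests is an assumption, and the predictions and targets are invented, so the values are not those of this study.

```python
def binary_classification_tests(predictions, targets):
    """Compute standard tests from the four confusion matrix counts."""
    pairs = list(zip(predictions, targets))
    true_positives = sum(1 for p, t in pairs if p == 1 and t == 1)
    false_positives = sum(1 for p, t in pairs if p == 1 and t == 0)
    false_negatives = sum(1 for p, t in pairs if p == 0 and t == 1)
    true_negatives = sum(1 for p, t in pairs if p == 0 and t == 0)
    total = len(pairs)
    positives = true_positives + false_negatives
    negatives = true_negatives + false_positives
    return {
        "accuracy": (true_positives + true_negatives) / total,
        "error_rate": (false_positives + false_negatives) / total,
        # Share of actual buyers that the model detects.
        "sensitivity": true_positives / positives if positives else 0.0,
        # Share of actual non-buyers that the model detects.
        "specificity": true_negatives / negatives if negatives else 0.0,
    }

# Invented predictions and true classes, for illustration only.
predictions = [1, 0, 0, 1, 0, 0, 1, 0]
targets = [1, 0, 0, 0, 0, 1, 1, 0]
tests = binary_classification_tests(predictions, targets)
print(tests["accuracy"], tests["specificity"])  # 0.75 0.8
```

With imbalanced classes, accuracy alone can be misleading, so sensitivity and specificity are worth reading alongside it.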

The cumulative gain chart is another graphical aid that shows the advantage of using a predictive model over random selection. The next picture depicts the cumulative gain for the current example.

As we can see, this chart shows that by calling only half of the clients, we can achieve more than 80% of the positive responses.
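A point on the cumulative gain curve can be computed by sorting the clients by predicted probability and counting the positive responses among the top fraction. The sketch below uses invented scores and responses, and the helper name `cumulative_gain` is hypothetical.

```python
def cumulative_gain(scores, targets, fraction):
    """Share of all positives found in the top `fraction` of ranked clients."""
    total_positives = sum(targets)
    if total_positives == 0:
        raise ValueError("cumulative gain needs at least one positive")
    # Rank clients from the highest to the lowest predicted probability.
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n_called = round(len(ranked) * fraction)
    captured = sum(target for _, target in ranked[:n_called])
    return captured / total_positives

# Invented scores and responses for ten clients, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.15, 0.1]
targets = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
gain_at_half = cumulative_gain(scores, targets, 0.5)
print(gain_at_half)  # 0.8
```

Calling clients at random would reach about 50% of the positive responses with half of the calls, so the gap between the two numbers is the benefit of the model.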

The conversion rates for this problem are depicted in the following chart.

7. Model deployment

Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called model deployment mode.

We can predict which clients are most likely to buy the product by calculating the neural network outputs. To do so, we need to know the input variables for each new client.

The mathematical expression represented by the neural network is written below. It takes all the features of a customer to produce the output prediction. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer. This expression can be exported anywhere.
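The feed-forward propagation can be sketched in a few lines. Everything specific here is an assumption made for illustration: min-max scaling to [-1, 1], one hidden perceptron layer with hyperbolic tangent activation, a logistic output, a 0.5 decision threshold in the probabilistic layer, and a made-up model with two inputs. None of the numbers come from the trained network.

```python
import math

def scaling_layer(inputs, minimums, maximums):
    """Map each input to [-1, 1] with min-max scaling (assumed method)."""
    return [2.0 * (x - low) / (high - low) - 1.0
            for x, low, high in zip(inputs, minimums, maximums)]

def perceptron_layer(inputs, weights, biases, activation):
    """One dense layer: activation(bias + weights . inputs) for each neuron."""
    return [activation(bias + sum(w * x for w, x in zip(row, inputs)))
            for row, bias in zip(weights, biases)]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def probabilistic_layer(probability, threshold=0.5):
    """Turn the probability into a class decision (assumed threshold)."""
    return 1 if probability >= threshold else 0

def predict(inputs, model):
    """Propagate one client's features through the three kinds of layers."""
    scaled = scaling_layer(inputs, model["minimums"], model["maximums"])
    hidden = perceptron_layer(scaled, model["hidden_weights"],
                              model["hidden_biases"], math.tanh)
    probability = perceptron_layer(hidden, model["output_weights"],
                                   model["output_biases"], logistic)[0]
    return probability, probabilistic_layer(probability)

# Made-up model with 2 inputs and 2 hidden neurons (not the trained network).
model = {
    "minimums": [18.0, 0.0],
    "maximums": [90.0, 3000.0],
    "hidden_weights": [[0.5, 1.5], [-1.0, 0.8]],
    "hidden_biases": [0.1, -0.2],
    "output_weights": [[1.2, -0.7]],
    "output_biases": [0.0],
}
probability, decision = predict([35.0, 600.0], model)
print(round(probability, 3), decision)
```

Because the expression only needs basic arithmetic and two elementary functions, it can be reimplemented in almost any language or tool.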
