
Bank marketing

By Roberto Lopez, Artelnics.

In this tutorial, a classification application in marketing is solved by means of a neural network.

Telemarketing is an interactive direct marketing technique, conducted over the phone, which is widely used by banks to sell long-term deposits. Although direct marketing can be extremely powerful at generating sales, the vast number of marketing campaigns has reduced its effect on the general public. The aim of this study is to predict whether a client is going to subscribe to a long-term deposit or not.

Telemarketing picture
Telemarketing.

Contents:

  1. Data set
  2. Neural network
  3. Loss index
  4. Training strategy
  5. Testing analysis
  6. Model deployment

1. Data set

The bank telemarketing database used here is related to the direct marketing campaigns of a Portuguese banking institution. The data for this problem has been taken from the UCI Machine Learning Repository.

The data set, obtained from the data file bankmarketing.dat, contains the information used to create the model. It consists of 4522 rows and 19 columns. The columns represent the variables; the first row contains the variable names, and each of the remaining rows represents an instance. The values in the rows are separated by commas. The following listing is a preview of the data file.

Bank marketing dataset picture
Bank marketing data set.
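For illustration, reading such a comma-separated file can be sketched with Python's standard csv module. The two-line sample below is hypothetical (the values are not taken from bankmarketing.dat); only the layout follows the description above, with variable names in the first row.

```python
import csv
import io

# Hypothetical two-row extract with the same comma-separated layout as
# bankmarketing.dat: variable names first, then one instance per row.
sample = """age,job,married,single,divorced,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
58,1,1,0,0,3,0,2143,1,0,1,5,5,261,1,10,0,0,0
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)           # first row holds the variable names
rows = [row for row in reader]  # remaining rows are the instances

print(len(header))  # 19 columns (variables)
print(rows[0][0])   # first instance, first variable (age): 58
```

In practice the file would be opened with `open("bankmarketing.dat")` instead of the in-memory sample.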

The next figure shows the data set tab in Neural Designer. It contains four sections:

  1. Data file.
  2. Variables information.
  3. Instances information.
  4. Missing values information.

Data set page screenshot
Data set page.

The variables are:

  1. age: Age.
  2. job: Type of job.
  3. married: Marital status (1 if married, 0 otherwise).
  4. single: Marital status (1 if single, 0 otherwise).
  5. divorced: Marital status (1 if divorced, 0 otherwise).
  6. education: Type of education (primary, secondary, tertiary).
  7. default: Takes value 1 if the client has credit in default and 0 otherwise.
  8. balance: Account balance.
  9. housing: Takes value 1 if the client has a housing loan and 0 otherwise.
  10. loan: Takes value 1 if the client has a personal loan and 0 otherwise.
  11. contact: Contact communication type (cellular, telephone).
  12. day: Last contact day of the month.
  13. month: Last contact month of the year.
  14. duration: Last contact duration.
  15. campaign: Number of contacts performed during this campaign and for this client.
  16. pdays: Number of days that passed by after the client was last contacted from a previous campaign.
  17. previous: Number of contacts performed before this campaign and for this client.
  18. poutcome: Outcome of the previous marketing campaign.
  19. y: Takes value 1 if the client has subscribed a term deposit and 0 otherwise. This is the target variable.

The "Calculate target class distribution" task plots a pie chart with the percentage of instances for each class. In this data set, the number of negative responses is much greater than the number of positive responses.

Target variable distribution
Target class distribution.
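The percentages behind such a pie chart are simple class counts. The sketch below uses a hypothetical 90/10 split to illustrate the computation; the real labels come from the y column of the data set.

```python
# Sketch of the target class distribution computation behind the pie chart.
# The label list is a hypothetical stand-in for the y column.
labels = [0] * 90 + [1] * 10  # illustrative 90/10 imbalance

negatives = labels.count(0)
positives = labels.count(1)
total = len(labels)

print(f"negative: {100 * negatives / total:.1f}%")  # negative: 90.0%
print(f"positive: {100 * positives / total:.1f}%")  # positive: 10.0%
```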

To obtain a better predictive model, the targets distribution needs to be more uniform. The "Balance targets distribution" task achieves this by setting as unused those instances whose variables belong to the most populated bins. After performing this task, the distribution is more uniform and, as a consequence, the model quality improves. Here, 3479 instances are set as unused, 634 instances are used for training, 196 for generalization and 212 for testing.

Instances distribution
Instances distribution.
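The balancing step can be sketched as a random undersampling of the majority class, where the excess instances are set aside as unused rather than deleted. The labels below are hypothetical.

```python
import random

# Sketch of balancing the targets distribution: instances from the majority
# class are marked "unused" until both classes have the same count.
random.seed(0)
labels = [0] * 90 + [1] * 10  # hypothetical stand-in for the y column

majority = [i for i, y in enumerate(labels) if y == 0]
minority = [i for i, y in enumerate(labels) if y == 1]

excess = len(majority) - len(minority)
unused = set(random.sample(majority, excess))  # set aside, not deleted

used = [i for i in range(len(labels)) if i not in unused]
print(len(unused))                              # 80 instances set as unused
print(sum(labels[i] for i in used), len(used))  # 10 positives among 20 used
```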

2. Neural network

The second step is to configure the neural network. For classification problems, it is composed of:

  • Inputs.
  • Scaling layer.
  • Learning layers.
  • Probabilistic layer.
  • Outputs.

The following figure shows the neural network page in Neural Designer.

Neural network page screenshot
Neural network page.

The number of inputs, in this case, is 18 and the number of outputs is 1. The number of hidden perceptrons, which determines the model complexity, is 3, so this neural network can be denoted as 18:3:1.
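As a quick check of the model size, each perceptron carries one bias plus one weight per input, so the parameter count of an 18:3:1 network can be computed as:

```python
# Parameter count of an 18:3:1 network: every perceptron has one bias
# plus one weight per input connection.
inputs, hidden, outputs = 18, 3, 1

hidden_parameters = hidden * (inputs + 1)   # 3 * 19 = 57
output_parameters = outputs * (hidden + 1)  # 1 * 4  = 4

print(hidden_parameters + output_parameters)  # 61 parameters in total
```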

The next figure is a graphical representation of the neural network used for this problem.

Neural network graph picture
Neural network graph.

3. Loss index

The third step is to configure the loss index, which is composed of two terms:

  1. An error term.
  2. A regularization term.

The following figure shows the loss index tab in Neural Designer.

Loss index page screenshot
Loss index page.

The error term is the weighted squared error, which applies different weights to the squared errors of the negative and the positive instances. If the weighted squared error has a value of unity, the neural network is predicting the data 'in the mean', while a value of zero means a perfect prediction of the data.

In this case, the neural parameters norm weight is 0.01. This regularization term keeps the model stable, avoiding oscillations.
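Putting the two terms together, the loss index can be sketched as follows. The class weights, data and function shape here are illustrative assumptions, not the exact values used by Neural Designer.

```python
# Sketch of a loss index with two terms: a weighted squared error over the
# instances plus a regularization term proportional to the parameters norm.
def loss_index(targets, outputs, parameters,
               positives_weight=1.0, negatives_weight=1.0,
               regularization_weight=0.01):
    error = 0.0
    for t, o in zip(targets, outputs):
        weight = positives_weight if t == 1 else negatives_weight
        error += weight * (o - t) ** 2
    error /= len(targets)

    # Neural parameters norm, scaled by the regularization weight (0.01 here).
    norm = sum(p ** 2 for p in parameters) ** 0.5
    return error + regularization_weight * norm

print(loss_index([1, 0], [1.0, 0.0], [0.0, 0.0]))  # 0.0 for a perfect fit
```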

4. Training strategy

The fourth step is to set the training strategy. This learning process is applied to the neural network in order to get the best loss. The next figure shows the training strategy page in Neural Designer.

Training strategy
Training strategy page.

The chosen algorithm here is the quasi-Newton method and we will leave the default training parameters, stopping criteria and training history settings.

The following chart shows how the loss decreases with the iterations during the training process. The initial value is 0.70539, and the final value after 205 iterations is 0.33569.

Loss index history
Loss index history.

The next table shows the training results by the quasi-Newton method. They include some final states from the neural network, the loss index and the training algorithm. The training time was 2 seconds.

Training results
Training results.

5. Testing analysis

The last step is to evaluate the performance of the trained neural network. The standard way to do this is to compare the outputs of the neural network against data never seen before, the testing instances.

The task "Calculate binary classification tests" provides some useful metrics for testing the performance of a classification model with two classes. The next figure shows the output of this task.

Binary classification tests table
Binary classification tests.

The classification accuracy takes a high value, 78.8%, which means that the prediction is correct for a large number of cases.
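The accuracy and the other binary classification tests are derived from the confusion matrix. The counts below are hypothetical, chosen only so that they add up to the 212 testing instances and reproduce the 78.8% accuracy; the task's own table reports the real values.

```python
# Sketch of the binary classification tests computed from a confusion
# matrix (hypothetical counts over 212 testing instances).
tp, fp, tn, fn = 80, 25, 87, 20

accuracy    = (tp + tn) / (tp + fp + tn + fn)
error_rate  = (fp + fn) / (tp + fp + tn + fn)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

print(f"{accuracy:.3f}")  # 0.788
```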

Other commonly used tasks to measure the performance are "Calculate ROC curve" and "Calculate cumulative gain". The first is a graphical aid to study the discrimination capacity of the classifier. One of the parameters that can be obtained from this chart is the area under the curve (AUC): the closer the AUC is to 1, the better the classifier. The next figure shows this measure for this example.

Area under curve
Area under curve.

In this case, the AUC takes a high value: 0.801.
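The AUC can also be computed directly from the testing scores as the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one, which equals the trapezoidal area under the ROC curve. The labels and scores below are hypothetical.

```python
# Sketch of the area under the ROC curve via the rank interpretation:
# the fraction of positive/negative pairs where the positive scores higher
# (ties count as half a win).
def auc(labels, scores):
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```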

The second is another graphical aid that shows the advantage of using a predictive model over random selection. The next picture depicts the cumulative gain for the current example.

Cumulative gain plot
Cumulative gain.

As we can see, this chart shows that by calling only half of the clients, we can reach more than 80% of the positive responses.
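A point on the cumulative gain curve can be sketched as follows: rank the clients by predicted score and measure the fraction of all positive responses captured in the top half of the list. Labels and scores here are hypothetical.

```python
# Sketch of one point on the cumulative gain curve: positives captured
# when contacting only the top-scored 50% of clients.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]                      # hypothetical
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # hypothetical

ranked = [y for _, y in sorted(zip(scores, labels), reverse=True)]
half = len(ranked) // 2

gain = sum(ranked[:half]) / sum(ranked)
print(gain)  # 0.75: the top 50% of calls captures 75% of the positives
```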

The "Calculate conversion rates" task produces the following chart.

Conversion rates graph
Conversion rates.

6. Model deployment

Once the generalization performance of the neural network has been tested, the neural network can be saved for future use in the so-called production mode.

We can predict whether a client is going to buy the product by running the "Calculate outputs" task. For that, we need to edit the input variables through the corresponding dialog.

Inputs dialog
Inputs dialog.

Then the prediction is written in the viewer.

Prediction dialog
Variable y prediction value.

The mathematical expression represented by the neural network is written below. It takes the inputs age, job, married, single, divorced, education, default, balance, housing, loan, contact, day, month, duration, campaign, pdays, previous and poutcome to produce the output prediction. For classification problems, the information is propagated in a feed-forward fashion through the scaling layer, the perceptron layers and the probabilistic layer. This expression can be exported anywhere.

				scaled_age=2*(age-19)/(87-19)-1;
				scaled_job=2*(job-0)/(2-0)-1;
				scaled_married=2*(married-0)/(1-0)-1;
				scaled_single=2*(single-0)/(1-0)-1;
				scaled_divorced=2*(divorced-0)/(1-0)-1;
				scaled_education=2*(education-1)/(3-1)-1;
				scaled_default=2*(default-0)/(1-0)-1;
				scaled_balance=2*(balance+3313)/(71188+3313)-1;
				scaled_housing=2*(housing-0)/(1-0)-1;
				scaled_loan=2*(loan-0)/(1-0)-1;
				scaled_contact_type=2*(contact_type-0)/(1-0)-1;
				scaled_day=2*(day-1)/(31-1)-1;
				scaled_month=2*(month-1)/(12-1)-1;
				scaled_duration=2*(duration-5)/(2769-5)-1;
				scaled_campaign_contacts=2*(campaign_contacts-1)/(28-1)-1;
				scaled_last_contact=2*(last_contact-1)/(871-1)-1;
				scaled_previous_contacts=2*(previous_contacts-0)/(25-0)-1;
				scaled_previous_conversion=2*(previous_conversion-0)/(1-0)-1;
				y_1_1=Logistic(-4.79838
				+0.303362*scaled_age
				+1.66482*scaled_job
				+3.42973*scaled_married
				-0.82212*scaled_single
				+2.28614*scaled_divorced
				+1.86739*scaled_education
				-0.345059*scaled_default
				+5.6351*scaled_balance
				-2.87765*scaled_housing
				+5.13143*scaled_loan
				+2.3995*scaled_contact_type
				-3.17024*scaled_day
				+2.12164*scaled_month
				-7.73023*scaled_duration
				-3.07687*scaled_campaign_contacts
				-2.95386*scaled_last_contact
				+5.46041*scaled_previous_contacts
				-4.27363*scaled_previous_conversion);
				y_1_2=Logistic(-0.740143
				+1.37225*scaled_age
				+0.250253*scaled_job
				-0.278685*scaled_married
				+0.623644*scaled_single
				+0.628393*scaled_divorced
				+1.40513*scaled_education
				-1.90374*scaled_default
				-5.31453*scaled_balance
				-0.430993*scaled_housing
				-1.03251*scaled_loan
				-0.477979*scaled_contact_type
				-1.32581*scaled_day
				+0.161253*scaled_month
				+10.9283*scaled_duration
				-4.34163*scaled_campaign_contacts
				-0.772935*scaled_last_contact
				-0.529185*scaled_previous_contacts
				+0.970718*scaled_previous_conversion);
				y_1_3=Logistic(9.95505
				+3.77202*scaled_age
				-1.69476*scaled_job
				-6.68099*scaled_married
				-2.18035*scaled_single
				-1.27475*scaled_divorced
				+3.32785*scaled_education
				-1.18525*scaled_default
				+6.24294*scaled_balance
				+1.59697*scaled_housing
				+0.169284*scaled_loan
				-0.442897*scaled_contact_type
				-4.88145*scaled_day
				-1.42147*scaled_month
				-0.140229*scaled_duration
				+5.26704*scaled_campaign_contacts
				-0.974279*scaled_last_contact
				+9.51053*scaled_previous_contacts
				-8.43848*scaled_previous_conversion);
				non_probabilistic_conversion=Logistic(-4.24532
				-12.7807*y_1_1
				+22.5752*y_1_2
				-17.1506*y_1_3);
				(conversion) = Probability(non_probabilistic_conversion);

				Logistic(x){
					return 1/(1+exp(-x))
				}

				Probability(x){
					if x < 0
						return 0
					else if x > 1
						return 1
					else
						return x
				}
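The two helper functions at the end of the expression can be transcribed into Python, for instance, so that the exported model can be evaluated outside Neural Designer. This is a sketch: the names logistic and probability simply mirror the Logistic and Probability helpers above.

```python
import math

# Python transcription of the two helper functions used by the exported
# expression.
def logistic(x):
    # Standard logistic sigmoid: 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))

def probability(x):
    # Clamp the non-probabilistic output to the [0, 1] interval.
    return min(max(x, 0.0), 1.0)

print(logistic(0.0))     # 0.5
print(probability(1.2))  # 1.0
```

With these two helpers defined, the scaled-input and y_1_* assignments above translate line by line into Python expressions.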