Predict credit card fraud using machine learning

Credit card fraud occurs when someone uses your credit card or credit account to make a purchase you didn't authorize. This can happen in different ways. For example, if your credit card is lost or stolen, it can be used to make purchases or other payments, either in person or online.

In this example we will classify payments from a credit card as fraudulent or not fraudulent depending on different variables.

Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.
  8. Tutorial video.

This example is solved with Neural Designer. In order to follow this example step by step, you can use the free trial.

1. Application type

This is a classification project, since the variable to predict is binary (fraudulent or not fraudulent).

The goal here is to create a model to obtain the likelihood of a transaction being fraudulent.

2. Data set

The data set contains the information used to create our model. We need to configure three things: the data source, the variables, and the instances.

The data file used for this example is creditcard-fraud.csv, which contains 11 features about 3075 payments.

The data set includes the following variables:

The instances are divided randomly into training, selection, and testing subsets, containing 60%, 20%, and 20% of the instances, respectively.
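The random 60/20/20 split can be sketched in a few lines of Python. This is a minimal illustration, not the procedure Neural Designer runs internally; the function name `split_indices` and the fixed seed are assumptions made for the example.

```python
import random

def split_indices(n_instances, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle instance indices and cut them into training, selection and testing subsets."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed, for reproducibility
    n_training = round(fractions[0] * n_instances)
    n_selection = round(fractions[1] * n_instances)
    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]  # whatever remains goes to testing
    return training, selection, testing

# 3075 is the number of payments in creditcard-fraud.csv.
training, selection, testing = split_indices(3075)
print(len(training), len(selection), len(testing))
```

With 3075 payments this yields subsets of 1845, 615, and 615 instances.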

Our target variable is is_fradulent. We can calculate the data distributions and plot a pie chart with the percentage of instances for each class.

As we can see, the target variable is very unbalanced: around 85% of the payments are not fraudulent, while only about 15% are fraudulent. In other words, roughly 1 out of every 7 payments is fraudulent.

The inputs-targets correlations might indicate which factors have the greatest influence on a transaction being fraudulent.

In this example, all of the variables have a positive correlation with the target except for is_declined. Moreover, the variable high_risk_country has the highest correlation with the target variable.
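To make the idea of an input-target correlation concrete, the sketch below computes a plain Pearson correlation between one input and the binary target. This is an assumption made for illustration: the tool may use a different correlation measure for binary targets, and the toy values are invented for the example, not taken from creditcard-fraud.csv.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)

# Toy values, invented for illustration only.
toy_high_risk = [0, 0, 1, 1, 0, 1]
toy_is_fraud = [0, 0, 1, 1, 0, 0]

toy_correlation = pearson_correlation(toy_high_risk, toy_is_fraud)
print(round(toy_correlation, 4))
```

A value close to 1 or -1 indicates a strong linear relationship with the target, while a value close to 0 indicates a weak one.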

3. Neural network

The next step is to set the neural network parameters. For classification problems, the network is composed of a scaling layer, a perceptron layer, and a probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method has been set.

We set one perceptron layer with 3 neurons having the logistic activation function. The network has 9 inputs and, since the target variable is binary, only one output.

The neural network for this example can be represented with the following diagram:

4. Training strategy

The fourth step is to set the training strategy, which defines what the neural network will learn. A general training strategy for classification is composed of two terms: a loss index and an optimization algorithm.

The loss index chosen for this problem is the normalized squared error between the outputs from the neural network and the targets in the data set with L1 regularization.
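The loss index described above can be sketched as follows. The sketch assumes the usual definition of the normalized squared error (the sum of squared errors divided by the sum of squared deviations of the targets from their mean) and a hypothetical regularization weight of 0.01; neither the exact formula nor the weight used by the tool is stated in this example.

```python
def normalized_squared_error(outputs, targets):
    """Sum of squared errors divided by the squared deviations of the targets from their mean."""
    mean_target = sum(targets) / len(targets)
    squared_error = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    normalization = sum((t - mean_target) ** 2 for t in targets)
    return squared_error / normalization

def l1_penalty(parameters, weight):
    """L1 regularization term: weight times the sum of absolute parameter values."""
    return weight * sum(abs(p) for p in parameters)

def loss_index(outputs, targets, parameters, regularization_weight=0.01):
    # regularization_weight is a hypothetical value chosen for this sketch.
    return normalized_squared_error(outputs, targets) + l1_penalty(parameters, regularization_weight)

# Toy targets: a model that always predicts the mean target gets NSE = 1.
toy_targets = [0, 0, 0, 1]
mean_prediction = [0.25, 0.25, 0.25, 0.25]
print(normalized_squared_error(mean_prediction, toy_targets))
```

Under this definition, an error of 1 corresponds to always predicting the mean of the targets and an error of 0 to a perfect fit, which gives a scale for reading the training and selection errors.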

The selected optimization algorithm is the Quasi-Newton method.

The following chart shows how the training and selection errors develop with the epochs during the training process. The final values are training error = 0.0522 NSE and selection error = 0.103 NSE.

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error smaller than 0.103 NSE, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with a different number of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration. The following chart shows the training error (blue) and the selection error (orange) as a function of the number of neurons.
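The incremental order loop can be sketched as below. The callback `train_and_evaluate` is hypothetical: it stands for training a network with the given number of neurons and returning its training and selection errors. The toy callback uses invented error curves purely to make the sketch runnable; it does not reproduce the errors reported in this example.

```python
def incremental_order(train_and_evaluate, min_neurons=1, max_neurons=10):
    """Try increasing numbers of neurons and keep the one with the smallest selection error."""
    best_neurons, best_error = None, float("inf")
    history = []
    for neurons in range(min_neurons, max_neurons + 1):
        training_error, selection_error = train_and_evaluate(neurons)
        history.append((neurons, training_error, selection_error))
        if selection_error < best_error:
            best_neurons, best_error = neurons, selection_error
    return best_neurons, best_error, history

def toy_train_and_evaluate(neurons):
    # Invented curves: training error keeps falling, selection error has a minimum.
    training_error = 1.0 / (1 + neurons)
    selection_error = 0.1 + 0.01 * (neurons - 4) ** 2
    return training_error, selection_error

best_neurons, best_error, history = incremental_order(toy_train_and_evaluate)
print(best_neurons, best_error)
```

The selection error, not the training error, drives the choice, because the training error typically keeps decreasing as neurons are added.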

The selection errors achieved are similar for any number of neurons; however, the smallest is 0.1007 NSE, obtained with an optimal number of 4 neurons.

The graph above represents the architecture of the final neural network.

6. Testing analysis

The objective of the testing analysis is to validate the generalization performance of the trained neural network. To validate a classification technique, we need to compare the values provided by this technique to the observed values. We can use the ROC curve as it is the standard testing method for binary classification projects.

The AUC value for this example is 0.9982.
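For readers who want to see what the AUC measures, the sketch below computes it as the fraction of (fraudulent, non-fraudulent) pairs in which the fraudulent payment receives the higher score, counting ties as half. This pairwise formulation is one standard way to compute the area under the ROC curve; the toy scores and labels are invented for illustration.

```python
def area_under_roc_curve(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Toy values, invented for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = area_under_roc_curve(toy_scores, toy_labels)
print(toy_auc)
```

An AUC of 0.5 corresponds to random ranking and 1 to a perfect ranking, so a value near 1 means the model almost always scores fraudulent payments above legitimate ones.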

The following table contains the elements of the confusion matrix. This matrix contains the true positives, false positives, false negatives and true negatives for the variable is_fraudulent.

                 Predicted positive   Predicted negative
Real positive    79 (12%)             5 (0%)
Real negative    8 (1%)               523 (85%)

The total number of testing samples is 615. The number of correctly classified samples is 602 (97%) and the number of misclassified samples is 13 (2%).

The binary classification tests are parameters for measuring the performance of a classification problem with two classes:

With this neural network, we have been able to correctly classify 94% of the fraudulent payments; that is, we can identify around 19 out of 20 fraudulent payments.
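The rates quoted above follow directly from the four counts in the confusion matrix. The short sketch below recomputes them; the variable names are chosen for this example.

```python
# Counts taken from the confusion matrix of the testing subset.
true_positives, false_negatives = 79, 5
false_positives, true_negatives = 8, 523

total = true_positives + false_negatives + false_positives + true_negatives
accuracy = (true_positives + true_negatives) / total
error_rate = (false_positives + false_negatives) / total
sensitivity = true_positives / (true_positives + false_negatives)  # fraudulent payments detected
specificity = true_negatives / (true_negatives + false_positives)  # legitimate payments accepted

print(f"total       = {total}")
print(f"accuracy    = {accuracy:.3f}")
print(f"error rate  = {error_rate:.3f}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
```

The sensitivity, 79/84, is the 94% of fraudulent payments that the model identifies.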

We can also observe these results in the positive rates chart:

The initial positive rate was around 15% and now, after applying our model, it is 92%. This means that with this model, we would be able to recognize 6 times more fraudulent payments.

We can also perform a cumulative gain analysis, a visual aid that shows the advantage of using a predictive model as opposed to randomness.

It consists of three lines. The baseline represents the results that would be obtained without using a model. The positive cumulative gain shows in the y-axis the percentage of positive instances found against the percentage of the population represented in the x-axis. Similarly, the negative cumulative gain shows the percentage of the negative instances found against the population percentage.

In this case, by using the model, we see that by analyzing the 20% of payments with the highest probability of being fraudulent, we would reach 100% of the fraudulent payments.
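A single point on the positive cumulative gain curve can be computed as sketched below: rank the payments by predicted probability, take the top fraction, and count how many of the fraudulent payments fall inside it. The function name and the toy scores are assumptions made for illustration, not values from this data set.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all positive instances found in the top `fraction` of instances ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_selected = round(fraction * len(ranked))
    found = sum(label for _, label in ranked[:n_selected])
    return found / sum(labels)

# Toy values, invented for illustration: 10 payments, 2 of them fraudulent.
toy_scores = [0.95, 0.90, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05, 0.01]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

gain_at_20 = cumulative_gain(toy_scores, toy_labels, 0.2)
print(gain_at_20)
```

Without a model, inspecting 20% of the payments at random would be expected to find only about 20% of the fraudulent ones, which is what the baseline represents.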

7. Model deployment

The model obtained after all these steps is not the best that could be achieved. Nevertheless, it is still better than guessing randomly.

The next listing shows the mathematical expression of the predictive model.

scaled_avg_amount_days = avg_amount_days*(1+1)/(2000-(4.01153))-4.01153*(1+1)/(2000-4.01153)-1;
scaled_transaction_amount = transaction_amount*(1+1)/(108000-(0))-0*(1+1)/(108000-0)-1;
scaled_is_declined = is_declined*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_number_declines_days = number_declines_days*(1+1)/(20-(0))-0*(1+1)/(20-0)-1;
scaled_foreign_transaction = foreign_transaction*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_high_risk_countries = high_risk_countries*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
scaled_daily_chbk_avg_amt = daily_chbk_avg_amt*(1+1)/(998-(0))-0*(1+1)/(998-0)-1;
scaled_6m_avg_chbk_amt = 6m_avg_chbk_amt*(1+1)/(998-(0))-0*(1+1)/(998-0)-1;
scaled_6m_chbk_freq = 6m_chbk_freq*(1+1)/(9-(0))-0*(1+1)/(9-0)-1;

perceptron_layer_0_output_0 = sigma[ -0.730652 + (scaled_avg_amount_days*0.347961)+ (scaled_transaction_amount*-0.866882)+ (scaled_is_declined*0.698547)+ (scaled_number_declines_days*-0.679199)+ (scaled_foreign_transaction*-0.744385)+ (scaled_high_risk_countries*-0.223877)+ (scaled_daily_chbk_avg_amt*-0.948853)+ (scaled_6m_avg_chbk_amt*0.281616)+ (scaled_6m_chbk_freq*-0.272766) ];
perceptron_layer_0_output_1 = sigma[ 0.757568 + (scaled_avg_amount_days*0.0680542)+ (scaled_transaction_amount*-0.254028)+ (scaled_is_declined*-0.58905)+ (scaled_number_declines_days*0.920654)+ (scaled_foreign_transaction*0.0759888)+ (scaled_high_risk_countries*0.961853)+ (scaled_daily_chbk_avg_amt*0.0324707)+ (scaled_6m_avg_chbk_amt*0.283447)+ (scaled_6m_chbk_freq*0.200012) ];
perceptron_layer_0_output_2 = sigma[ -0.406372 + (scaled_avg_amount_days*0.268921)+ (scaled_transaction_amount*0.124512)+ (scaled_is_declined*0.815247)+ (scaled_number_declines_days*-0.362366)+ (scaled_foreign_transaction*0.486023)+ (scaled_high_risk_countries*0.997009)+ (scaled_daily_chbk_avg_amt*0.286682)+ (scaled_6m_avg_chbk_amt*0.97644)+ (scaled_6m_chbk_freq*-0.848083) ];
perceptron_layer_0_output_3 = sigma[ 0.529846 + (scaled_avg_amount_days*-0.871521)+ (scaled_transaction_amount*0.977722)+ (scaled_is_declined*-0.771179)+ (scaled_number_declines_days*0.671753)+ (scaled_foreign_transaction*-0.0239868)+ (scaled_high_risk_countries*-0.501465)+ (scaled_daily_chbk_avg_amt*0.620178)+ (scaled_6m_avg_chbk_amt*-0.797546)+ (scaled_6m_chbk_freq*-0.429626) ];

probabilistic_layer_combinations_0 = -0.745667 -0.556274*perceptron_layer_0_output_0 -0.661987*perceptron_layer_0_output_1 +0.70813*perceptron_layer_0_output_2 -0.882507*perceptron_layer_0_output_3;

is_fradulent = 1.0/(1.0 + exp(-probabilistic_layer_combinations_0));

This formula can also be exported to the software tool required by the company.
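As one illustration of such an export, the listing above can be transcribed into Python. This is a hand-written sketch, not code generated by Neural Designer. It assumes that sigma denotes the logistic function, as stated for the perceptron layer, and it stores the inputs in a dictionary because names such as 6m_avg_chbk_amt are not valid Python identifiers. The example payment is hypothetical.

```python
import math

# (minimum, maximum) of each input, copied from the scaling lines of the listing.
INPUT_RANGES = {
    "avg_amount_days": (4.01153, 2000), "transaction_amount": (0, 108000),
    "is_declined": (0, 1), "number_declines_days": (0, 20),
    "foreign_transaction": (0, 1), "high_risk_countries": (0, 1),
    "daily_chbk_avg_amt": (0, 998), "6m_avg_chbk_amt": (0, 998),
    "6m_chbk_freq": (0, 9),
}

# One entry per hidden neuron: bias, then the nine weights in input order.
HIDDEN_NEURONS = [
    (-0.730652, [0.347961, -0.866882, 0.698547, -0.679199, -0.744385,
                 -0.223877, -0.948853, 0.281616, -0.272766]),
    (0.757568, [0.0680542, -0.254028, -0.58905, 0.920654, 0.0759888,
                0.961853, 0.0324707, 0.283447, 0.200012]),
    (-0.406372, [0.268921, 0.124512, 0.815247, -0.362366, 0.486023,
                 0.997009, 0.286682, 0.97644, -0.848083]),
    (0.529846, [-0.871521, 0.977722, -0.771179, 0.671753, -0.0239868,
                -0.501465, 0.620178, -0.797546, -0.429626]),
]
OUTPUT_BIAS = -0.745667
OUTPUT_WEIGHTS = [-0.556274, -0.661987, 0.70813, -0.882507]

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale_inputs(payment):
    """Map each input linearly from [minimum, maximum] to [-1, 1], as in the listing."""
    return [2.0 * (payment[name] - low) / (high - low) - 1.0
            for name, (low, high) in INPUT_RANGES.items()]

def fraud_probability(payment):
    scaled = scale_inputs(payment)
    hidden = [logistic(bias + sum(w * s for w, s in zip(weights, scaled)))
              for bias, weights in HIDDEN_NEURONS]
    combination = OUTPUT_BIAS + sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden))
    return logistic(combination)

# Hypothetical payment, invented for illustration only.
example_payment = {
    "avg_amount_days": 500.0, "transaction_amount": 3000.0, "is_declined": 0,
    "number_declines_days": 2, "foreign_transaction": 1, "high_risk_countries": 1,
    "daily_chbk_avg_amt": 0.0, "6m_avg_chbk_amt": 0.0, "6m_chbk_freq": 0,
}
print(fraud_probability(example_payment))
```

The function returns a value between 0 and 1 that can be compared against a decision threshold chosen by the company.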

8. Tutorial video

You can watch the step by step tutorial video below to help you complete this Machine Learning example for free using the powerful machine learning software, Neural Designer.
