Customer segmentation using machine learning

By Roberto Lopez, Artelnics.

A company's ability to make intelligent use of its data can set it apart from its competitors.

Customer targeting is the process of analyzing customer features (such as age, education, interests, and spending habits) to select the customers who are most likely to buy a target product or service.

Companies from many different industries have realized that advanced analytics can select potential clients much better than traditional methods, which allows the design of more effective marketing campaigns.

Advanced analytics

The volume, variety, and velocity of information stored in organizations are increasing significantly. The intelligent analysis of these data can be a differentiating factor for companies that adopt this technique.

However, most companies only perform descriptive analyses of their data to know what has happened in the past.

These organizations can get much more value from their data using more sophisticated techniques. This process is known as advanced analytics, and it is illustrated in the next figure.

As we can see, advanced analytics comprises 4 steps:

  1. Descriptive analytics, which answers the question "What happened?"
  2. Diagnostic analytics, which answers the question "Why did it happen?"
  3. Predictive analytics, which answers the question "What will happen?"
  4. Prescriptive analytics, which answers the question "How can we make it happen?"

As we will see throughout this post, advanced analytics allows us to target customers individually, which results in increased profitability.

Case study

To illustrate the advanced analytics process, we will apply descriptive, diagnostic, predictive, and prescriptive analytics to a banking institution's real data set.

This case study is solved with the data science and machine learning platform Neural Designer. The following figure is a screenshot of this software.

The goal is to select the customers whose profile is similar to that of the customers who purchased the product and include them in a marketing campaign.

The data set consists of 1,000,000 clients (or instances), each one with 500 features (or variables).

The following table summarizes the data set characteristics.

Number of customers: 1,000,000
Number of features: 500
Total data: 500,000,000

Some types of variables in our data set are the following:

The data set details have been anonymized to protect the company's privacy.

The target variable is the purchase of a given product or service. This is a binary variable that is 1 if the customer has purchased that product and 0 otherwise.

Descriptive analytics

Descriptive analytics is the first stage of advanced analytics. It answers the question "What happened?"

It is essential to know the acceptance ratio of our product or service among our clients.

The next pie chart shows the distribution of the purchase variable.

As we can see, the number of customers that have purchased the product is much lower than the number of customers that haven't. In this case, buyers represent less than 1% of all customers.
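The distribution shown in the pie chart can be computed directly from the target column. The following is a minimal sketch in plain Python, not the Neural Designer workflow used in the case study; the function name `purchase_distribution` and the toy sample are assumptions made for illustration.

```python
from collections import Counter


def purchase_distribution(purchases):
    """Return the share of non-buyers (0) and buyers (1) in a binary target column."""
    counts = Counter(purchases)
    total = len(purchases)
    return {label: counts.get(label, 0) / total for label in (0, 1)}


# Hypothetical toy sample (not the bank's data): 6 buyers among 1,000 customers.
purchases = [1] * 6 + [0] * 994
distribution = purchase_distribution(purchases)
print(f"Purchased: {distribution[1]:.1%}, not purchased: {distribution[0]:.1%}")
```

A target this unbalanced is worth knowing about before modeling, because a model that always predicts "no purchase" would already be right for more than 99% of customers.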

Diagnostic analytics

Diagnostic analytics is the second phase of advanced analytics. It answers the question "Why did it happen?"

This phase gives us insights into which factors impact the purchase of our product or service.

To discover that, we can calculate the inputs-targets correlations.

As we can see, all the input variables have a very low correlation with the target variable (below 20%).

This indicates that, individually, no feature significantly influences the purchase. Instead, the problem is complex, and various factors influence the purchasing process.

The features with the highest correlation are related to the engagement of the customer with the company.
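As a rough illustration of the inputs-targets correlations, the sketch below computes the Pearson correlation between each input column and the binary purchase variable, then ranks the inputs by absolute correlation. It assumes Pearson correlation as the measure (the post does not say which coefficient the software reports), and the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, report 0 by convention
    return cov / math.sqrt(var_x * var_y)


def rank_inputs(features, target):
    """Return (feature name, correlation) pairs sorted by absolute correlation, highest first."""
    correlations = {name: pearson(values, target) for name, values in features.items()}
    return sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one demographic input and one engagement input.
features = {
    "age": [25, 40, 33, 58, 47, 29],
    "logins_last_month": [1, 9, 2, 12, 3, 0],
}
purchase = [0, 1, 0, 1, 0, 0]

for name, correlation in rank_inputs(features, purchase):
    print(f"{name}: {correlation:+.2f}")
```

With only six made-up rows, the correlations here are far higher than those in the case study; the point is the ranking procedure, not the values.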

Predictive analytics

Predictive analytics is the third stage of advanced analytics. It answers the question "What will happen?"

Neural networks are one of the most powerful techniques for building predictive models. They are mathematical models, loosely inspired by the functioning of the human brain, that discover patterns in data.

For our case study, the neural network takes as inputs the customer, company, product, and engagement factors and produces as output the probability that the customer purchases the product. The following graph illustrates that neural network.

As we can see, neural networks can analyze any number and type of factors.
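To make the input-to-probability mapping concrete, here is a minimal sketch of the forward pass of a network with one hidden layer and a logistic output. It is not the architecture built in Neural Designer for this case study: the layer sizes, the tanh activation, and all weights below are assumptions chosen for illustration, and a real model would learn its weights from the data.

```python
import math


def logistic(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_purchase_probability(inputs, hidden_weights, hidden_biases,
                                 output_weights, output_bias):
    """Forward pass: inputs -> tanh hidden layer -> logistic output (a probability)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return logistic(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical, untrained weights: 3 scaled inputs and 2 hidden neurons.
hidden_weights = [[0.4, -0.2, 0.9], [-0.7, 0.1, 0.3]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.8]
output_bias = -2.0

customer = [0.5, -1.0, 1.5]  # hypothetical scaled customer features
probability = predict_purchase_probability(
    customer, hidden_weights, hidden_biases, output_weights, output_bias
)
print(f"Estimated purchase probability: {probability:.3f}")
```

The logistic output is what allows the result to be read as a probability and later used to rank customers.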

Prescriptive analytics

Prescriptive analytics is the fourth and last step of advanced analytics. It answers the question "How can we make it happen?"

After building the predictive model, we exploit it to maximize our benefits. To do that, we design a marketing campaign targeted to those customers with the highest probability of purchasing the product.

We will assume that the unit cost of contacting each potential customer is 10 USD and that the unit benefit if the customer buys the product is 1,000 USD.

The profit chart simulates the benefit obtained without the predictive model (grey line) and with it (purple line).

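Without using the predictive model, the more clients we contact, the more money we lose. Indeed, with a cost of 10 USD per contact and a benefit of 1,000 USD per sale, the campaign only breaks even if at least 1% of the contacted customers buy, and the overall conversion rate is lower than that (0.6%).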

When using the predictive model, we only contact those customers who are most likely to purchase. As we can see, the maximum benefit is obtained by contacting the 20% of customers with the highest probability of purchase.

Therefore, if we do not use the predictive model, the losses are around 15,000 USD. By using it, the benefits are approximately 40,000 USD.
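The calculation behind a profit chart can be sketched as follows: sort the customers by predicted purchase probability, accumulate the profit as each one is contacted, and pick the campaign size where the cumulative profit peaks. The function names and the toy scores are assumptions for illustration; only the 10 USD cost and 1,000 USD benefit come from the case study. In practice the purchase outcomes would come from a held-out data set.

```python
def profit_curve(scores, purchased, cost_per_contact=10.0, benefit_per_sale=1000.0):
    """Cumulative profit when contacting customers in descending order of score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    profits = []
    running = 0.0
    for i in order:
        # Every contact costs money; only buyers bring in the benefit.
        running += benefit_per_sale * purchased[i] - cost_per_contact
        profits.append(running)
    return profits


def best_campaign_size(profits):
    """Return (number of customers to contact, profit) at the peak of the curve."""
    if not profits or max(profits) <= 0:
        return 0, 0.0  # contacting nobody is the best option
    best_index = max(range(len(profits)), key=lambda k: profits[k])
    return best_index + 1, profits[best_index]


# Hypothetical toy data: predicted probabilities and observed purchases (1 = bought).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
purchased = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

profits = profit_curve(scores, purchased)
size, profit = best_campaign_size(profits)
print(f"Contact the top {size} customers for a profit of {profit:.0f} USD")
```

Past the peak, each additional contact adds cost without enough extra sales to cover it, which is why the curve bends downward.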

Conclusions

Advanced analytics studies all types of data to accurately target the most profitable clients. This type of personalized marketing can multiply a company's profits.

Neural Designer is a data science and machine learning platform that allows you to apply advanced analytics easily. You can download a free trial here.
