By Pablo Martin, Artelnics.
Nowadays, the amount of data that is being created and stored in organizations is increasing significantly.
Due to this large amount of information, new business opportunities are continuously arising.
However, most organizations are stuck in lower-value descriptive analytics. But more sophisticated analysis can bring great business value.
Advanced analytics is the set of techniques used to discover relationships, recognize patterns, predict trends, and find associations in your data.
The objective is to use internal and external sources of information to obtain useful insights that result in smarter decisions and better business outcomes.
These methods allow us to know what happened in our company, why it happened, what will happen, and how we can make something happen. The next figure illustrates the whole advanced analytics process.
As we can see, the advanced analytics process comprises 4 phases: descriptive, diagnostic, predictive, and prescriptive analytics. Each of these phases is more complex than the previous one but provides more value.
Descriptive analytics is the first stage of advanced analytics. It answers the question: What happened?
In this phase, insights regarding the general aspects of the company are obtained. The output is a set of tables and charts with historical information about operations, sales, customers, etc.
Some of the analyses carried out at this stage are:
Statistics provide very valuable information since they put the data set in context.
The most important statistical parameters are the minimum, the maximum, the mean, and the standard deviation.
The table below illustrates these parameters for the total amount of money spent by a customer in an online store.
| |Minimum|Maximum|Mean|Standard deviation|
|---|---|---|---|---|
|Total amount (USD)|12|500|51|56|
As we can see, customers have spent 51 USD on average, but some have spent up to 500 USD.
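As a minimal sketch of how these four parameters can be computed, the Python snippet below uses only the standard library. The `summarize` helper and the purchase totals are hypothetical and invented for illustration; they are not the data behind the table above.

```python
import statistics

def summarize(values):
    """Return the minimum, maximum, mean and standard deviation of a list of numbers."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1).
        "std": statistics.stdev(values),
    }

# Hypothetical purchase totals in USD, invented for illustration only.
totals = [12, 20, 35, 51, 500]
summary = summarize(totals)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note how a single large purchase pulls the mean far above what most customers spend, which is why the maximum and the standard deviation are worth reading alongside the mean.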
Distributions show how the data is arranged over its entire range.
Histograms are used to see how continuous variables are distributed. A normal (Gaussian) or uniform distribution is, in general, desirable.
For example, the following figure depicts a histogram for the age of our customers.
As we can see, most of our customers are between 30 and 40 years old.
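To make the idea concrete, here is a small sketch of how a histogram can be built by counting values per bin. The `histogram` function, the bin width of 10 years and the list of ages are assumptions made for this example only.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical customer ages, invented for illustration only.
ages = [22, 31, 34, 35, 38, 39, 41, 47, 52, 33]
bin_width = 10
bins = histogram(ages, bin_width)

# Print a simple text histogram, one row per bin.
for start, count in bins.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

With this sample, the tallest bar is the 30-39 bin, which is the kind of pattern we would read off the chart.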
Pie charts are used to see the distribution of binary or nominal variables. Those types of variables should be uniformly distributed.
The following figure shows the pie chart for the customers that purchased a product from an online store during the last marketing campaign. This is a binary variable.
As we can see, only 1% of the people who see our ad buy the product.
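The slices of a pie chart are simply the share of each category. The sketch below computes those shares for a binary variable; the `category_shares` helper and the sample of 100 campaign outcomes are hypothetical and chosen only to mirror the 1% figure.

```python
from collections import Counter

def category_shares(labels):
    """Return the fraction of samples that fall in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical campaign outcomes: 1 purchase out of 100 impressions.
purchases = ["no"] * 99 + ["yes"] * 1
shares = category_shares(purchases)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

A variable this unbalanced is far from uniformly distributed, which is worth keeping in mind before building a predictive model on it.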
Diagnostic analytics is the second phase of advanced analytics. It answers the question: Why did it happen?
It is focused on determining the factors that contributed to the outcome.
Here we concentrate on the following techniques:
Scatter charts might help to discover dependencies between the output variables and the input variables.
These charts plot output values versus input values.
The following scatter chart plots Target against Feature 1.
The chart above shows that as the value of Feature 1 increases, the value of Target decreases.
Correlations are also a useful technique to discover dependencies between input and output variables.
A correlation is a numerical value between -1 and 1 that expresses the strength of the relationship between two variables. Values close to -1 or 1 indicate a strong relationship, while values close to 0 indicate a weak one.
The strongest correlation (-0.287) is the one between recency and conversion. Because it is the strongest relationship found, we should study this variable more thoroughly.
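One common way to compute such a value is the Pearson correlation coefficient. The sketch below is an assumption about the method, since the post does not say which coefficient was used, and the recency and conversion samples are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: days since last visit, and whether the customer bought (1) or not (0).
recency = [1, 3, 5, 8, 12, 20]
conversion = [1, 1, 0, 1, 0, 0]
r = pearson(recency, conversion)
print(f"correlation = {r:.3f}")
```

A negative result, as in this sample, means that the output tends to decrease as the input increases.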
Predictive analytics is the third stage of advanced analytics. It answers the question What will happen?
Predictive analytics is the branch of Advanced Analytics that is used to make predictions about unknown future events.
This is the most important phase of the analysis; its output is a predictive model capable of estimating what is likely to happen in the future.
It encompasses a variety of machine learning techniques such as k-nearest neighbors, decision trees, neural networks, etc., to identify the likelihood of future outcomes based on historical data. Some of them are explained below.
K-nearest neighbors is a very simple method used for classification and approximation.
It stores all available cases and classifies new cases based on a similarity measure.
In the graph above, we can see many data points classified in two categories: category A (blue circles) and category B (orange squares).
A new data point is introduced (green triangle), and the k-nearest neighbors method decides, based on its similarity to the stored points, whether it belongs to category A or to category B.
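The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: Euclidean distance as the similarity measure, a majority vote among k = 3 neighbors, and made-up two-dimensional points. None of these choices come from the figure.

```python
import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored points."""
    # Rank the stored cases by Euclidean distance to the query point.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    nearest_labels = [labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical stored cases: three points per category.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (2.0, 2.0)))
```

The method has no training step: all the work happens at prediction time, when distances to the stored cases are computed.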
Decision trees are also a simple method used for classification and approximation.
A decision tree is a mathematical model that helps you choose between several courses of action. It estimates probabilities to calculate likely outcomes.
In the diagram above, we can see an example of a decision tree. Feature 1, feature 2 and feature 3 are three numerical features of the dataset; A, B and C, are the three categories in which the instances are classified; and a, b, c and d are numerical values.
In order to classify an instance, we start at the top and follow the tree's branches according to the values of our instance. First, we check the instance's value for feature 1. If it is greater than a, the instance is classified in category B and we are done. If it is less than or equal to a, we check whether the value for feature 2 is greater or less than b. We follow this process until we reach a leaf (a category: A, B or C).
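The traversal described above can be sketched as follows. Only the first split (feature 1 greater than a leads to category B) is taken from the text; the rest of the tree's shape and all threshold values are hypothetical, since the diagram itself is not reproduced here.

```python
# Each internal node is (feature, threshold, branch if value > threshold, branch otherwise).
# Leaves are category names. Thresholds and the lower branches are assumptions.
tree = (
    "feature_1", 5.0, "B",
    ("feature_2", 2.0, "A",
     ("feature_3", 1.0, "C", "A")),
)

def classify(node, instance):
    """Follow the branches from the root until a leaf (a category) is reached."""
    while not isinstance(node, str):
        feature, threshold, if_greater, otherwise = node
        node = if_greater if instance[feature] > threshold else otherwise
    return node

instance = {"feature_1": 1.0, "feature_2": 1.0, "feature_3": 2.0}
print(classify(tree, instance))
```

Each prediction only inspects the features along one path from the root to a leaf, which is part of what makes decision trees easy to interpret.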
Neural Networks are recognized as one of the most powerful machine learning methods.
They are used in classification and approximation tasks.
Neural networks are computational models based on the neural structure of the brain.
The outputs from the neural network depend on the inputs fed to it and the different parameters within the neural network.
The graph above shows an example of a neural network with 4 inputs (feature 1, 2, 3 and 4). When we introduce the values of the four features in the neural network, we get an output.
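To show how the output depends on both the inputs and the parameters, here is a minimal sketch of a forward pass through a network with four inputs, one hidden layer and a single output. The architecture (three hidden neurons with a tanh activation) and every weight and bias value are assumptions made for illustration; in practice the parameters are found by training.

```python
import math

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute the output of a one-hidden-layer network for one input vector."""
    # Each hidden neuron applies tanh to a weighted sum of the inputs plus a bias.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron is a linear combination of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical parameters for 4 inputs and 3 hidden neurons.
hidden_weights = [
    [0.2, -0.1, 0.4, 0.0],
    [-0.3, 0.5, 0.1, 0.2],
    [0.0, 0.1, -0.2, 0.3],
]
hidden_biases = [0.1, -0.2, 0.0]
output_weights = [0.7, -0.5, 0.3]
output_bias = 0.05

features = [1.0, 2.0, 3.0, 4.0]  # values for features 1 to 4
print(forward(features, hidden_weights, hidden_biases, output_weights, output_bias))
```

Changing any weight or bias changes the output for the same inputs, which is exactly what training exploits to fit the network to historical data.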
Prescriptive analytics is the fourth and last step of advanced analytics. It answers the question: How can we make it happen?
It consists of the application of the predictive model to determine the best solution or outcome among various choices.
In this phase, we not only predict what will happen in the future using our predictive model but also show the decision-maker the implications of each option.
For instance, we propose 2 different scenarios to see how the target varies as a function of a single input, in this case depending on Feature 3.
The next table shows the reference variables for the first scenario.
The next plot shows the output Target as a function of the input Feature 3. The x and y axes are defined by the range of the variables Feature 3 and Target, respectively.
The chart above shows that when Feature 3 is 1010, Target reaches its maximum of 484.93.
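This kind of scenario analysis can be sketched as a sweep over one input while the others stay at their reference values. In the snippet below, `toy_model` is a hypothetical stand-in for the trained predictive model, and the reference values and candidate range are invented; they do not reproduce the numbers in the chart.

```python
def best_input(model, reference, name, candidates):
    """Vary one input over `candidates`, keeping the others at their reference values."""
    best_value, best_output = None, float("-inf")
    for value in candidates:
        # Copy the reference scenario and override only the input being varied.
        scenario = dict(reference, **{name: value})
        output = model(scenario)
        if output > best_output:
            best_value, best_output = value, output
    return best_value, best_output

def toy_model(x):
    """Hypothetical stand-in for a trained model: peaks when feature_3 equals 10."""
    return 100.0 - (x["feature_3"] - 10.0) ** 2 + x["feature_1"]

reference = {"feature_1": 2.0, "feature_2": 0.0, "feature_3": 0.0}
value, output = best_input(toy_model, reference, "feature_3", range(0, 21))
print(f"best feature_3 = {value}, predicted target = {output}")
```

The same loop can be repeated with a different reference scenario to compare how the best choice shifts from one scenario to another.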
As we have seen in this post, simple analytics are not enough to get actionable insights and improve business operations.
Thanks to advanced analytics, you will not leave anything to chance. It provides a global vision of your company, from the past to its possible futures.
The data science and machine learning platform Neural Designer contains many utilities to perform descriptive, diagnostic, predictive, and prescriptive analytics easily.
You can download Neural Designer now and try it for free.