Nowadays, the amount of data created and stored by organizations is increasing significantly, and this wealth of information has opened up new business opportunities.
Most organizations are stuck at lower-value descriptive analytics. But more sophisticated analysis can bring great business value.
But, what is Advanced Analytics?
Advanced Analytics is the set of techniques used to discover intricate relationships, recognize complex patterns or predict current trends in your data.
Its objective is to model data from internal and external variables in order to obtain useful insights that result in smarter decisions and better business results.
Advanced Analytics comprises four types of analysis: descriptive, diagnostic, predictive and prescriptive analytics.
These methods allow us to know what has happened in our company, why it happened, what will happen and what we can do to change what will happen in a way that benefits us.
Descriptive analytics is the first stage of data analysis that creates a summary of historical data to yield useful information and possibly prepare the data for further analysis.
Descriptive statistics are a very important part of data analysis. They provide historical insights into the company’s financials, production, operations, sales, and customers.
This phase consists of tables and graphs so that the user can easily interpret the information. Some of the processes that are carried out at this stage are described below.
Basic statistics are a very valuable source of information when designing a model, since they can alert us to the presence of spurious data. It is essential to check that the most important statistical measures of every single variable are plausible.
The table above shows the minimums, maximums, means and standard deviations of all the features in the data set.
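As a minimal sketch of how such a table could be computed, the snippet below uses only the Python standard library. The function name `basic_statistics` and the data are hypothetical and made up for illustration; they are not the data set from this post. Note how the out-of-range value in `feature_2` shows up immediately in the maximum and the standard deviation.

```python
import statistics

def basic_statistics(columns):
    """Return minimum, maximum, mean and sample standard deviation per column."""
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "minimum": min(values),
            "maximum": max(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),
        }
    return summary

# Hypothetical data, invented for this example only.
data = {
    "feature_1": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
    "feature_2": [10.0, 12.0, 11.0, 950.0, 9.0, 13.0, 10.0, 11.0],  # 950.0 looks spurious
}

for name, stats in basic_statistics(data).items():
    print(name, stats)
```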
Histograms show how the data is distributed over its entire range. In approximation problems, a uniform distribution for all the variables is, in general, desirable. If the data is very irregularly distributed, then the model will probably be of bad quality.
As we can see, the histogram looks like a Gaussian bell curve that leans slightly to the right.
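A histogram is just a count of how many values fall into each interval of the variable's range. The sketch below assumes equal-width bins; the function name `histogram` and the sample values are hypothetical.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if width > 0:
            # The maximum value is assigned to the last bin instead of overflowing.
            index = min(int((v - low) / width), bins - 1)
        else:
            index = 0  # all values are identical
        counts[index] += 1
    return counts

# Hypothetical sample, invented for this example only.
values = [1.0, 1.5, 2.0, 2.2, 2.5, 2.7, 3.0, 3.1, 3.4, 4.0, 4.8, 5.0]
counts = histogram(values, bins=4)

# A crude text rendering of the histogram.
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```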
Box plots display information about the minimum, maximum, first quartile, second quartile or median and third quartile of every variable in the data set. They consist of two parts: a box and two whiskers.
The chart above shows the box plot for the variable Feature 2. The minimum of the variable is 25.36, the first quartile is 41.74, the second quartile or median is 52.08, the third quartile is 66.54 and the maximum is 81.56.
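The five numbers behind a box plot can be obtained with the standard library. This is a sketch under one assumption: quartiles are computed with the "inclusive" interpolation method of `statistics.quantiles`, and other tools may use slightly different conventions. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, quartiles and maximum that define a box plot."""
    # quantiles() sorts the data and returns the three cut points for n=4.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": max(values),
    }

# Hypothetical values, invented for this example only.
summary = five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5])
print(summary)
```

The box spans from `q1` to `q3`, and in the simple form described above the whiskers extend to the minimum and the maximum.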
Diagnostic Analytics is the next level of analysis. It is a form of Advanced Analytics that is focused on determining the factors and events that contributed to the outcome.
This phase consists of techniques such as calculating correlations and interpreting interactive visualizations.
In classification applications, it might be interesting to look for logistic dependencies between a single input and a single target variable. The logistic correlation is a numerical value between 0 and 1 that expresses the strength of the logistic relationship between that input and the target.
The maximum correlation (0.948) is the one between Feature 1 and the target. Such a high correlation indicates that we should study this variable more thoroughly.
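The post does not give a formula for the logistic correlation, so the sketch below makes an explicit assumption: it fits a one-variable logistic curve to a binary target by gradient descent and reports the absolute Pearson correlation between the fitted curve and the target. The function names, the number of steps and the learning rate are all hypothetical choices, and the data is invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_correlation(xs, ys, steps=2000, rate=0.1):
    """Assumed definition: |corr(fitted logistic curve, binary target)|."""
    n = len(xs)
    mean = sum(xs) / n
    scale = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mean) / scale for x in xs]  # standardize the input
    a, b = 0.0, 0.0
    for _ in range(steps):
        preds = [sigmoid(a * z + b) for z in zs]
        # Gradient of the mean cross-entropy error with respect to a and b.
        grad_a = sum((p - y) * z for p, y, z in zip(preds, ys, zs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        a -= rate * grad_a
        b -= rate * grad_b
    preds = [sigmoid(a * z + b) for z in zs]
    return abs(pearson(preds, ys))

# Hypothetical data: one input that separates the classes well, one that does not.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
strong = logistic_correlation(xs, [0, 0, 0, 0, 1, 1, 1, 1])
weak = logistic_correlation(xs, [0, 1, 0, 1, 0, 1, 0, 1])
print(f"strong: {strong:.3f}, weak: {weak:.3f}")
```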
This technique plots each input against each target. These charts can help reveal how the targets depend on the inputs.
The chart above shows that as the value of Feature 1 increases, the value of Target decreases.
Predictive analytics is the branch of Advanced Analytics that is used to make predictions about unknown future events.
This is the most important phase of the analysis. Its output is a predictive model capable of estimating what is going to happen in the future.
It encompasses a variety of machine learning techniques, such as k-nearest neighbors, decision trees, random forests and neural networks, to identify the likelihood of future outcomes based on historical data. Some of them are explained below.
K-nearest neighbors is a simple method used for classification and regression. It stores all available cases and classifies new cases based on a similarity measure.
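A minimal classification sketch of this idea follows. It assumes Euclidean distance as the similarity measure and a majority vote among the `k` closest stored cases; the function name `knn_classify` and the data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(stored_cases, new_case, k=3):
    """Classify new_case by majority vote among its k nearest stored cases."""
    # Sort the stored (features, label) pairs by Euclidean distance to the new case.
    by_distance = sorted(stored_cases, key=lambda case: math.dist(case[0], new_case))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical stored cases: (features, label).
cases = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_classify(cases, (1.1, 1.0)))
print(knn_classify(cases, (5.0, 4.8)))
```

There is no training step here: the "model" is the stored data itself, which is why the method is simple but can become slow on large data sets.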
A decision tree is a mathematical model that helps you choose between several courses of action. It uses estimates and probabilities to calculate likely outcomes.
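Reading that description literally, the simplest possible tree has one branch per course of action, and each branch ends in outcomes with an estimated probability and payoff. The sketch below compares the expected value of each action. The action names, probabilities and payoffs are hypothetical, and decision trees learned from data by machine learning tools are considerably more elaborate.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs for one course of action."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

def best_action(actions):
    """Return the name of the action with the highest expected value."""
    return max(actions, key=lambda name: expected_value(actions[name]))

# Hypothetical estimates, invented for this example only.
actions = {
    "launch_campaign": [(0.6, 100.0), (0.4, -50.0)],
    "do_nothing": [(1.0, 10.0)],
}

for name, outcomes in actions.items():
    print(name, expected_value(outcomes))
print("best:", best_action(actions))
```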
Random forests are a combination of tree predictors for classification, regression and other tasks. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Random forests are among the most popular methods used by data scientists.
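The following sketch illustrates the idea under a strong simplification: each "tree" is a one-split stump rather than a full tree. The randomness per tree comes from a bootstrap sample of the rows and a randomly chosen feature, and the forest predicts by majority vote. All function names, parameters and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as the data set has, with replacement.
        indices = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in indices]
        sample_labels = [labels[i] for i in indices]
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical data: two well-separated groups.
rows = [(1.0, 1.0), (1.5, 0.5), (2.0, 1.2), (0.8, 1.8),
        (6.0, 5.5), (6.5, 6.2), (5.8, 6.8), (7.0, 5.9)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

forest = fit_forest(rows, labels)
print(predict(forest, (0.5, 0.4)))
print(predict(forest, (8.0, 8.0)))
```

Individual stumps can be wrong, for example when a bootstrap sample happens to miss part of a group, but the vote across many of them is much more stable than any single one.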
Artificial Neural Networks (ANN) are computational models based on the neural structure of the brain. They are recognized as one of the best machine learning methods.
The outputs from the neural network depend on the inputs fed to it and the different parameters within the neural network.
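To make that dependence concrete, here is a sketch of the forward pass of a tiny network. It assumes one hidden layer with a hyperbolic tangent activation and a linear output layer; the architecture and every weight and bias value are hypothetical, chosen by hand rather than trained.

```python
import math

def forward(inputs, layers):
    """Compute the network outputs for the given inputs.

    layers is a list of (weights, biases) pairs, where weights[j] holds
    the weights of neuron j in that layer.
    """
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(neuron, activations)) + b
                for neuron, b in zip(weights, biases)]
        is_output_layer = index == len(layers) - 1
        # Hidden layers apply tanh; the output layer is left linear.
        activations = sums if is_output_layer else [math.tanh(s) for s in sums]
    return activations

# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.5]),                   # output layer
]

print(forward([0.0, 0.0], layers))
print(forward([1.0, 2.0], layers))
```

Changing either the inputs or any entry of `layers` changes the result, and training a network amounts to searching for the parameter values that make the outputs match the targets.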
Prescriptive analysis is the last step of an advanced data analysis. It consists of the application of the predictive model to determine the best solution or outcome among various choices, given the known parameters.
In this phase, we not only predict what will happen in the future using our predictive model, but also show the decision maker the implications of each option.
For instance, we propose two different scenarios to see how Target varies as a function of a single input, in this case Feature 3.
The next table shows the reference variables for the first scenario.
The next plot shows the output Target as a function of the input Feature 3. The x and y axes are defined by the range of the variables Feature 3 and Target, respectively.
The chart above shows that Target reaches its maximum, 484.93, when Feature 3 is 1010.
The next table shows the reference variables for the second scenario.
We use the predictive model to calculate the different values that the Target variable takes as a function of the variable Feature 3.
As we can see in the previous graph, Target grows as Feature 3 increases.
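The scenario analysis described above can be sketched as follows: hold every input at its reference value, vary one input over a range, and record what the predictive model returns. Here `toy_model` is a hypothetical stand-in for a trained model, and the reference values and range are invented; they are not the ones from the scenarios in this post.

```python
def sweep_input(model, reference, name, start, stop, points):
    """Evaluate the model while varying one input and fixing the others."""
    results = []
    for i in range(points):
        value = start + (stop - start) * i / (points - 1)
        scenario = dict(reference)  # copy so the reference stays untouched
        scenario[name] = value
        results.append((value, model(scenario)))
    return results

def toy_model(inputs):
    # Hypothetical stand-in for a trained predictive model.
    return 100.0 - (inputs["feature_3"] - 6.0) ** 2 + 2.0 * inputs["feature_1"]

# Hypothetical reference values for one scenario.
reference = {"feature_1": 5.0, "feature_2": 1.0, "feature_3": 0.0}

curve = sweep_input(toy_model, reference, "feature_3", 0.0, 10.0, 11)
best_value, best_target = max(curve, key=lambda point: point[1])
print(f"Target is highest ({best_target}) when feature_3 = {best_value}")
```

Running the same sweep with a different `reference` dictionary gives a second scenario, so the decision maker can compare how the best setting of the input changes from one situation to another.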
As we have seen in this post, simple analytics are not enough to get actionable insights and improve business operations.
With Advanced Analytics, far less is left to chance. It provides a global view of your company, from its past to its possible futures.