By Pablo Martin, Artelnics.
In this article, we show how to perform a market basket analysis using R and Neural Designer.
R is a free programming language for statistical computing and graphics widely used in the data science community to perform data analysis.
Neural Designer is a software tool for data analytics based on neural networks, one of the main areas of artificial intelligence research.
The next picture shows the structure of our example.
Our objective is to analyze a dataset from a grocery store to create a recommendation system. This system will be capable of generating accurate recommendations about products that the user may be interested in.
The type of learning that we use for this example is called "unsupervised learning". It is a machine learning technique used to find patterns in data when there are no target values.
In our example, the output values are the probabilities, expressed as percentages, that the products appear in the same shopping basket.
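To make the idea of products sharing a basket concrete, the following sketch counts co-occurrences on a tiny made-up data set. The matrix `baskets` and its product names are invented for illustration, and this is plain counting in base R, not the neural network model described later.

```r
# Toy binary basket matrix (hypothetical data):
# rows are transactions, columns are products
baskets <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 1,
    0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("milk", "bread", "butter"))
)

# co_counts[i, j] = number of baskets containing both product i and product j
co_counts <- crossprod(baskets)

# Divide each row i by the number of baskets containing product i:
# cond_share[i, j] = share of baskets with product i that also contain product j
cond_share <- co_counts / diag(co_counts)

# Show the shares as percentages
print(round(100 * cond_share, 1))
```

Reading the row for "milk", for example, tells us how often each other product appears in baskets that already contain milk.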
The dataset selected for our example consists of 9835 transactions of a grocery store. Each transaction can be a single product or several products.
From a first look at the database, we cannot know the number of items in the store. Also, the format of the database is difficult to analyze. That is why we need to make a preliminary treatment of the collected information before analyzing it.
We do this pre-processing in R.
Using R, we convert the database into a binary matrix, the format we need for the modeling process in Neural Designer. The following script performs this conversion.
    # Install (first run only) and load the package for transaction analysis
    install.packages("arules")
    library(arules)

    # Load data
    Shopping_Cart <- read.transactions("groceries.csv", sep = ",")

    # Database checking
    summary(Shopping_Cart)
    itemFrequencyPlot(Shopping_Cart, topN = 20)

    # Convert data to a numeric (0/1) matrix
    Shopping_Matrix <- as(Shopping_Cart, "matrix") * 1

    # Save results to file
    write.csv(Shopping_Matrix, file = "Shopping_Cart.csv", row.names = FALSE)
The first step of our script is to install and load "arules", the package R needs to read transactions. Once "arules" is loaded, we perform the first operation: loading the data.
The next step is to check that the data has been loaded correctly, for which we use the "summary" command. We also plot a bar chart of the 20 most frequently purchased products. The following image shows the chart with these top 20 products.
As we can see in the chart above, whole milk is the product that is bought the most.
Once the data has been checked, we export it to CSV so that it can be analyzed with Neural Designer.
For that purpose, we convert the data into a binary matrix and export it to CSV with the "write.csv" command.
The result is a data set containing a variable for each product. The value is 1 if the product has been purchased or 0 if not. This data set is ready to start its analysis with Neural Designer.
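For readers who want to see this 0/1 layout without installing any package, here is a minimal base R sketch that builds the same kind of matrix from a few hand-written baskets. The baskets are invented for illustration, and the names `transactions`, `products`, and `binary_matrix` are our own, not part of the original script.

```r
# Hypothetical baskets: one character vector of product names per transaction
transactions <- list(
  c("whole milk", "yogurt"),
  c("butter"),
  c("whole milk", "butter", "rolls/buns")
)

# All distinct products, sorted so the column order is reproducible
products <- sort(unique(unlist(transactions)))

# One row per transaction, one 0/1 column per product:
# 1 if the product is in the basket, 0 otherwise
binary_matrix <- t(sapply(
  transactions,
  function(items) as.integer(products %in% items)
))
colnames(binary_matrix) <- products

print(binary_matrix)
```

Each row sums to the number of distinct products in that basket, which is a quick sanity check on the conversion.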
Once the data has been pre-processed, it is time to load it into Neural Designer to build our recommendation system. Neural Designer provides an easy way of analyzing and deploying advanced analytics models. The next picture shows the data set tab in Neural Designer.
Now that the data is loaded into the software, we can check the basic statistics of each variable. The table below shows the minimums, maximums, means, and standard deviations of the top 20 variables in this data set.
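The same basic statistics can be reproduced in R as a cross-check. The sketch below uses a small invented data frame in place of the real file. Note that R's `sd` computes the sample standard deviation, and we are assuming, not asserting, that Neural Designer reports the same definition.

```r
# Hypothetical 0/1 basket data: 4 transactions, 3 products
baskets <- data.frame(
  whole_milk = c(1, 0, 1, 1),
  yogurt     = c(0, 0, 1, 0),
  soda       = c(1, 1, 0, 0)
)

# One row per variable, one column per statistic
stats <- data.frame(
  minimum = sapply(baskets, min),
  maximum = sapply(baskets, max),
  mean    = sapply(baskets, mean),
  std_dev = sapply(baskets, sd)
)

print(stats)
```

Because every variable is binary, its mean is simply the share of baskets that contain the product.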
To develop our recommendation system, we use neural networks, the machine learning technique that Neural Designer implements. A neural network defines the predictive model as a multidimensional function with adjustable parameters. The first step in creating our recommendation system is choosing a neural network architecture to represent that function.
Because the neural network for this problem is large (169 inputs, 25 hidden neurons, and 169 outputs), the following image shows what the network would look like if the study used only the 20 most influential variables.
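To give a feel for the size of this architecture, the sketch below builds a network with the same layer sizes in base R and passes one basket through it. The weights are random, and the activation functions (tanh in the hidden layer, logistic in the output layer) are our assumptions; the article does not state which ones Neural Designer uses.

```r
set.seed(1)

# Layer sizes taken from the article
n_inputs  <- 169
n_hidden  <- 25
n_outputs <- 169

# Randomly initialised parameters (a trained model would supply real values)
W1 <- matrix(rnorm(n_inputs * n_hidden, sd = 0.1), nrow = n_inputs)
b1 <- rep(0, n_hidden)
W2 <- matrix(rnorm(n_hidden * n_outputs, sd = 0.1), nrow = n_hidden)
b2 <- rep(0, n_outputs)

# Assumed output activation: maps any real number into (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

# Forward pass: basket (0/1 vector) -> hidden layer -> one score per product
forward <- function(basket) {
  hidden <- tanh(as.vector(basket %*% W1) + b1)
  logistic(as.vector(hidden %*% W2) + b2)
}

# A hypothetical basket containing three arbitrary products
basket <- numeric(n_inputs)
basket[c(3, 10, 42)] <- 1

scores <- forward(basket)

# Total number of adjustable parameters in this architecture
n_parameters <- length(W1) + length(b1) + length(W2) + length(b2)
cat("scores:", length(scores), " parameters:", n_parameters, "\n")
```

The parameter count is what the training step has to adjust, which is why the choice of training algorithm matters.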
The next step is to train the neural network described above. For this purpose, we apply the Quasi-Newton method to obtain a good model that will recommend the shopping basket most suitable for the customer. To learn about the types of algorithms that can be used to train a neural network, you can read the article "5 algorithms to train a neural network".
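As a rough illustration of quasi-Newton training, the sketch below fits a very small network with base R's `optim` and its BFGS method, which belongs to the quasi-Newton family. Everything here is an assumption made for the example: the toy data, the 4-2-4 layer sizes, the activations, the mean squared error loss, and the choice to reconstruct each basket from itself. It is not Neural Designer's implementation.

```r
set.seed(42)

# Hypothetical 0/1 basket data: 6 transactions, 4 products
X <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 1, 1,
    1, 1, 1, 0,
    0, 1, 1, 1),
  nrow = 6, byrow = TRUE
)
n_in  <- ncol(X)
n_hid <- 2

logistic <- function(z) 1 / (1 + exp(-z))

# Rebuild the weight matrices and biases from one flat parameter vector
unpack <- function(theta) {
  k <- n_in * n_hid
  list(
    W1 = matrix(theta[1:k], nrow = n_in),
    b1 = theta[(k + 1):(k + n_hid)],
    W2 = matrix(theta[(k + n_hid + 1):(2 * k + n_hid)], nrow = n_hid),
    b2 = theta[(2 * k + n_hid + 1):(2 * k + n_hid + n_in)]
  )
}

# Network outputs for every row of X
predict_net <- function(theta, X) {
  p <- unpack(theta)
  H <- tanh(sweep(X %*% p$W1, 2, p$b1, "+"))
  logistic(sweep(H %*% p$W2, 2, p$b2, "+"))
}

# Assumed loss: mean squared error between outputs and the baskets themselves
loss <- function(theta) mean((predict_net(theta, X) - X)^2)

# Small random starting point, then quasi-Newton (BFGS) minimisation;
# optim approximates the gradient numerically since none is supplied
theta0 <- rnorm(2 * n_in * n_hid + n_hid + n_in, sd = 0.1)
fit <- optim(theta0, loss, method = "BFGS")

cat("initial loss:", loss(theta0), "\n")
cat("final loss:  ", fit$value, "\n")
```

Supplying an analytic gradient through `optim`'s `gr` argument would be the natural next step for a network of realistic size, since numerical gradients become slow as the number of parameters grows.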
Once our model is trained and ready to use, we export it from Neural Designer to R. We use Neural Designer's "Export to R" task to obtain our model as an R expression.
The following image shows the "Export to R" task in the Neural Designer task manager.
Now that we have obtained our recommendation system as an R script, we open it in RStudio to check it. The next picture shows our script in RStudio. To download the R script to try it yourself, click here.
Finally, we analyze an example of a shopping cart to check what recommendations our system would make.
For instance, we analyze a shopping basket that is made up of citrus fruit, frozen meat, newspaper, other vegetables, and whole milk.
When we apply our R model to that customer, it recommends the following products: instant food products, yogurt, buttermilk, frozen fish, red/blush wine, pip fruit, and butter.
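One plausible way to turn model outputs into a list like this is to drop the products already in the basket and keep the highest-scoring remainder. The sketch below shows that idea with invented scores. The function `recommend` and the selection rule are our assumptions, since the article does not show how the exported script picks its recommendations.

```r
# Hypothetical model outputs: one score per product
scores <- c(
  "whole milk"   = 0.90,
  "yogurt"       = 0.62,
  "butter"       = 0.35,
  "citrus fruit" = 0.80,
  "newspaper"    = 0.10
)

# Products the customer has already put in the basket
basket <- c("whole milk", "citrus fruit")

# Return the n highest-scoring products that are not already in the basket
recommend <- function(scores, basket, n = 2) {
  candidates <- scores[!(names(scores) %in% basket)]
  ranked <- names(sort(candidates, decreasing = TRUE))
  ranked[seq_len(min(n, length(ranked)))]
}

print(recommend(scores, basket))
```

Excluding the basket's own items matters here because a network with the same products as inputs and outputs will tend to score purchased items highly.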
It is vital to have solutions for each step of the data analysis. R is well suited to pre-processing the data and deploying the resulting model. In turn, Neural Designer allows us to develop complex predictive models with just a few clicks.
In this article, we have demonstrated the potential of combining R and Neural Designer to generate applications such as recommendation systems.
Applying a recommendation system can help us increase cross-selling and, in turn, our profits.