Market Basket Analysis using R and Neural Designer

By Pablo Martin, Artelnics.

In this article we are going to see how to perform a market basket analysis using R and Neural Designer. R is a free programming language for statistical computing and graphics widely used among the data science comunity for performing data analysis. Neural Designer is a software tool for data analytics based on neural networks, a main area of artificial intelligence research. The next picture shows the structure of our example.

Our objective is to analyze a dataset from a grocery store in order to create a recommendation system. This system will be capable of generating accurate recommendations about products that the user may have an interest in.

The type of learning that we are going to use for this example is called "unsupervised learning". It is a type of machine learning technique used to find patterns in data when there are no target values.

In our example the output values will be the percentages that the products have to be in the same shopping basket.

The dataset selected for our example consists of 9835 transactions of a grocery store. Each transaction can be a single product or several products. The next image illustrates this data set.

From a first look at the database, we cannot know the number of items in the store. Also, the format of the database is difficult to analyze, that is why we need to make a preliminary treatment of the information we have collected before analyzing it.

As we have said before, it is necessary to pre-process the information that we have in our database and for that purpose we use R.

Using R we convert the database into a binary matrix, which is the format we need to be able to perform the data advanced analysis with Neural Designer. The following script is the one we used to make this change.

# Package necessary for transaction analysis install.packages("arules") require(arules) # Load data Shopping_Cart <- read.transactions("groceries.csv", sep=",") # Database checking summary(Shopping_Cart) itemFrequencyPlot(Shopping_Cart, topN=20) # Convert data to a numeric matrix as(Shopping_Cart, "matrix") as(Shopping_Cart, "matrix")*1 # Save results to file write.csv((as(Shopping_Cart, "matrix")*1), file = "Shopping_Cart.csv", row.names=FALSE)

The first step of our script is to load the package needed for R to read transactions, the "arules" package. Once we have loaded "arules", we execute it and perform the first operation: loading the data.

The next step is to check the data to see if they are correctly loaded, for which we use the "summary" command. We also paint a bar graph with the 20 products that have been bought the most. The following image shows the graph with these top 20 products.

As we can see in the chart above, the product that is bought the most is whole milk.

Once the model has been tested, we have to export it to CSV so that we can analyze it with Neural Designer.

For that purpose we are going to convert the data into a binary matrix and use the command "write" we export it to CSV. The next image shows this CSV.

The result is a data set that contains a variable for each product and prints 1 if the product have been purchased or 0 if not. This data set is ready to start its analysis with Neural Designer.

When data has already been pre-processed, it is time to add it to Neural Designer in order to make our recommendation system. Neural Designer provides an easy way for analyzing and deploying advanced analytics models. The next picture shows the data set tab in Neural Designer.

Now the data is loaded into the software, we can check basic statistics of each of the variables. The table below shows the minimums, maximums, means and standard deviations of the top 20 variables in this data set.

In order to develop our recommendation system, we use neural networks, the machine learning technique that Neural Designer implements. The neural network defines the predictive model as a multidimensional function containing adjustable parameters. The first step to create our recommendation system is to choose a neural network architecture that represents the classification function.

Because the neuronal network of this problem is very complex, 169 inputs 25 neurons in the hidden layer and 169 outputs, the following image represents what would be a neural network if the study was with the 20 most influence variables.

The next step to carry out is to train the neural network mentioned above. For this purpose we apply Quasi-Newton method to obtain a good model that will recommend us the shopping basket more suitable for customer. To know the types of algorithms that can be used to train a neural network you can read 5 algorithms to train a neural network.

Once our model is trained and ready to use, we have to export it from Neural Designer to R. We used the task "Export to R" of Neural Designer to obtain our model as a formula of R.

The following image shows the "Export to R" task in the Neural Designer task manager.

Now that we have obtained our recommendation system as an R script, we run R studio to check it. The next picture shows our script into R studio. If you want to download the R script to try it yourself click here.

Finally we analyze an example of a shopping cart to check what recommendations our system would make.

For instance, we analyze the shopping basket of a customer that is made up of citrus fruit, frozen meat, newspaper, other vegetables and whole milk.

Applying our R model for that customer, it recommends the following products: Instant food products, yogurt, butter milk, frozen fish, red/blush wine, pip fruit, and butter.

It is important to have solutions for each step of the data analysis. R is a perfect solution for the treatment and deployment of data. In turn Neural Designer allows us to develop complex predictive models with just a few clicks.

In this article we have demonstrated the potential of combining R and Neural Designer to generate applications such as recommendation systems.

Applying a recommendation system will allow us to increase cross-selling and therefore increase our profits.

- Principal components analysis.
- Electricity demand forecasting.
- Customer segmentation using advanced analytics