Retail store sales forecasting

By Pablo Martin, Artelnics.

Sales forecasting is an essential task for the management of a store. Being able to estimate the quantity of products that a retail store is going to sell in the future will allow the owners of these shops to prepare the inventory that they will need.

Predictive analytics can help us to study and discover the factors that determine the number of sales that a retail store will have in the future.

In this article, we use two years of sales data from a drug store to predict the sales that it is going to have one week in advance.

Data analysis

The first step of the analysis is to study the available information from the drug store. The next chart shows the number of sales by month.

Sales per month in a retail store
Sales by month.

As we can see, most of the sales are made between March and July. Then, the number of sales decreases until December, when it grows again.

Secondly, we are going to study how the sales are distributed across the month.

Sales by day of month in a retail store
Sales by day of month.

In this shop, the first days of the month are the busiest. After the middle of the month, the sales remain stable.

Lastly, it is also important to take a look at the number of sales by weekday. The next chart shows the sales in this shop from Sunday (1) to Saturday (7).

Sales by weekday in a retail store
Sales by weekday.

Sunday is the day on which customers buy the most in this retail shop. During the rest of the week, the sales decrease from Monday to Wednesday and increase from Wednesday to Friday. Saturday is the day with the fewest sales.

The next step in preparing the data is to select the variables that we are going to use. The original data set classifies each store into one of four groups and contains the distance to the nearest competitor. We are not going to use these variables, since we are analyzing just one retail store.

The data set also contains the number of customers that bought something in the store each day. However, this variable cannot be used for the analysis, since we cannot know its value in advance. The next table shows the variables that we are going to use for the analysis.

Variables used for forecasting sales in a retail store

There are 14 inputs and a single target: the number of sales on a given day. The inputs include date information such as the weekday, the month and the day of the month. In addition, for every day we record whether the shop ran a promo and whether it ran one the day before, the state holidays (Christmas, public holidays, etc.), the school holidays and the number of sales during the previous week.

Once the variables have been defined, we can calculate the dependencies between all the inputs and the target. The next chart shows the linear correlation between each input and the target variable "Sales".
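To make the input layout concrete, here is a minimal Python sketch of how one day could be turned into a vector of 14 inputs. It assumes one plausible reading of the variables: seven date, promo and holiday inputs plus seven lagged daily sales figures. The function name, the ordering, the 0/1 encoding of promos and holidays, and the sample values are all assumptions rather than the layout of the original data set.

```python
from datetime import date


def build_inputs(day, promo, promo_day_before, state_holiday,
                 school_holiday, lagged_sales):
    """Return a 14-element input vector for one day (assumed layout)."""
    if len(lagged_sales) != 7:
        raise ValueError("expected seven lagged daily sales values")
    weekday = day.isoweekday() % 7 + 1  # Sunday = 1 .. Saturday = 7
    return [
        weekday,
        day.month,
        day.day,
        int(promo),             # assumed 0/1 flag
        int(promo_day_before),  # assumed 0/1 flag
        int(state_holiday),     # assumed 0/1 flag; the source may use categories
        int(school_holiday),    # assumed 0/1 flag
        *lagged_sales,          # seven lagged daily sales, Sunday to Saturday
    ]


# Hypothetical values, only for illustration.
inputs = build_inputs(
    day=date(2015, 7, 6),
    promo=True,
    promo_day_before=False,
    state_holiday=False,
    school_holiday=False,
    lagged_sales=[300, 200, 180, 170, 175, 190, 160],
)
print(len(inputs))  # 14
```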

Correlations between variables and sales in a retail store.
Sales correlations.

The number of sales on the same weekday of the previous week, the weekday and the state holidays are highly correlated with the number of sales.
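A linear correlation of this kind is usually measured with the Pearson coefficient, which ranges from -1 to 1. The following sketch computes it for one input column against the target. Whether the chart above uses exactly this coefficient is an assumption, and the sample values are hypothetical.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson linear correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical columns: sales one week earlier versus sales on the day.
sales_week_before = [310, 190, 175, 180, 170, 185, 160]
sales = [300, 200, 180, 170, 175, 190, 165]
print(round(pearson_correlation(sales_week_before, sales), 3))
```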

Training analysis

After defining the variables, it is time to use Neural Designer to build the predictive model that will allow us to predict the sales of the store. The next image shows a representation of the neural network that we are going to use for the analysis.

Neural network to forecast sales in retail stores

The information about the date, promos, holidays and sales of the previous week enters the neural network through the left layer. Then, it is analyzed by the perceptrons in the middle layer in order to find the patterns that determine the number of sales, which is given by the last layer.
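The flow described above, from inputs through a layer of perceptrons to a single output, can be sketched as a forward pass. This is an illustration only: the hyperbolic tangent activation, the linear output, the layer sizes and the weights are assumptions, not the configuration that Neural Designer produces.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a network with one hidden perceptron layer."""
    # Each hidden perceptron applies tanh to a weighted sum of the inputs.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output layer is a linear combination of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with 3 scaled inputs and 2 hidden perceptrons.
scaled_inputs = [0.5, -1.0, 0.25]
hidden_weights = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
hidden_biases = [0.0, 0.1]
output_weights = [0.7, -0.2]
output_bias = 0.05

scaled_sales = forward(scaled_inputs, hidden_weights, hidden_biases,
                       output_weights, output_bias)
print(scaled_sales)
```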

Now, the neural network is ready to be trained using the Quasi-Newton algorithm. To find more information about this and other training algorithms, you can read "5 algorithms to train a neural network".
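To give an idea of what a quasi-Newton method does, the sketch below implements a basic BFGS iteration, one common member of that family, and uses it to fit a straight line by least squares. It is an assumption that this resembles the variant used by Neural Designer; the function names, the backtracking line search and the toy data are illustrative only.

```python
def bfgs_minimize(f, grad, x0, max_iter=200, tol=1e-8):
    """Minimize f with a basic BFGS quasi-Newton iteration."""
    n = len(x0)
    x = list(x0)
    g = grad(x)
    # H approximates the inverse Hessian; start from the identity matrix.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        # Quasi-Newton search direction: d = -H g.
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))
        # Backtracking line search until the Armijo condition holds.
        step, fx = 1.0, f(x)
        while step > 1e-12:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # update only when the curvature condition holds
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))
            # Expanded form of the BFGS update of the inverse Hessian.
            for i in range(n):
                for j in range(n):
                    H[i][j] += (rho * (1.0 + rho * yHy) * s[i] * s[j]
                                - rho * (Hy[i] * s[j] + s[i] * Hy[j]))
        x, g = x_new, g_new
    return x


# Toy problem: fit y = a + b * x by least squares (true a = 1, b = 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def loss(p):
    return sum((p[0] + p[1] * x - y) ** 2 for x, y in zip(xs, ys))


def loss_gradient(p):
    residuals = [p[0] + p[1] * x - y for x, y in zip(xs, ys)]
    return [2 * sum(residuals),
            2 * sum(r * x for r, x in zip(residuals, xs))]


params = bfgs_minimize(loss, loss_gradient, [0.0, 0.0])
print(params)
```

When training a neural network, the parameter vector would hold the weights and biases and the loss would be the error on the training data; the iteration itself stays the same.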

Testing analysis

The last step before using the model to forecast the sales is to determine its predictive power on an independent set of data that has not been used for training. The next chart shows the linear regression analysis between the scaled outputs of the neural network and the corresponding scaled targets.

Regression to forecast sales in retail stores
Linear regression.

The next table shows the parameters of the previous linear regression analysis.

Regression parameters to forecast sales in retail stores

The intercept and the slope are close to 0 and 1, respectively, and there is a correlation between the outputs and the targets of almost 91%. This means that the model predicts this set of data well. As a consequence, the model is ready to be moved to the deployment phase.
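The three parameters discussed above can be obtained with an ordinary least squares fit of the outputs against the targets. The sketch below shows the calculation; the choice of regressing outputs on targets (rather than the reverse), the function name and the sample values are assumptions. A perfect model would give an intercept of 0, a slope of 1 and a correlation of 1.

```python
from math import sqrt


def regression_parameters(targets, outputs):
    """Intercept, slope and correlation of outputs regressed on targets."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_oo = sum((o - mean_o) ** 2 for o in outputs)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    slope = s_to / s_tt
    intercept = mean_o - slope * mean_t
    correlation = s_to / sqrt(s_tt * s_oo)
    return intercept, slope, correlation


# Hypothetical scaled targets and network outputs on testing data.
scaled_targets = [0.10, 0.40, 0.50, 0.90, 0.30]
scaled_outputs = [0.15, 0.35, 0.55, 0.80, 0.30]

intercept, slope, correlation = regression_parameters(scaled_targets,
                                                      scaled_outputs)
print(intercept, slope, correlation)
```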

Model deployment

Once the model has been tested, it can be used to predict the sales of the shop one week in advance. As an example, we are going to predict the number of sales of this retail shop for the last week of July, with no state or school holidays and knowing that the sales of the previous week from Sunday to Saturday have been: 31665, 19169, 17836, 17663, 17513, 18985 and 19042, respectively.

Retail store sales predictions
Sales prediction.

As we can see, the Sunday of the next week is the day when most of the sales are going to be made. During the rest of the week, the number of sales will remain stable and will decrease slightly with respect to the previous week.

Conclusions

In this article, we have developed a predictive model that can help retailers to determine the number of sales that they are going to make in the future.

By using this model, retailers will be able to plan the amount of products that they are going to need and, as a consequence, increase their profits.

Bibliography

  • The data used for this example can be downloaded from Kaggle.