Sales forecasting is crucial for managing a store. Machine learning can help identify the factors that influence retail sales and estimate future sales.

In this post, we use historical sales data of a drugstore chain to predict its sales up to one week in advance.

You can follow the step-by-step construction of this predictive model by downloading the free trial of the data science and machine learning platform Neural Designer.

Contents

  1. Data analysis.
  2. Model training.
  3. Testing analysis.
  4. Model deployment.
  5. Conclusions.

Data analysis

The first step in the analysis is to examine the dataset, which contains sales information from the stores of the drugstore chain.

Monthly sales

The following time series chart illustrates the monthly sales volume.

Sales per month in a retail store

As we can see, sales peak in July and, more strongly, in December.

The months with the fewest sales are January, May, and October.

Daily sales

The following chart shows the distribution of sales throughout the month.

Sales by day of month in a retail store

In this case, the days at the beginning and end of the month have higher activity.

Around the middle of the month, a smaller peak occurs.

Weekly sales

It is also essential to examine the sales numbers by weekday.

The following time series chart shows the sales in these shops from Monday (1) to Sunday (7).

Sales by weekday in a retail store

Monday is the preferred day for customers to make purchases in these retail shops.

Sales then decrease from Monday through Thursday and rise again on Friday.

On Sunday, there is a sharp decline in sales.

This is because most of the shops in this drugstore chain are closed on this day.
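
The three calendar views in this section (by month, by day of the month, and by weekday) are all the same operation: group the daily records by a calendar key and average the sales in each group. The following is a minimal sketch of that idea using only the Python standard library. The function name `average_sales_by` and the sample records are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date


def average_sales_by(records, key):
    """Average daily sales grouped by a calendar key.

    records: iterable of (date, sales) pairs.
    key: function that maps a date to a group label.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, sales in records:
        label = key(day)
        totals[label] += sales
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in sorted(totals)}


# Made-up records, only to show the call pattern.
records = [
    (date(2015, 6, 1), 5200.0),  # Monday
    (date(2015, 6, 2), 4800.0),  # Tuesday
    (date(2015, 6, 7), 300.0),   # Sunday
    (date(2015, 6, 8), 5000.0),  # Monday
    (date(2015, 7, 1), 5500.0),  # Wednesday
]

by_month = average_sales_by(records, lambda d: d.month)
by_day_of_month = average_sales_by(records, lambda d: d.day)
# isoweekday() uses the same convention as the chart: Monday=1 ... Sunday=7.
by_weekday = average_sales_by(records, lambda d: d.isoweekday())
```

Passing the grouping key as a function keeps one aggregation routine for all three charts.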

Variables

The next step is to select and prepare the variables we will use.

The following list shows the input variables, also known as predictors, and the target variables, also referred to as predictands.

Calendar variables

  • Day of the month.
  • Month.
  • Day of the week.

Event variables

  • Promotion.
  • State holidays.
  • School holidays.

Historical data variables

  • Sales of the 7 previous days.
  • Promotions of the 7 previous days.

Target variables

  • Sales of the next 7 days.

In total, there are 20 inputs and 7 targets.
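
To make the count concrete, the variable list above can be assembled into one input vector and one target vector per sample with a sliding window over the daily records. This is only a sketch under assumptions: the dictionary keys (`date`, `sales`, `promo`, `state_holiday`, `school_holiday`) are hypothetical names, and we assume the calendar and event variables describe the first day of the forecast window, which the post does not specify.

```python
from datetime import date, timedelta

LAGS = 7     # previous days used as inputs
HORIZON = 7  # following days used as targets


def build_samples(days):
    """Turn chronologically ordered daily records into (inputs, targets) pairs.

    days: list of dicts with the hypothetical keys
    'date', 'sales', 'promo', 'state_holiday', 'school_holiday'.
    """
    samples = []
    for t in range(LAGS, len(days) - HORIZON + 1):
        first = days[t]               # first day of the forecast window (assumption)
        history = days[t - LAGS:t]    # the 7 previous days
        future = days[t:t + HORIZON]  # the next 7 days
        inputs = [
            first["date"].day, first["date"].month, first["date"].isoweekday(),
            first["promo"], first["state_holiday"], first["school_holiday"],
        ]
        inputs += [d["sales"] for d in history]
        inputs += [d["promo"] for d in history]
        targets = [d["sales"] for d in future]
        samples.append((inputs, targets))
    return samples


# Synthetic 30-day series, only to exercise the function.
start = date(2015, 6, 1)
days = [
    {"date": start + timedelta(days=i), "sales": 1000.0 + 10 * i,
     "promo": i % 2, "state_holiday": 0, "school_holiday": 0}
    for i in range(30)
]
samples = build_samples(days)
inputs, targets = samples[0]
```

Each sample has 3 + 3 + 7 + 7 = 20 inputs and 7 targets, which matches the totals above.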

Model training

After defining the variables, we use Neural Designer to build the predictive model for the sales of the stores.

The following image shows a representation of the neural network used for predictive analysis.

Neural network to forecast sales in retail stores

The calendar, promotion, and holiday information, together with the previous week's sales, enters the neural network through the input layer on the left.

Then, it is analyzed by perceptrons in the middle layer to find the patterns that determine the number of sales given by the last layer.
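
The flow through the layers can be sketched in a few lines of plain Python. This is an illustration of a generic network with one hidden perceptron layer, not the model exported by Neural Designer: the hidden layer size (`N_HIDDEN = 10`), the tanh activation, and the random initial weights are all assumptions.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases, activation):
    """Each neuron computes activation(w . x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, hidden, output):
    """Input layer -> hidden perceptron layer -> linear output layer."""
    h = dense(x, *hidden, activation=math.tanh)
    return dense(h, *output, activation=lambda a: a)


rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 20, 10, 7  # N_HIDDEN is an assumed value
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)

x = [0.0] * N_INPUTS            # one scaled input vector
y = forward(x, hidden, output)  # seven outputs, one per forecast day
```

Before training, the outputs carry no information; training adjusts the weights and biases so that the seven outputs approximate the sales of the next seven days.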

Now, the neural network is ready to be trained using the quasi-Newton method.

For more information on this and other optimization algorithms, refer to “5 Algorithms to Train a Neural Network.”
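
To give an idea of how a quasi-Newton method works, the following sketch implements BFGS, one common member of that family, on a tiny least-squares problem. It is an illustration only: the post does not state which quasi-Newton update formula is used, and the straight-line model, the data, and the backtracking line search are assumptions chosen to keep the example short.

```python
def loss(w, data):
    """Mean squared error of the line y = w[0] * x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)


def gradient(w, data):
    """Gradient of the mean squared error with respect to w."""
    g0 = sum(2 * (w[0] * x + w[1] - y) * x for x, y in data) / len(data)
    g1 = sum(2 * (w[0] * x + w[1] - y) for x, y in data) / len(data)
    return [g0, g1]


def bfgs(w, data, iterations=100, tolerance=1e-10):
    n = len(w)
    # H approximates the inverse Hessian; start from the identity matrix.
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    g = gradient(w, data)
    for _ in range(iterations):
        if sum(gi * gi for gi in g) < tolerance:
            break
        # Search direction d = -H g.
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo condition.
        step, f0 = 1.0, loss(w, data)
        slope = sum(gi * di for gi, di in zip(g, d))
        while loss([wi + step * di for wi, di in zip(w, d)], data) > f0 + 1e-4 * step * slope:
            step *= 0.5
        w_new = [wi + step * di for wi, di in zip(w, d)]
        g_new = gradient(w_new, data)
        s = [a - b for a, b in zip(w_new, w)]  # change in parameters
        y = [a - b for a, b in zip(g_new, g)]  # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # curvature condition keeps H positive definite
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((sy + yHy) * s[i] * s[j] / sy ** 2
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        w, g = w_new, g_new
    return w


# Points on the line y = 2x + 1; the fit should recover slope 2 and intercept 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w = bfgs([0.0, 0.0], data)
```

The key idea is that the inverse Hessian is never computed directly. It is built up from successive gradient differences, which is what makes the method practical for training neural networks.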

Testing analysis

Before using the model to forecast sales, the final step is to determine its predictive power on an independent dataset that has not been used for training.

The following chart shows the linear regression analysis between the scaled output of the neural network and the corresponding scaled targets.

Regression to forecast sales in retail stores

The previous linear regression analysis gives us a correlation coefficient of 0.976 and a slope close to 1.

Both values indicate that the predictions closely follow the actual sales on the testing data, so the model is performing well for this dataset.
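
For reference, the two quantities reported by this analysis, the slope and the correlation coefficient, can be computed as follows. This is a minimal sketch with made-up scaled values, and we assume the outputs are regressed on the targets. It does not reproduce the 0.976 obtained above.

```python
import math


def regression_stats(targets, outputs):
    """Fit outputs = slope * targets + intercept; return (slope, intercept, r)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_oo = sum((o - mean_o) ** 2 for o in outputs)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    slope = s_to / s_tt
    intercept = mean_o - slope * mean_t
    r = s_to / math.sqrt(s_tt * s_oo)  # Pearson correlation coefficient
    return slope, intercept, r


# Made-up scaled targets and network outputs for four testing samples.
targets = [0.10, 0.40, 0.50, 0.90]
outputs = [0.15, 0.35, 0.55, 0.85]
slope, intercept, r = regression_stats(targets, outputs)
```

A perfect model would give a slope of 1, an intercept of 0, and a correlation of 1.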

Consequently, the model is ready to be moved to the deployment phase.

Model deployment

Once we have tested the model, we can predict the shop’s sales one week in advance.

We introduce the data from the current week as input and obtain the expected sales for each day of the following week.

The following chart compares the predicted sales with and without a promotion.

The blue line represents the number of sales the model predicts without a promotion, and the orange line represents the number of sales with a promotion.

Applying the promotion significantly increases the predicted sales.

Retail store sales predictions

As we can see, the following Monday is when most sales are expected.

During the rest of the week, the number of sales is expected to remain stable, but it is anticipated to decrease on Sunday, as most stores will be closed.
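
The promotion comparison in this section amounts to evaluating the model twice on the same inputs, once with the promotion flag off and once with it on. The sketch below shows that pattern. `toy_predict` is a hypothetical stand-in for the trained network, with an arbitrary 20% uplift, and the position of the promotion flag in the input vector (`PROMO_INDEX`) is also an assumption.

```python
PROMO_INDEX = 3  # assumed position of the promotion flag in the 20-input vector


def compare_promotion(predict, inputs, promo_index=PROMO_INDEX):
    """Return (forecast without promotion, forecast with promotion)."""
    without = list(inputs)
    without[promo_index] = 0
    with_promo = list(inputs)
    with_promo[promo_index] = 1
    return predict(without), predict(with_promo)


def toy_predict(inputs):
    """Stand-in for the trained network. NOT the model from this post."""
    base = sum(inputs[6:13]) / 7                    # mean sales of the 7 previous days
    uplift = 1.2 if inputs[PROMO_INDEX] else 1.0    # arbitrary promotion effect
    return [base * uplift] * 7                      # one value per forecast day


# Calendar (3) + events (3) + previous sales (7) + previous promotions (7).
current_week = [8, 6, 1, 0, 0, 0] + [1000.0] * 7 + [0] * 7
no_promo, promo = compare_promotion(toy_predict, current_week)
```

With the real model in place of `toy_predict`, the two returned lists correspond to the blue and orange lines of the chart.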

Conclusions

In this post, we have developed a predictive model that can help retailers forecast their future sales.

Using this model, retailers can plan the number of products they need.

As a consequence, better planning can help them increase their profits.

References

  • The data used for this example can be downloaded from Kaggle. It has been processed with Python to obtain the desired results. You can download the processed dataset here.
