Forecasting of power demand plays an essential role in the electric industry, as it provides the basis for making decisions in power system planning and operation.
A great variety of methods for predicting electricity demand are being used by electrical companies, which are applicable to short-term, medium-term or long-term forecasting.
But changing the electrical demand arises from complex interactions between personal, corporative and socio-economic factors. In such a dynamic environment ordinary forecasting techniques are not sufficient, and more sophisticated methods are needed.
The objective is to effectively untangle all the factors that lead to demand change and to determine the underlying causes. But analyzing multiple personal and social factors is complicated, to say the least. We need both rich data, along with complex predictive models to analyze it.
Advanced data mining technology for modeling and understanding very complex systems will allow companies to represent the complexity of electricity demand in a way previously unachievable. More specifically, the objective is to achieve more accurate forecasts for the electricity demand.
A perfect electricity demand forecasting requires analyzing the following types of variables:
These data are brought together in a single predictive model, so as to discover associations between all the above variables. This will give us deeper understanding than ever before on the causes of demand, and will allow better decision making.
Data mining could anticipate customer response and changing attitudes about energy consumption by looking at all types of factors. The results are improved forecast accuracy, which means better information to decide what the best course of action is.
To carry out this article, we are going to perform an example about the electric demand forecasting.
The dataset used here is taken from Kaggle, one of the most important repositories of data science. It contains information about 2.000 days records with 48 attributes (hourly temperatures and demands). This adds up to more than 100.000 data. The next image illustrates this dataset.
In the previous table, the first column represents the date when the measure of each instance was done. The next 24 columns represent the ambient temperatures for each hour of the day. Finally, the last 24 columns represent the electric power demand for each hour of the day.
In order to apply neural networks, it is necessary to pre-process the data. The data transform into a group of instances whose inputs represent the data from the past and whose targets represent the information from the future. The temperature associated with future demand will also be used as input variable because we can know it in advance.
In this case, we have set the project configuration in order to predict the future day from the data of the previous two days.
For that purpose, the number of input variables that train the neural network will be 120 and the number of target variables will be 24, as is shown in the next table.
The previous table shows that we want to predict the hourly demand for tomorrow, so these 24 variables will be the target variables in our study.
It might be interesting to study how the electric power demand depended with the different features. For that purpose, we calculate the scatter charts.
The first scatter chart represents the dependence of the demand with the temperature. We are not going to represent all the demand variables against all the temperature variables because they have the same appearance. In this case, it is represented the demand for tomorrow against the temperature of today at the same hour.
As we can see in the previous picture, there is a minimum demand for tomorrow when the temperature is the most pleasant (55-65°F). Then, the demand grows up as the temperature moves away from that value.
Then, it is represented the scatter chart of the demand and the past demands. In this case, we are going to plot the demand of tomorrow at 12:00 against the demand of today at the same hour.
There is also a minimum value for the demand of tomorrow when the demand of today gets its minimum value.
Finally, the time series plots are represented here. The next one corresponds with the temperature at 1:00 time series in the data set.
This figure shows that the data set starts (t=0) in winter when the temperature gets its lowest value. Then in t≈200 it gets the summer time and maximum temperatures.
Then, the time plot of the demand at 7:00, 11:00, 17:00 and 24:00 are represented.
In this picture, we can see that at 7:00 the demand is much bigger in winter. On the other hand, at 11:00 the demand in summer is growing up until five in the afternoon, where demand in summer exceeds demand in winter. Subsequently, at 24:00, the demand is again higher in winter than in summer.
This is because at 7:00 in winter it is very cold so people usually spend more electricity in the heating system. However, at 7:00 in summer, the temperature is pleasant and there is no electricity expenditure. As it gets closer to midday, summer temperatures are higher and more electricity is needed for air conditioner. In summer, higher temperatures are around 17:00, so it is the point of the day in which the summer demand gets its maximum value. Finally, at 24:00 temperatures in summer are more pleasant and winter demand grows up again.
The way to obtain an optimal topology of the neural network is the order selection method. Order selection algorithms are used to get the optimal number of neurons in the neural network. In this case, the order selection algorithm that has been selected is called incremental order.
The next chart shows the loss history for the different subsets during the incremental order selection process. The blue line represents the training loss and the red line symbolizes the selection loss.
As we can see in the previous picture, the optimal order is 7. The next table shows the order selection results by the incremental order algorithm.
By looking at the loss values, we can notice that our predictive model has a good accuracy because of the low loss values. Finally, the architecture of this neural network can be written as 120:7:24.
Once the predictive model has been trained, it is needed to evaluate its predictive power on new data that have not been seen before, the testing instances subset. This process will determine if the predictive model is good enough to be moved into the deployment phase.
First, we perform the linear regression analysis between the scaled neural network outputs and the corresponding targets for an independent testing subset. The next figure represents the linear regression of the predicted demand at 1:00.
To determine if it was a good analysis we have to calculate the different parameters from the linear regression.
The next table lists the linear regression parameters average for the scaled outputs (demand 1-24h).
The intercept, slope and correlation are very similar to 0, 1 and 1, respectively, so the neural network is predicting well the testing data.
Finally, we calculate the error data statistics. As we have 24 target variables, we are going to calculate the average of all these error statistics. All in all, the mean percentage error is 5,113%. This is a low value when we are studying demand forecasting so we already can deploy the model.
Once we have our predictive model tested, we are able to apply it in order to predict the electric power demand for tomorrow.
The next figure shows the hourly electric power demand prediction for tomorrow in kW, considering the following inputs: yesterday, today and tomorrow's temperatures and yesterday and today power demands.
As we can see, the graph looks the expected. Between 5:00 and 8:00, the electricity demand will increase its value in 4.6 MW. This phenomenon match with the beginning of the residential, industry and services activities.
Then, between 16:00 and 20:00 its value will increase 3.2 MW. The reason is that most people come back to home at these hours and this implies that people turn on the heating system, etc.
Due to predictive analytics, we have been able to predict tomorrow's electric demand with highly successful results. For that reason, this will be a great advance for companies in the energy sector.
In conclusion, the use of our predictive model means better information to decide what the best course of action is. Therefore, companies which use this technology will have a competitive advantage over their main competitors.