By Roberto Lopez, Artelnics.
The goal of this study is to forecast the weather by making use of neural networks.
An Objective Forecasting System is a set of rules, diagrams, equations, etc. that, starting from a data set, yields a unique value as the prediction of a meteorological variable. In short, it is a system that produces a single forecast from a given data set.
One such system is numerical weather prediction, in which the meteorological variables are obtained by solving the governing physical equations with numerical methods. However, numerical models have trouble resolving small-scale features and surface variables.
In this work we attempt to forecast surface variables at a weather station one day ahead by means of an artificial neural network. The variables studied are rainfall, temperature, pressure and dew point. For that purpose we use data measured by the weather station itself.
Once the meteorological variables to predict and the forecasting horizon have been chosen, the next step is to examine the available data.
The available data were obtained by the Spanish Meteorological Office at the La Coruña Weather Station. They consist of a daily record of meteorological variables over a nine-year period. The samples come from surface measurements (rainfall, pressure, temperature, dew point temperature) and from soundings at different atmospheric pressure levels (height, temperature, dew point temperature). The following table summarizes the variable information.
In forecasting problems, the data from the data file must be converted into a proper format according to the number of lags and the number of steps ahead. As stated above, we want to predict one day ahead, so the number of steps ahead is one. To make that prediction we use the available information from the three previous days, so the number of lags is 3.
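The conversion described above can be sketched with NumPy. This is only an illustration, not the tool's actual implementation: the function name `make_lagged_instances` is hypothetical, the data are synthetic, and 25 variables per day is an assumption chosen so that four days give 100 columns per instance.

```python
import numpy as np

def make_lagged_instances(series, lags=3, steps_ahead=1):
    """Turn a (days, variables) array into rows of [lag days..., ahead day(s)]."""
    series = np.asarray(series, dtype=float)
    n_days, _ = series.shape
    window = lags + steps_ahead
    n_instances = n_days - window + 1
    if n_instances <= 0:
        raise ValueError("not enough days for the requested lags and steps ahead")
    # Each row concatenates `window` consecutive days, oldest day first.
    rows = [series[i:i + window].reshape(-1) for i in range(n_instances)]
    return np.stack(rows)

# Synthetic stand-in for the station record: 10 days, 25 variables per day (assumed).
daily = np.arange(10 * 25, dtype=float).reshape(10, 25)
instances = make_lagged_instances(daily, lags=3, steps_ahead=1)
print(instances.shape)  # (7, 100)
```

Each instance loses no information from the original days; the first `lags` days of the record simply never appear as a forecast day, which is why the number of instances is slightly smaller than the number of days.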
Once the data file has been loaded, each instance has 100 variables and contains the information of four days: three of them supply past information, which is used to forecast the fourth. As stated above, we only want to forecast the rainfall, the surface pressure, the surface temperature and the surface dew point temperature, so these are set as targets and the rest of the variables of that day (ahead 1) are set as unused. Furthermore, the variables date and day of the year do not provide any useful information for the analysis and are therefore also set as unused. The task "Report data set" produces a bar chart showing the use of each variable.
Thus, 69 inputs are used to forecast the 4 target variables (rainfall, pressure, temperature, dew point temperature). The number of unused variables is 27.
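These counts can be cross-checked with a little arithmetic. The sketch below assumes that each day contributes 25 variables (100 columns over four days) and that date and day of the year appear once per day; both are inferences from the figures in the text, not something the data description states explicitly.

```python
TOTAL_COLUMNS = 100
DAYS_PER_INSTANCE = 4                               # 3 lags + 1 step ahead
VARS_PER_DAY = TOTAL_COLUMNS // DAYS_PER_INSTANCE   # 25 (assumed equal per day)
LAGS = 3
TARGETS = 4                                         # rainfall, pressure, temperature, dew point
CALENDAR_VARS = 2                                   # date and day of the year (assumed per day)

# Lag days: everything except the calendar variables is an input.
inputs = LAGS * (VARS_PER_DAY - CALENDAR_VARS)      # 3 * 23

# Unused: calendar variables of the lag days, plus every non-target
# variable of the forecast day (its calendar variables included).
unused = LAGS * CALENDAR_VARS + (VARS_PER_DAY - TARGETS)  # 6 + 21

print(inputs, TARGETS, unused)  # 69 4 27
```

Under these assumptions the three groups add up to the 100 columns of each instance.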
Other tasks can provide useful information about the data set. The "Plot forecasting" task displays a chart of the observations (y-axis) against time (x-axis), which shows how the original variables behave over time. The next charts plot the rainfall, pressure, temperature and dew point temperature variables, respectively.
As the first chart shows, the rainfall variable has a high temporal variability and will therefore be harder to model than the others.
The data set is now ready to be analyzed. We will use a neural network with one hidden layer of two neurons, 69 inputs and 4 outputs. The training algorithm is the quasi-Newton method, the error method is the normalized squared error, and the norm of the neural parameters is used as regularization. The task "Perform training" produces the results shown in the next figure. Furthermore, since we are studying physical variables, which tend to be approximately normally distributed, the scaling method that fits them best is the mean and standard deviation method.
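To make the configuration concrete, the following NumPy sketch builds a network of the same size and evaluates a loss of the same form. It is an illustration under stated assumptions, not the software's implementation: the hyperbolic tangent hidden activation, the linear output layer, the regularization weight of 0.01 and all function names are hypothetical, the data are random, and the quasi-Newton training step itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 69, 2, 4

def scale_mean_std(x):
    """Mean and standard deviation scaling, column by column."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def init_parameters():
    return {
        "W1": rng.normal(scale=0.1, size=(N_INPUTS, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(scale=0.1, size=(N_HIDDEN, N_OUTPUTS)),
        "b2": np.zeros(N_OUTPUTS),
    }

def forward(params, x):
    hidden = np.tanh(x @ params["W1"] + params["b1"])  # assumed activation
    return hidden @ params["W2"] + params["b2"]        # assumed linear outputs

def normalized_squared_error(outputs, targets):
    """Squared error divided by the squared deviation of the targets from their mean."""
    numerator = np.sum((outputs - targets) ** 2)
    denominator = np.sum((targets - targets.mean(axis=0)) ** 2)
    return numerator / denominator

def loss(params, x, targets, weight=0.01):
    """Normalized squared error plus a parameter-norm penalty (weight is hypothetical)."""
    flat = np.concatenate([p.ravel() for p in params.values()])
    error = normalized_squared_error(forward(params, x), targets)
    return error + weight * np.linalg.norm(flat)

params = init_parameters()
n_parameters = sum(p.size for p in params.values())
print(n_parameters)  # 152

# Random stand-ins for the scaled inputs and targets.
x = scale_mean_std(rng.normal(size=(50, N_INPUTS)))
t = scale_mean_std(rng.normal(size=(50, N_OUTPUTS)))
print(loss(params, x, t))
```

With this definition, a model that always predicts the mean of the targets has a normalized squared error of 1, so values well below 1 indicate that the network explains part of the variability of the targets.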
The analysis time for the 3284 instances was only 2 seconds. The initial values of the performance and the selection performance were 12.4172 and 11.4756 respectively. Their final values after 202 iterations are 0.309702 for the performance and 0.342381 for the selection performance. This task also shows the charts of the performance history and the selection performance history as shown below.
The objective of validation is to determine the quality of the forecasts. To validate a forecasting technique, we compare the values it provides with the values actually observed.
The performance of a trained network can be measured to some extent by the error on the training, validation and test sets, but it is often useful to investigate the network response in more detail. One option is to perform a regression analysis between the network response and the corresponding targets.
This analysis yields 3 parameters. The first two, m and b, are the slope and the y-intercept of the best linear regression relating targets to network outputs. For a perfect fit (outputs exactly equal to targets), the slope would be 1 and the y-intercept would be 0. The third parameter is the correlation coefficient (R-value) between the outputs and the targets. If this number equals 1, there is perfect correlation between targets and outputs. The task "Perform linear regression analysis" provides this information. The next charts show the linear regression analysis for the output variables surface pressure, surface temperature and surface dew point temperature.
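The three parameters can be computed for a single output variable with standard NumPy routines. This is a minimal sketch on made-up numbers; the function name `linear_regression_analysis` is hypothetical and the values are not taken from the study.

```python
import numpy as np

def linear_regression_analysis(targets, outputs):
    """Fit outputs = m * targets + b and return (m, b, R)."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    slope, intercept = np.polyfit(targets, outputs, deg=1)
    r_value = np.corrcoef(targets, outputs)[0, 1]
    return slope, intercept, r_value

# Made-up example: outputs that systematically compress the targets.
targets = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
outputs = 0.8 * targets + 0.5
m, b, r = linear_regression_analysis(targets, outputs)
print(round(m, 3), round(b, 3), round(r, 3))
```

In this toy case the correlation is perfect even though the slope is below 1, which illustrates why the slope, the intercept and the R-value are read together rather than in isolation.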
In these cases, the outputs track the targets reasonably well: the slope is 0.615 for the pressure, 0.782 for the temperature and 0.806 for the dew point temperature. However, as noted above, the rainfall is not well modelled because of its high temporal variability.
For the rainfall, the slope is 0.0399, which shows that the model is only slightly better than chance for this variable and far worse than for the others.
Furthermore, some other testing techniques are especially useful in forecasting problems. The first is the error autocorrelation chart, which describes how the prediction errors are correlated in time. For a perfect prediction, the correlation function takes only one nonzero value, at lag zero. The task "Calculate error autocorrelation" shows this chart for every output variable. The next images show the error autocorrelation charts of the rainfall and the temperature variables.
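An error autocorrelation of this kind can be sketched as follows. The normalization used here (so that the value at lag zero is exactly 1) and the function name are assumptions; the software may scale the chart differently. The errors are synthetic white noise, which is the behaviour one hopes for.

```python
import numpy as np

def error_autocorrelation(targets, outputs, max_lag=10):
    """Autocorrelation of the prediction errors for lags 0..max_lag."""
    errors = np.asarray(targets, dtype=float) - np.asarray(outputs, dtype=float)
    errors = errors - errors.mean()
    denominator = np.sum(errors ** 2)
    if denominator == 0.0:
        raise ValueError("errors are constant; autocorrelation is undefined")
    n = len(errors)
    return np.array([
        np.sum(errors[lag:] * errors[:n - lag]) / denominator
        for lag in range(max_lag + 1)
    ])

# Synthetic check: uncorrelated errors should give values near zero beyond lag 0.
rng = np.random.default_rng(1)
targets = rng.normal(size=5000)
outputs = targets + rng.normal(scale=0.3, size=5000)
acf = error_autocorrelation(targets, outputs, max_lag=10)
print(acf[0])       # 1.0 by construction
print(acf[1:])      # small values scattered around zero
```

Values that stay clearly away from zero at lags greater than zero suggest that the errors still contain temporal structure the model has not captured.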
As stated above, the value at lag zero is nonzero. For the remaining lags, the values are zero on average for the temperature variable, whereas for the rainfall variable they are generally greater than zero.
The other task widely used in forecasting problems is "Calculate cross-correlation". This task calculates the correlation between the inputs and the error, where the error is the difference between the targets and the outputs of the neural network. For a perfect prediction, the input-error cross-correlation should not differ significantly from zero at any lag. The next figures show the cross-correlation charts for the pressure and for the dew point temperature variables.
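A cross-correlation between one input series and the error series can be sketched in the same way. The function name, the normalization and the lag convention (input at time t against error at time t + lag) are assumptions, and the band of plus or minus 1.96 divided by the square root of n is the usual large-sample approximation for two uncorrelated series, not a figure from the study.

```python
import numpy as np

def input_error_cross_correlation(input_series, errors, max_lag=10):
    """Correlation between input[t] and error[t + lag] for lags 0..max_lag."""
    x = np.asarray(input_series, dtype=float)
    e = np.asarray(errors, dtype=float)
    x = x - x.mean()
    e = e - e.mean()
    norm = np.sqrt(np.sum(x ** 2) * np.sum(e ** 2))
    n = len(x)
    ccf = np.array([
        np.sum(x[:n - lag] * e[lag:]) / norm
        for lag in range(max_lag + 1)
    ])
    bound = 1.96 / np.sqrt(n)  # approximate 95% band for uncorrelated series
    return ccf, bound

rng = np.random.default_rng(2)
x = rng.normal(size=2000)

# Case 1: errors unrelated to the input.
ccf_noise, bound = input_error_cross_correlation(x, rng.normal(size=2000))

# Case 2: errors that repeat the input with a two-step delay.
ccf_delayed, _ = input_error_cross_correlation(x, np.roll(x, 2))

print(np.abs(ccf_noise).max(), bound)
print(int(np.argmax(ccf_delayed)))  # 2
```

A pronounced peak such as the one in the second case would mean that the input still carries information about the error, so the model could in principle be improved.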
In this work we have attempted to forecast several surface meteorological variables one day ahead by means of neural networks. For that purpose, we have used data collected by a weather station. All the variables except rainfall have been modelled reasonably well. Better results could nevertheless be obtained by studying not only the data collected at this weather station but also data from nearby areas. For the rainfall, it could be useful to try other architectures to achieve better predictions, although it is possible that rainfall cannot be forecast accurately from the data available.