Forecast the power generated by solar plants

Solar power is a free and clean alternative to traditional fossil fuels. However, the efficiency of solar cells is still not as high as we would like, so selecting the ideal conditions for their installation is key to obtaining the maximum amount of energy from them.

We want to predict the power output of a particular array of solar power generators from a set of known environmental conditions.


  1. Application type
  2. Data set
  3. Neural network
  4. Training strategy
  5. Model selection
  6. Testing analysis
  7. Model deployment

1. Application type

This is an approximation project, since the variable to be predicted is continuous (energy production).

The basic goal here is to model the energy production as a function of the environmental variables.

2. Data set

We use the following inputs for this analysis:

We have eliminated (set as unused in the variables section of the Data set window) the Day, Month and Year variables of the original data file because they offer the same information as the Day of year variable.
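
To illustrate why the separate date columns are redundant, here is a minimal sketch that derives the day of year from a calendar date. The function name and the sample dates are illustrative assumptions, not values from the data file, and the equivalence only holds within a single year.

```python
from datetime import date


def day_of_year(year: int, month: int, day: int) -> int:
    """Return the 1-based position of a calendar date within its year."""
    return date(year, month, day).timetuple().tm_yday


# Arbitrary example dates (assumptions, not rows from the data set).
print(day_of_year(2009, 3, 1))
print(day_of_year(2008, 12, 31))
```

Given the year, the mapping works in both directions, so Day and Month carry no information beyond Day of year.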

Our prediction target will be the "Power Generated" variable.

The variable "Is Daylight" can be troublesome. Whenever its value is 0, the generated power is also 0 (as expected, since solar generators don't work at night). With the task "Filter data" we can eliminate the instances in which "Is Daylight" is 0 by setting the minimum value for "Power Generated" to 1. The Neural Viewer window shows that 1320 instances have been filtered. We then also set "Is Daylight" as unused.
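
Outside the tool, the same filtering idea can be sketched in a few lines of Python. This is only an illustration of a minimum-value filter: the helper `filter_min` and the sample rows are made up for the example and are not taken from the actual data file or from the software.

```python
def filter_min(rows, column, minimum):
    """Keep only the rows whose value in `column` is at least `minimum`."""
    return [row for row in rows if row[column] >= minimum]


# Made-up rows for illustration only; not taken from the data file.
rows = [
    {"Is Daylight": 0, "Power Generated": 0},
    {"Is Daylight": 1, "Power Generated": 5418},
    {"Is Daylight": 0, "Power Generated": 0},
    {"Is Daylight": 1, "Power Generated": 25477},
]

kept = filter_min(rows, "Power Generated", 1)
print(len(rows) - len(kept), "rows filtered out")
```

After such a filter, "Is Daylight" is 1 in every remaining row of the sample, which is why the variable no longer carries information and can be set as unused.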

Our final working data set is composed of 1600 instances out of the 2920 original observations. Of those, 952 are for training, 327 for selection and 321 for testing (32.6%, 11.2% and 11% of the original 2920 observations, respectively).
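
The instance counts can be checked with a short calculation. The sketch below uses only the figures quoted in the text; the variable names are arbitrary. It also shows each subset as a share of the instances actually used, which differs from its share of the original file.

```python
total = 2920     # observations in the original file
filtered = 1320  # instances removed by the filter
subsets = {"training": 952, "selection": 327, "testing": 321}

used = total - filtered
# The three subsets should account for every instance that was kept.
assert sum(subsets.values()) == used

for name, count in subsets.items():
    share_of_total = 100 * count / total
    share_of_used = 100 * count / used
    print(f"{name}: {share_of_total:.1f}% of all, {share_of_used:.1f}% of used")
```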

The data set is now ready for modeling, with 11 inputs and 1 target.

4. Training strategy

The task "Perform training" trains the neural network to create the model. The training parameters can be configured in the Training strategy page. In this case, all the parameters are set to their default values.

The following plot shows the training and selection errors in each iteration.

The initial value of the training error is 14.6567, and the final value after 148 iterations is 0.116421. The initial value of the selection error is 13.2574, and the final value after 148 iterations is 0.0932195.

5. Model selection

By default, the number of neurons in the neural network is set to 3, but other sizes may work better for this case. The task "Perform Order Selection" searches for the optimal configuration of the neural network. Here it finds an optimal number of 6 neurons, with optimal losses of around 0.08.
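
Conceptually, order selection means evaluating one network per candidate size and keeping the size with the lowest selection error. The sketch below shows only that selection loop and is not the tool's actual algorithm. `toy_selection_error` is a hypothetical stand-in whose shape was chosen so that its minimum mirrors the reported result (6 neurons, about 0.08); a real run would train a network of each size instead.

```python
def select_order(candidates, selection_error):
    """Return the neuron count with the lowest selection error, and that error."""
    errors = {n: selection_error(n) for n in candidates}
    best = min(errors, key=errors.get)
    return best, errors[best]


def toy_selection_error(n):
    # Hypothetical stand-in: a real run would train a network with n neurons
    # and return its error on the selection instances.
    return 0.08 + 0.01 * (n - 6) ** 2


best, error = select_order(range(1, 11), toy_selection_error)
print(best, error)
```

Choosing by selection error rather than training error favors the size that generalizes best to instances not used for training.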

6. Testing analysis

We can check the accuracy of the model with the task "Perform linear regression analysis". Here we can see that the predictions fit the actual data quite well.

We want the slope and intercept to be as close as possible to 1 and 0, respectively; those values would mean a perfect match between predictions and actual data.
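
The slope and intercept come from an ordinary least-squares line through the (actual, predicted) pairs. The following is a minimal sketch of that calculation, assuming predictions are regressed on actual values (the direction the software uses is not stated in the text); the sample values are invented.

```python
def linear_regression(actual, predicted):
    """Least-squares fit of predicted = intercept + slope * actual."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values: predictions that match the actual data exactly.
actual = [1000.0, 5000.0, 12000.0, 20000.0]
perfect = list(actual)

slope, intercept = linear_regression(actual, perfect)
print(slope, intercept)
```

A model that systematically underestimates high outputs would show a slope below 1, while a constant offset in the predictions would show up in the intercept.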

7. Model deployment

If you want the actual expression used to predict new values, click on "Write expression".

If you want to predict the output for a new set of inputs, you can perform the task "Export output data". The software will ask you for an input file without the generated power values and will create a file with the predicted values for those inputs.
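
For readers who export the expression and want to reproduce this step in their own code, the workflow can be sketched as follows. Everything here is an assumption for illustration: the CSV layout, the helper `export_outputs`, and `toy_model`, which merely stands in for the trained network and is not the real expression.

```python
import csv
import io


def export_outputs(input_file, output_file, model, target="Power Generated"):
    """Copy each input row to output_file with an added predicted-target column."""
    reader = csv.DictReader(input_file)
    fieldnames = list(reader.fieldnames) + [target]
    writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        inputs = {name: float(value) for name, value in row.items()}
        writer.writerow({**row, target: model(inputs)})


def toy_model(inputs):
    # Hypothetical stand-in for the trained network, not the real expression.
    return 100.0 * inputs["Average Wind Speed (Period)"]


# In-memory files keep the example self-contained.
source = io.StringIO("Average Wind Speed (Period)\n3.0\n7.5\n")
result = io.StringIO()
export_outputs(source, result, toy_model)
print(result.getvalue())
```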

One useful task in this section is "Plot directional output", which shows the effect of each individual input on the predicted generated power. As an example, we take the graph showing Power Generated against Average Wind Speed (Period).

It shows that the generated power increases with the wind speed up to a value of about 20 and then decreases quickly. This suggests that zones with wind speeds above this value should be avoided.
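
A directional output is obtained by sweeping one input over a range while holding all the other inputs at a fixed reference point. Below is a minimal sketch of that idea. `toy_model`, the input "Some Other Input" and every numeric value are hypothetical; the toy function was simply shaped to peak at a wind speed of 20 so that it resembles the behaviour described above.

```python
def directional_output(model, reference, name, values):
    """Evaluate the model while sweeping one input and holding the others fixed."""
    points = []
    for value in values:
        inputs = {**reference, name: value}  # copy, so the reference is untouched
        points.append((value, model(inputs)))
    return points


def toy_model(inputs):
    # Hypothetical stand-in for the trained network.
    wind = inputs["Average Wind Speed (Period)"]
    return 30000.0 - 50.0 * (wind - 20.0) ** 2 + 10.0 * inputs["Some Other Input"]


reference = {"Average Wind Speed (Period)": 10.0, "Some Other Input": 50.0}
points = directional_output(
    toy_model, reference, "Average Wind Speed (Period)", range(0, 41, 5)
)
peak_wind, peak_power = max(points, key=lambda point: point[1])
print(peak_wind, peak_power)
```

Because the other inputs stay fixed, the resulting curve depends on the chosen reference point, so conclusions drawn from one directional plot apply only near that point.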