Predict the generation of a solar plant using machine learning

Energy

Roberto López
https://es.linkedin.com/in/robertolopezartelnics
August 31, 2023

In this example, we develop a machine learning model to predict power generation at a solar plant located in Berkeley, CA.

We utilize various environmental conditions, including temperature, humidity, wind speed, and others.

Solar power is a free and clean alternative to traditional fossil fuels. However, the efficiency of solar cells is not as high as it could be nowadays.

Therefore, selecting the ideal conditions for its installation is critical to maximizing energy output. Knowing certain environmental conditions, we aim to predict the power output for a specific array of solar power generators.

This example is solved with Neural Designer. You can use the free trial to understand how the solution is achieved step by step.

Application type.
Data set.
Neural network.
Training strategy.
Model selection.
Testing analysis.
Model deployment.

1. Application type

The variable to be predicted is continuous (energy production). Therefore, this is an approximation project. The primary goal is to model energy production as a function of environmental variables.

2. Data set

The first step is to prepare the dataset, which is the source of information for the approximation problem.

We have downloaded the raw data from Kaggle.

To achieve optimal results, we have processed this dataset.

The file solarpowergeneration.csv contains the data for this example. Here, the number of variables (columns) is 10, and the number of instances (rows) is 2920.

Variables

We have the following variables for this analysis:

Input variables

distance_to_solar_noon – Distance to solar noon (radians).
temperature – Daily average temperature (°C).
wind_direction – Daily average wind direction (°; 0–360).
wind_speed – Daily average wind speed (m/s).
sky_cover – Cloud cover (0 = clear, 4 = fully covered).
visibility – Visibility (km).
humidity – Relative humidity (%).
average_wind_speed – 3-hour average wind speed (m/s).
average_pressure – 3-hour average barometric pressure (inHg).

Target variables

power_generated – Power generated in each 3-hour period (J).

Instances

The instances are randomly split into 60% training, 20% validation, and 20% testing.

Distributions

Calculating the data distributions helps us verify the accuracy of the available information and detect anomalies.

The following chart shows the histogram for the power-generated variable:

Input-target correlations

It is also interesting to look for dependencies between a single input and a single target variable.

To do that, we can plot an input-target correlations chart.

In this case, the highest correlation is with the distance to solar noon (the solar plant generates the closer to solar noon, the more power).

Scatter charts

Next, we plot a scatter chart for the most significant correlations for our target variable.

3. Neural network

The second step is to build a neural network that represents the approximation function. Approximation models usually contain the following layers:

Scaling layer.
Perceptron layers.
Unscaling layer.

The neural network has nine inputs (distance to solar noon, temperature, wind direction, wind speed, sky cover, visibility, humidity, average wind speed (period), and average pressure (period)) and one output (power generated).

Scaling layer

The scaling layer contains the statistics of the inputs. We use the automatic setting for this layer to accommodate the best scaling technique for our data.

Desnse layers

We use 2 perceptron layers here.

The hidden perceptron layer has 9 inputs, 3 neurons, and a hyperbolic tangent activation function.

The output perceptron layer has 3 inputs, 1 neuron, and a linear activation function.

Unscaling layer

The unscaling layer contains the statistics of the outputs. We use the automatic method as before.

Neural network graph

The following graph represents the neural network for this example.

4. Training strategy

The fourth step is to select an appropriate training strategy.

It is composed of two concepts:

Loss index.
Optimization algorithm.

Loss index

The loss index defines what the neural network will learn. It is composed of an error term and a regularization term.

The error term we choose is the normalized squared error. It divides the squared error between the neural network outputs and the data set’s targets by its normalization coefficient. If the normalized squared error is 1, the neural network predicts the data ‘in the mean’, while a value of 0 means a perfect data prediction. This error term has no parameters to set.

The regularization term is the L2 regularization. It controls the neural network’s complexity by reducing the values of the parameters. We use a weak weight for this regularization term.

Optimization algorithm

The optimization algorithm is responsible for searching for the neural network parameters that minimize the loss index.

Here, we chose the quasi-Newton method as an optimization algorithm.

Training

The following chart shows how the training error (blue) and the selection error (orange) decrease with the number of epochs during training.

The final values are training error = 0.121 NSE and selection error = 0.122 NSE, respectively.

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties.

We aim to reduce the final selection error obtained previously (0.122 NSE).

Neuron selection

The best selection error is achieved using a model with the most appropriate complexity to produce a good data fit.

Order selection algorithms are responsible for finding the optimal number of perceptrons in the neural network.

The following chart shows the results of the incremental order algorithm.

The blue line shows training error and the orange line shows selection error, both versus the number of neurons.

As we can see, the final training error continuously decreases with the number of neurons. However, the final selection error takes a minimum value at some point.

The optimal number of neurons is 8, corresponding to a selection error of 0.089 NSE.

Neural network graph

The following figure shows the optimal network architecture for this application.

6. Testing analysis

The purpose of the testing analysis is to validate the generalization capabilities of the neural network.

We use the testing instances in the data set, which have never been used before.

Goodnes-of-fit

A standard testing method in approximation applications is to perform a linear regression analysis between the predicted and the real values.

For a perfect fit, the correlation coefficient R2 would be 1. With R2 = 0.951, the neural network accurately predicts the testing data.

7. Model deployment

In the model deployment phase, the neural network predicts outputs for inputs it has not seen before.

Neural network outputs

We can calculate the neural network outputs for a given set of inputs:

distance_to_solar_noon: 0.503 radians.
temperature: 58.468ºC.
wind_direction: 24.953º.
wind_speed: 10.097 m/s.
sky_cover: 1.988 over 4.
visibility: 9.558 km.
humidity: 73.524%.
average-wind-speed-(period): 10.136 m/s.
average_pressure_(period): 30.062 mercury inches.

The predicted output for these input values is the following:

power_generated: 3012.461 Jules per 3-hour period.

Directional outputs

Directional outputs plot the neural network outputs through some reference points.

The next list shows the reference point for the plots.

distance_to_solar_noon: 0.503 radians.
temperature: 58.468ºC.
wind_direction: 24.953º.
wind_speed: 10.097 m/s.
sky_cover: 1.988 over 4.
visibility: 9.558 km.
humidity: 73.524%.
average_wind_speed: 10.136 m/s.
average_pressure: 30.062 mercury inches.

We can see here how the distance to solar noon affects the power generated:

8. Video tutorial

You can watch the step-by-step tutorial video below to help you complete this machine learning example for free using the software Neural Designer.

References

Ph.D. candidate Alexandra Constantin compiled the data.

Predict the generation of a solar plant using machine learning

Contents

1. Application type

2. Data set

Variables

Input variables

Target variables

Instances

Distributions

Input-target correlations

Scatter charts

3. Neural network

Scaling layer

Desnse layers

Unscaling layer

Neural network graph

4. Training strategy

Loss index

Optimization algorithm

Training

5. Model selection

Neuron selection

Neural network graph

6. Testing analysis

Goodnes-of-fit

7. Model deployment

Neural network outputs

Directional outputs

8. Video tutorial

References

Related posts