Neural networks are among the most widely used techniques in machine learning and artificial intelligence. Mathematically, we can formulate the modeling process with neural networks from a variational point of view: building a model consists of finding a function for which a loss functional takes an extreme value, usually a minimum.
The following figure depicts a class diagram for the concepts involved in the modeling process.
As we can see, the modeling process involves six concepts: data set, neural network, training strategy, model selection, testing analysis, and model deployment. Next, we introduce these six concepts.
1. Data set
The data set contains information for creating our model. The information may include numerical measurements, text, images, etc. It is a data collection structured as a table in rows and columns.
A data set comprises a data matrix together with information about its columns (variables) and rows (samples). Variables can be inputs, targets, or unused. Samples can be used for training, selection, or testing, or left unused.
The following is an example of a data set in the automotive sector.
Example: Electric motor data set
An automotive company wants to build a digital twin of an electric motor using artificial intelligence. Having robust rotor and stator temperature estimators helps the automotive industry improve the motor’s efficiency by reducing power losses and, ultimately, heat buildup.
The company uses the following data set to build the model. It comprises sensor data collected from a permanent magnet synchronous motor (PMSM) deployed on a test bench. The LEA department of the University of Paderborn collected the test bench measurements. This data set consists of 107 samples.
The following table illustrates the data set.
|ambient temperature|coolant temperature|voltage_d|voltage_q|…|stator_winding|
|---|---|---|---|---|---|
In this data set, all the variables are numeric. The input variables are $ambient\_temperature$, $coolant\_temperature$, $voltage\_d$, $voltage\_q$, $voltage\_module$, $current\_d$, $current\_q$ and $current\_module$. The target variables are $motor\_speed$, $torque$, $stator\_yoke$, $stator\_tooth$ and $stator\_winding$.
The samples are divided into 60% training samples (65), 20% selection samples (21), and the remaining 20% testing samples (21).
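The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_samples` and the random seed are assumptions, and the subset sizes (65, 21, 21) are simply the counts quoted in the text rather than values recomputed from the percentages.

```python
import random

def split_samples(n_samples, n_training, n_selection, seed=0):
    """Randomly assign sample indices to training, selection and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for reproducibility only
    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]  # whatever remains
    return training, selection, testing

# Counts taken from the text: 107 samples split into 65 / 21 / 21.
training, selection, testing = split_samples(107, 65, 21)
print(len(training), len(selection), len(testing))
```

Each sample index lands in exactly one subset, so the testing samples are never used when training or selecting the model.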
2. Neural network
An artificial neural network, or simply a neural network, can be defined as a biologically inspired computational algorithm consisting of a network architecture composed of artificial neurons.
This structure contains a set of parameters tuned to perform specific tasks. The neural network represents the model.
Neural networks are organized in layers.
- Approximation models typically contain two $perceptron$ layers.
- Classification models usually contain a $perceptron$ layer and a $probabilistic$ layer.
- Forecasting models usually contain a recurrent layer and a $perceptron$ layer.
- Other models, such as image classification models, include multiple $convolutional$ and $pooling$ layers and a $perceptron$ or $probabilistic$ layer.
Neural networks have universal approximation properties. This means that, given enough neurons, they can approximate any continuous function on a bounded domain, in any number of dimensions, to a desired degree of accuracy.
The following is an example of a neural network in the automotive sector.
Example: Electric motor neural network
To create its model, the company chooses a neural network. The following figure shows the neural network model.
The neural network consists of four layers. The first is a scaling layer with eight neurons; the following are perceptron layers with three and five neurons, respectively, and the last is an unscaling layer with five neurons.
As we can see, the inputs to this neural network are $ambient\_temperature$, $coolant\_temperature$, $voltage\_d$, $voltage\_q$, $current\_d$, $current\_q$, $voltage\_module$ and $current\_module$. The outputs from the neural network are $motor\_speed$, $torque$, $stator\_yoke$, $stator\_tooth$ and $stator\_winding$.
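As a rough check on the size of this model, the number of adjustable parameters can be counted from the layer sizes. The sketch below assumes fully connected perceptron layers with one bias per neuron, and it assumes that the first and last layers only rescale values and hold no trainable parameters; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of consecutive fully connected layers."""
    return sum(
        n_in * n_out + n_out  # n_in * n_out weights and n_out biases per layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# 8 inputs -> 3 neurons -> 5 neurons, as described in the text.
n_parameters = count_parameters([8, 3, 5])
print(n_parameters)  # 47 = (8*3 + 3) + (3*5 + 5)
```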
3. Training strategy
The training strategy aims to fit the neural network to the data set. The training strategy comprises the loss index and the optimization algorithm.
The loss index defines the task the neural network is required to do and provides a measure of the quality of the representation that the model is required to learn. The choice of a suitable error term depends on the particular application. We can then state the learning problem as the minimization of the loss index.
The loss index for a neural network is composed of several terms. The most important are the error term and the regularisation term.
The error term measures the difference between the outputs of the neural network and the targets in the data set.
We can use several types of errors. The most common error functions are the $Mean\ Squared\ Error$, the $Normalised\ Squared\ Error$ and the $Minkowski\ Error$ in approximation problems, the $Cross-Entropy\ Error$ in classification problems, and the $Weighted\ Squared\ Error$ in binary classification problems.
The regularisation term can be applied to obtain good generalization. Adding a regularisation term to the error term tends to reduce the magnitude of the neural network's weights and biases. As a consequence, the outputs of the neural network become smoother, which helps avoid overfitting. One of the most used regularisation methods is to penalize the norm of the neural network parameters.
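To make the two terms concrete, here is a minimal sketch of a loss index built from a normalised squared error and an L2 penalty. It assumes one common definition of the normalised squared error (sum of squared errors divided by the sum of squared deviations of the targets from their mean) and a squared L2 norm; the names `loss_index` and `regularization_weight` and the default weight are hypothetical.

```python
import numpy as np

def normalized_squared_error(outputs, targets):
    """Sum of squared errors divided by the squared deviation of the targets from their mean."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    normalization = np.sum((targets - targets.mean(axis=0)) ** 2)
    return np.sum((outputs - targets) ** 2) / normalization

def loss_index(outputs, targets, parameters, regularization_weight=0.01):
    """Error term plus a weighted squared L2 norm of the parameters (assumed form)."""
    parameters = np.asarray(parameters, dtype=float)
    regularization = regularization_weight * np.sum(parameters ** 2)
    return normalized_squared_error(outputs, targets) + regularization

targets = np.array([[1.0], [2.0], [3.0], [4.0]])
perfect = targets.copy()                            # outputs equal to the targets
mean_only = np.full_like(targets, targets.mean())   # always predict the mean

print(normalized_squared_error(perfect, targets))    # 0.0
print(normalized_squared_error(mean_only, targets))  # 1.0
```

Under this definition, a value of 1 corresponds to a model that always predicts the mean of the targets, and 0 corresponds to a perfect fit, which makes the error easy to interpret across data sets.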
The optimization algorithm carries out the learning process: it adjusts the neural network's parameters to minimize the loss index. Among the most commonly used optimisation algorithms are $Gradient\ Descent$, $Conjugate\ Gradient$, $Quasi-Newton\ Method$, $Levenberg-Marquardt$, $Stochastic\ Gradient\ Descent$ and $Adaptive\ Moment\ Estimation$.
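As an illustration of the simplest algorithm in this list, the following sketch applies plain gradient descent to a toy problem. It is not a neural network: the model is a straight line with two parameters, the data are synthetic and noiseless, and the learning rate and epoch count are arbitrary assumptions chosen so that the iteration converges.

```python
import numpy as np

def gradient_descent(gradient, initial_parameters, learning_rate=0.1, epochs=200):
    """Repeatedly step against the gradient of the loss."""
    parameters = np.asarray(initial_parameters, dtype=float)
    for _ in range(epochs):
        parameters = parameters - learning_rate * gradient(parameters)
    return parameters

# Toy data generated from y = 2 x + 0.5 (synthetic, for illustration only).
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x + 0.5

def mse_gradient(parameters):
    """Gradient of the mean squared error of the linear model w * x + b."""
    w, b = parameters
    residual = w * x + b - y
    return np.array([2.0 * np.mean(residual * x), 2.0 * np.mean(residual)])

w, b = gradient_descent(mse_gradient, [0.0, 0.0])
print(round(w, 3), round(b, 3))
```

The other algorithms in the list differ mainly in how they choose the step direction and size, for example by using curvature information or gradients computed on small batches of samples.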
The following is an example of a training strategy in the automotive sector.
Example: Electric motor training strategy
The automotive company creates a model to improve the efficiency of its engines. The loss index chosen is the normalized squared error with L2 regularisation. This loss index is the default in approximation applications. The optimization algorithm chosen is the quasi-Newton method. Once the strategy has been set, we can train the neural network.
The following figure shows how the training (blue) and selection (orange) errors decrease with the training epoch during the training process.
The chart shows that both errors decrease until reaching a stationary value, so the algorithm converges. The most critical training result is the final selection error, since it measures the generalization ability of the neural network. The final selection and training errors are $selection\ error = 0.083\ NSE$ and $training\ error = 0.029\ NSE$.
4. Model selection
The objective of building a model is not to memorize the training subset but to achieve good generalization capacity. The optimal architecture is the one that generalizes best, that is, the one for which the selection error is the lowest. Two studies help find it. Inputs selection analyzes which input variables are redundant and can be removed from the neural network. Neuron selection studies for which number of neurons the neural network shows the best performance.
When designing the neural network architecture, two common problems can occur: underfitting and overfitting.
Underfitting is the effect that appears when the model is too simple. In this case, the neural network can fit neither the training data nor the selection data. Overfitting is the opposite effect. It occurs when the neural network is too complex. Consequently, during the training process, the error for the training samples will decrease while the error for the selection samples increases.
In both situations, the result is a model of bad quality.
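The idea of choosing the complexity with the lowest selection error can be sketched with a stand-in model. In the example below, polynomial degree plays the role of the number of neurons; this is a substitute chosen to keep the code short, not the neuron selection algorithm itself. The data are synthetic, and the sample sizes, noise level and degree range are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_samples(n):
    """Synthetic noisy samples of a sine wave (illustrative data only)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, n)
    return x, y

x_train, y_train = make_samples(30)
x_sel, y_sel = make_samples(15)

def mean_squared_error(coefficients, x, y):
    return float(np.mean((np.polyval(coefficients, x) - y) ** 2))

training_errors, selection_errors = {}, {}
for degree in range(0, 9):
    # Fit on the training samples only; the selection samples stay unseen.
    coefficients = np.polyfit(x_train, y_train, degree)
    training_errors[degree] = mean_squared_error(coefficients, x_train, y_train)
    selection_errors[degree] = mean_squared_error(coefficients, x_sel, y_sel)

# Keep the complexity with the lowest selection error.
best_degree = min(selection_errors, key=selection_errors.get)
print(best_degree, selection_errors[best_degree])
```

The training error can only go down as the degree grows, whereas the selection error is what reveals when extra complexity stops helping. A very low degree corresponds to underfitting, and a degree well past the best one is where overfitting would show up.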
The following is an example of a model selection in the automotive sector.
Example: Electric motor model selection
To achieve the model’s optimal architecture, the company studies which input variables are redundant and with which number of neurons the neural network shows the best performance. This reduces the selection error in its model. They use the growing neuron algorithm to achieve the optimal number of neurons.
The following figure shows the final neural network model.
As a result, the number of neurons in the perceptron layer has increased from 3 to 9, and the selection error has decreased from $0.083\ NSE$ to $0.043\ NSE$.
5. Testing analysis
Once the optimization algorithm has trained the model, we must evaluate its predictive ability on new data that the neural network has not seen before.
We use the test subset, which contains a set of new cases with their corresponding inputs and target variables.
The goal of testing is to compare the responses of the trained neural network with the corresponding targets for each sample in the test subset.
We can use the results of this process as a simulation of what would happen in a real-world situation.
One of the simplest methods to study the neural network’s performance is calculating the error for the testing subset.
If the model has not over-fitted the training or selection instances, the training, selection, and testing errors should be similar.
The most common method for testing regression models is the goodness of fit analysis. In the case of classification, the confusion matrix, the binary classification tests, or the ROC curve are quite common testing methods. There are also specific methods for testing forecasting models. Some of them are error autocorrelations and inputs-error cross-correlations.
If we judge the neural network to be of good quality, we can move it to the deployment phase.
The following is an example of a testing analysis in the automotive sector.
Example: Electric motor testing analysis
The automotive company needs to test the model to check how well it fits a set of observations. To do this, they perform a goodness-of-fit analysis and calculate the coefficient of determination, R2.
The total number of test samples is 21.
The model’s goodness-of-fit measures summarise the discrepancy between observed and predicted values. The R2 coefficient quantifies the proportion of the variation in the actual values that the predictions explain. If we had a perfect fit (outputs equal to the targets), R2 would equal 1.
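For reference, one common way to compute the coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that definition. It is an assumption that this matches the definition used in the study, since some tools instead report the squared correlation between predicted and actual values. The function name `r_squared` and the sample values are hypothetical.

```python
import numpy as np

def r_squared(targets, outputs):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    residual_sum = np.sum((targets - outputs) ** 2)
    total_sum = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - residual_sum / total_sum

targets = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative values, not from the data set
r2_perfect = r_squared(targets, targets)                       # outputs equal to targets
r2_mean = r_squared(targets, np.full(4, targets.mean()))       # always predict the mean

print(r2_perfect, r2_mean)  # 1.0 0.0
```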
The following figure illustrates the predicted values versus the actual ones for the output $stator\_yoke$.
The chart shows that the predicted values resemble the actual values.
To give a quality measure, we calculate the coefficient of determination, R2.
Indeed, the coefficient of determination R2 is close to 1.
6. Model deployment
Deployment in machine learning refers to applying a model to predict new data.
The deployment of a model consists of making it available to end-users. There are many ways to deploy a machine learning model.
The form of deployment depends on the requirements. Sometimes, the end-user wants a report with the results.
On other occasions, they might need a repeatable continuous learning process.
The following is an example of a model deployment in the automotive sector.
Example: Electric motor model deployment
The mathematical function describes the operation of the motor from the input data.
The mathematical expression represented by the neural network is written below.
scaled_ambient temperature = (ambient temperature+0.6031910181)/0.98526299;
scaled_coolant temperature = (coolant temperature+0.3932940066)/1.030290008;
scaled_voltage_d = (voltage_d+0.3587549925)/0.799169004;
scaled_voltage_q = (voltage_q+0.2354030013)/0.9717490077;
scaled_voltage_module = (voltage_module-1.255239964)/0.4234420061;
scaled_current_d = (current_d-0.08343230188)/1.120489955;
scaled_current_q = (current_q-0.2310259938)/0.6012690067;
scaled_current_module = (current_module-1.189710021)/0.4886389971;

perceptron_layer_1_output_0 = tanh( -0.293822 + (scaled_ambient temperature*0.017537) + (scaled_coolant temperature*0.277449) + (scaled_voltage_d*-0.147449) + (scaled_voltage_q*0.0689801) + (scaled_voltage_module*-0.0169951) + (scaled_current_d*-0.267293) + (scaled_current_q*-0.385712) + (scaled_current_module*0.0363538) );
perceptron_layer_1_output_1 = tanh( -0.00602507 + (scaled_ambient temperature*-0.0122427) + (scaled_coolant temperature*-0.447815) + (scaled_voltage_d*-0.036908) + (scaled_voltage_q*0.00900047) + (scaled_voltage_module*-0.0253258) + (scaled_current_d*0.27237) + (scaled_current_q*-0.163464) + (scaled_current_module*-0.115275) );
perceptron_layer_1_output_2 = tanh( 0.224242 + (scaled_ambient temperature*-0.00884039) + (scaled_coolant temperature*-0.210512) + (scaled_voltage_d*-0.0931465) + (scaled_voltage_q*0.0881369) + (scaled_voltage_module*0.0192406) + (scaled_current_d*-0.0520755) + (scaled_current_q*0.185785) + (scaled_current_module*-0.0117133) );

perceptron_layer_2_output_0 = ( 0.141026 + (perceptron_layer_1_output_0*2.54465) + (perceptron_layer_1_output_1*0.551241) + (perceptron_layer_1_output_2*2.39387) );
perceptron_layer_2_output_1 = ( -0.723429 + (perceptron_layer_1_output_0*-1.79491) + (perceptron_layer_1_output_1*-1.76963) + (perceptron_layer_1_output_2*1.484) );
perceptron_layer_2_output_2 = ( 0.231167 + (perceptron_layer_1_output_0*0.653582) + (perceptron_layer_1_output_1*-1.86523) + (perceptron_layer_1_output_2*-0.536217) );
perceptron_layer_2_output_3 = ( 0.155599 + (perceptron_layer_1_output_0*1.09885) + (perceptron_layer_1_output_1*-1.7018) + (perceptron_layer_1_output_2*0.494817) );
perceptron_layer_2_output_4 = ( 0.0824158 + (perceptron_layer_1_output_0*1.09379) + (perceptron_layer_1_output_1*-1.57303) + (perceptron_layer_1_output_2*1.0116) );

unscaling_layer_output_0=perceptron_layer_2_output_0*1.132040024-0.06418219954;
unscaling_layer_output_1=perceptron_layer_2_output_1*0.604493022+0.2313710004;
unscaling_layer_output_2=perceptron_layer_2_output_2*0.9934260249-0.4365360141;
unscaling_layer_output_3=perceptron_layer_2_output_3*1.083299994-0.401120007;
unscaling_layer_output_4=perceptron_layer_2_output_4*1.152959943-0.3688929975;
The mathematical function is the final result of the study. The company can use it to achieve its objectives: to improve the efficiency of its car engines.
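One possible way to embed this expression in an application is to transcribe it into a small function. The sketch below does so with NumPy, copying the coefficients from the expression above. The function name `predict` and the constant names are hypothetical. The input order follows the expression, and the mapping of the five outputs to $motor\_speed$, $torque$, $stator\_yoke$, $stator\_tooth$ and $stator\_winding$ is an assumption based on the order in which the text lists them.

```python
import numpy as np

# Input order: ambient temperature, coolant temperature, voltage_d, voltage_q,
# voltage_module, current_d, current_q, current_module.
INPUT_OFFSET = np.array([-0.6031910181, -0.3932940066, -0.3587549925, -0.2354030013,
                         1.255239964, 0.08343230188, 0.2310259938, 1.189710021])
INPUT_SCALE = np.array([0.98526299, 1.030290008, 0.799169004, 0.9717490077,
                        0.4234420061, 1.120489955, 0.6012690067, 0.4886389971])

# First perceptron layer: 3 neurons, tanh activation.
B1 = np.array([-0.293822, -0.00602507, 0.224242])
W1 = np.array([
    [0.017537, 0.277449, -0.147449, 0.0689801, -0.0169951, -0.267293, -0.385712, 0.0363538],
    [-0.0122427, -0.447815, -0.036908, 0.00900047, -0.0253258, 0.27237, -0.163464, -0.115275],
    [-0.00884039, -0.210512, -0.0931465, 0.0881369, 0.0192406, -0.0520755, 0.185785, -0.0117133],
])

# Second perceptron layer: 5 neurons, linear activation.
B2 = np.array([0.141026, -0.723429, 0.231167, 0.155599, 0.0824158])
W2 = np.array([
    [2.54465, 0.551241, 2.39387],
    [-1.79491, -1.76963, 1.484],
    [0.653582, -1.86523, -0.536217],
    [1.09885, -1.7018, 0.494817],
    [1.09379, -1.57303, 1.0116],
])

OUTPUT_SCALE = np.array([1.132040024, 0.604493022, 0.9934260249, 1.083299994, 1.152959943])
OUTPUT_OFFSET = np.array([-0.06418219954, 0.2313710004, -0.4365360141, -0.401120007, -0.3688929975])

def predict(inputs):
    """Evaluate the expression for one sample of eight input values."""
    scaled = (np.asarray(inputs, dtype=float) - INPUT_OFFSET) / INPUT_SCALE
    hidden = np.tanh(B1 + W1 @ scaled)
    linear = B2 + W2 @ hidden
    # Assumed output order: motor_speed, torque, stator_yoke, stator_tooth, stator_winding.
    return linear * OUTPUT_SCALE + OUTPUT_OFFSET

# Example call with an arbitrary input: every variable at its scaling offset.
outputs = predict(INPUT_OFFSET)
print(outputs.shape)
```

Keeping the coefficients in arrays rather than in a long formula makes the layer structure visible and lets the same function be reused if the model is retrained and the numbers change.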
- Kaggle Machine Learning Repository. Electric Motor Temperature Data Set.