Neural Designer is operated by entering settings in the data book and running tasks from the task manager. Each component has some settings in the data book and some tasks in the task manager. This tutorial describes every task that can be run within Neural Designer.
The data set contains the information for creating the model. It comprises a data matrix in which columns represent variables and rows represent instances. Variables in a data set can be of three types: the inputs are the independent variables in the model; the targets are the dependent variables in the model; the unused variables are used neither as inputs nor as targets. Instances, in turn, can be of four types: training instances, which are used to construct the model; selection instances, which are used to select the optimal order of the model; testing instances, which are used to validate the functioning of the model; and unused instances, which are not used at all.
The "Report data set" task writes to the viewer a preview of the data table and information about the variables and the instances, see the next figure.
Basic statistics provide very valuable information when designing a model, since they might alert us to the presence of spurious data. It is essential to check the correctness of the most important statistical measures of every single variable.
On output, this task writes a table with the minimum, maximum, mean and standard deviation of all variables in the data set. The following figure illustrates that.
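The contents of that table can be reproduced with a short sketch. The data matrix below is hypothetical; the statistics themselves follow the definitions in the text:

```python
import numpy as np

# Hypothetical data matrix: rows are instances, columns are variables.
data = np.array([
    [1.0, 2.5, 0.3],
    [2.0, 3.5, 0.1],
    [3.0, 1.5, 0.7],
    [4.0, 4.5, 0.9],
])

# One row of statistics per variable, as in the data statistics table.
for j in range(data.shape[1]):
    column = data[:, j]
    print(f"variable_{j}: min={column.min():.3f} max={column.max():.3f} "
          f"mean={column.mean():.3f} std={column.std(ddof=1):.3f}")
```

Checking these four values against the task output for each variable is a quick way to spot spurious data.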
Histograms show how the data is distributed over its entire range. A uniform distribution for all the variables is, in general, desirable. If the data is very irregularly distributed, then the model will probably be of bad quality.
The "Calculate data histograms" task opens a dialog asking for the number of bins to be used. The next figure shows that dialog.
On output, a histogram for every variable is plotted. The next figure illustrates the histogram of a single variable.
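The binning performed by this task can be sketched as follows; the variable shown is synthetic, and the bin count stands in for the number entered in the dialog:

```python
import numpy as np

np.random.seed(0)
values = np.random.normal(loc=0.0, scale=1.0, size=1000)  # one hypothetical variable

# The number of bins corresponds to the value entered in the task dialog.
counts, edges = np.histogram(values, bins=10)

for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:6.2f}, {right:6.2f}): {count}")
```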
Box plots display information about the minimum, maximum, first quartile, second quartile or median, and third quartile of every variable in the data set. The length of the box represents the interquartile range (IQR), which is the distance between the third quartile and the first quartile. The middle half of the data falls inside the interquartile range. The whisker below the box shows the minimum of the variable, while the whisker above the box shows the maximum. Within the box, a line is also drawn representing the median of the variable.
Box plots also provide information about the shape of the data. If most of the data are concentrated between the median and the maximum, the distribution is skewed right; if most of the data are concentrated between the median and the minimum, the distribution is skewed left; and if there is the same number of values on both sides of the median, the distribution is said to be symmetric.
Constant variables are those columns in the data matrix that always have the same value. They do not provide any information to the model but increase its complexity. Constant variables should be used neither as inputs nor as targets, except when the model will need to include them in the future.
The "Unuse constant variables" task does two things: (i) In Neural Editor, sets the variables which are repeated to "Unused". (ii) In Neural Viewer, writes a text where indicates what are the repeated instances. The next text illustrates that with an example with one unused variable.
This task calculates the absolute values of the linear correlations among all inputs. The correlation is a numerical value between 0 and 1 that expresses the strength of the relationship between two variables. When it is close to 1 it indicates a strong relationship, and a value close to 0 indicates that there is no relationship.
This task is only defined for approximation applications.
It might be interesting to look for linear dependencies between single input and single target variables. This task calculates the correlation coefficients between all inputs and all targets and displays a table with all those values. Correlations close to 1 mean that a target increases linearly with an input. Correlations close to -1 mean that the target decreases linearly when the input increases. Correlations close to 0 mean that there is no linear relationship between an input and a target variable. Note that, in general, the targets depend on many inputs simultaneously.
The following table illustrates the linear correlation table for a simple case with just one input and one target. In this case, the coefficient is -0.64, which indicates that there is some linear relationship between those two variables.
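Such a coefficient is the standard Pearson linear correlation. A minimal sketch, using hypothetical input and target vectors with a strong negative linear relationship:

```python
import numpy as np

# Hypothetical single input x and single target y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 4.2, 3.9, 2.5, 1.8])

# Pearson linear correlation coefficient between input and target.
r = np.corrcoef(x, y)[0, 1]
print(f"linear correlation = {r:.3f}")
```

Since y decreases almost linearly as x increases, the coefficient here is close to -1.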
This task is only defined for classification applications.
For pattern recognition problems, we can look for logistic dependencies between single input and single target variables. The logistic correlation is a numerical value between -1 and +1 that expresses the strength of the relationship between a single input and a single output variable. Values close to +1 or -1 indicate a strong logistic relationship, while values close to 0 indicate that there is no logistic relationship. Note that, in general, relationships are multivariable and non-logistic.
The "Calculate logistic errors" task calculates the best logistic function between all single inputs and all single outputs. The results are displayed in a table similar to the following:
This task is only defined for classification applications.
The number of instances of each class in the data set must be balanced. Note that if the number of targets is one then the number of classes is two, and that if the number of targets is greater than one then the number of classes is equal to the number of targets.
On output, a chart showing the number of instances belonging to each class in the data set is plotted, see the following figure.
This task plots graphs of all targets versus all inputs. Those charts might help to see the dependencies of the targets on the inputs.
Repeated instances are those rows in the data matrix having the same values as other rows. They provide redundant information to the model and should not be used for training, selection or testing.
The "Unuse repeated instances" task does two things: (i) In Neural Editor, sets the use of that instances which are repeated to "Unused". (ii) In Neural Viewer, writes a table with the indices of all the repeated instances. The following table illustrates the repeated instances table from the viewer.
This task balances the distribution of targets in a data set. In the case of pattern recognition problems, it equalizes the number of instances of every target class by unusing those instances whose variables belong to the most populated bins.
On output the "Balance targets distribution" task for pattern recognition problems does two things:
(i) It shows a table with the instances that have been unused arranged in rows of 10.
(ii) It shows a chart with the number of instances belonging to each class in the data set.
The next figure shows an example of (i).
In the remaining cases, it unuses a given percentage of the instances whose values belong to the most populated bins. It opens a dialog for choosing the percentage of instances to balance. The next figure shows that dialog:
On output the "Balance targets distribution" task for function regression problems does two things:
(i) It shows a table with the instances that have been unused arranged in rows of ten.
(ii) It plots the new histograms for every variable.
The next picture shows an example of (i).
After performing this task, the distribution of the data will be more uniform and, in consequence, the model will probably be of better quality.
When designing a model, the general practice is to first divide the data into three subsets. The first subset is the training set, which is used for constructing different candidate models. The second subset is the selection set, which is used to select the model exhibiting the best properties. The third subset is the testing set, which is used for validating the final model.
This task opens a dialog for choosing the method for splitting the instances (sequential or random) and the training, selection and testing ratios. The next figure shows the instances splitting dialog.
On output the "Split instances" task does two things:
(i) It sets the use of all instances in the data set page of Neural Editor with the calculated values.
(ii) It writes a table in Neural Viewer with the new uses.
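The splitting procedure can be sketched as follows. The function name and defaults are illustrative, not Neural Designer's internals; it assigns a use to each instance either sequentially or after shuffling:

```python
import numpy as np

def split_instances(n, training=0.6, selection=0.2, testing=0.2,
                    random=True, seed=0):
    """Assign "Training", "Selection" or "Testing" to each of n instances."""
    indices = np.arange(n)
    if random:
        np.random.default_rng(seed).shuffle(indices)   # random splitting
    n_training = int(round(training * n))
    n_selection = int(round(selection * n))
    uses = np.empty(n, dtype=object)
    uses[indices[:n_training]] = "Training"
    uses[indices[n_training:n_training + n_selection]] = "Selection"
    uses[indices[n_training + n_selection:]] = "Testing"
    return uses

# Sequential splitting of 10 instances with the 0.6/0.2/0.2 ratios.
uses = split_instances(10, random=False)
print(list(uses))
```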
This task unuses the instances that do not fall within the range that you specify. When running this task, a dialog for setting the minimum and maximum values of all the variables will appear.
The instances that fall outside that range will be set to "Unused" in the project. The Viewer will also inform you about that change.
Outliers are defined as observations in the data that are abnormally distant from the others. They may be due to variability in the measurement or may indicate experimental errors.
This task uses Tukey's method, which defines outliers as those values of the data set that fall too far from the central point of the data. The maximum distance to the center of the data that is allowed is defined by the cleaning parameter. As it grows, the test becomes less sensitive to outliers; but if it is too small, many values will be detected as outliers.
As output, it shows a table with the number of instances set as unused for each variable.
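A common formulation of Tukey's method measures distance in multiples of the interquartile range beyond the quartiles; the sketch below assumes that formulation, with the multiplier playing the role of the cleaning parameter:

```python
import numpy as np

def tukey_outliers(values, cleaning_parameter=1.5):
    """Indices of values farther than cleaning_parameter * IQR from the quartiles."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - cleaning_parameter * iqr
    upper = q3 + cleaning_parameter * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

values = [1.0, 1.2, 0.9, 1.1, 1.0, 9.5]   # 9.5 is abnormally distant
print(tukey_outliers(values))  # -> [5]
```

Increasing the cleaning parameter widens the accepted interval, making the test less sensitive, as the text describes.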
This task is only defined for approximation applications.
Another method to clean outliers is to find those instances whose targets are far away from their expected values. In the next figure, we can see several data points whose targets differ from the predicted value; these cases are considered outliers.
This task trains a neural network, then finds the cases with a high percentage error and sets them as unused.
This task is only defined for forecasting applications.
This task plots the time series for the different columns in the data set. It displays observations on the y-axis against time on the x-axis. It can be used as a visual aid to discover patterns in the data and to study its behaviour over time.
The next figure is an example of the output of this task.
This task is only defined for forecasting applications.
The autocorrelation plot, or correlogram, is a plot of serial correlations versus different lags. In time series prediction problems, autocorrelation refers to the correlation of a time series with its own past and future values. For a random series, the serial correlations should be close to zero on average for all lag values.
This task opens a dialog in which the maximum number of lags for which autocorrelations will be calculated can be selected. The dialog is shown in the next figure. After that, Neural Viewer will show the correlogram. The next figure depicts an example of it.
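The serial correlations plotted in the correlogram can be sketched as below. For a synthetic random series, the coefficients come out close to zero at every lag, as the text states:

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Serial correlation of a series with itself at lags 1..max_lag."""
    series = np.asarray(series, dtype=float)
    centered = series - series.mean()
    variance = np.dot(centered, centered)
    return [np.dot(centered[:-lag], centered[lag:]) / variance
            for lag in range(1, max_lag + 1)]

np.random.seed(1)
noise = np.random.normal(size=500)   # random series: correlations near zero
print([round(c, 3) for c in autocorrelation(noise, 3)])
```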
Principal components analysis is a statistical technique that identifies underlying patterns in a data set so that it can be expressed in terms of another data set of lower dimension without much loss of information. The resulting data set should be able to explain most of the variance of the original data set by making a variable reduction. The final variables are named principal components. Since this process is not reversible, it is only applied to the input variables.
After performing this task, Neural Viewer will show on output a table and a chart containing the relative variance and the cumulative variance explained by each of the principal components. The next image shows an example of the chart. The x-axis represents the principal components and the y-axis the cumulative explained variance. Note that the number of principal components that can be calculated is equal to the number of inputs of the original data set.
This task will also activate the principal components layer widget in the neural network page of Neural Editor. It automatically selects "Apply principal components", and all the principal components are used by default. The number of principal components to be used can be modified by changing the percentage of cumulative explained variance that we want the transformed data set to have. As this percentage decreases, so does the number of principal components. If the principal components layer is not wanted, deselect the "Apply principal components" check box. The next image shows an example of the view of the principal components layer after performing the task.
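The relative and cumulative explained variances can be sketched from the eigenvalues of the covariance matrix of the inputs. The three input variables below are synthetic, with two of them strongly correlated so that the first component dominates:

```python
import numpy as np

np.random.seed(0)
# Hypothetical inputs: three variables, two of them strongly correlated.
x1 = np.random.normal(size=200)
x2 = x1 + 0.1 * np.random.normal(size=200)
x3 = np.random.normal(size=200)
inputs = np.column_stack([x1, x2, x3])

# Principal components from the eigendecomposition of the covariance matrix.
centered = inputs - inputs.mean(axis=0)
eigenvalues = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]
explained = eigenvalues / eigenvalues.sum()
print("relative variance:", np.round(explained, 3))
print("cumulative variance:", np.round(np.cumsum(explained), 3))
```

The cumulative values are exactly what the chart in the viewer plots; truncating the components at a target cumulative variance performs the variable reduction described above.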
The neural network defines a function which represents the model. The neural network implemented in Neural Designer is a class of universal approximator, used to represent the predictive model.
On output, this task writes to the report information about the inputs, the scaling layer, the learning layers, the unscaling/probabilistic layer and the outputs. The following figure illustrates how the neural network report appears on the viewer.
The norm of the parameters gives a clue about the complexity of the model. If the parameters norm is small, the model will be smooth. On the other hand, if the parameters norm is very big, the model might become unstable. In any case, always note that the norm depends on the number of parameters.
The "Calculate parameters norm" task writes a table to the viewer with the actual norm of the neural network parameters vector. The next figure illustrates that table.
The statistics on the parameters depict information about the complexity of the model. In general, it is desirable that all the minimum, maximum, mean and standard deviation values are not very big.
This task writes a table in Neural Viewer with the basic statistics on the neural parameters, see the following figure.
The histogram of the parameters shows how they are distributed. A regular distribution for the parameters is, in general, desirable. If the parameters are very irregularly distributed, then the model is probably unstable.
This task launches a dialog asking for the number of bins required for the histogram, see the next figure.
On output, it draws the distribution of the neural parameters through a histogram. The following figure illustrates that.
The histogram of the outputs shows how they are distributed. This method takes 1000 random instances and calculates the histogram of their outputs.
The loss index plays an important role in the use of a neural network. It defines the task the neural network is required to do, and provides a measure of the quality of the representation that it is required to learn. The choice of a suitable loss index depends on the particular application.
This task writes to the viewer the error and regularization terms to be used in the loss expression, together with their parameters. The next figure illustrates that.
The loss is the sum of the error and regularization terms. Note that smaller values here mean better performance.
On output, a table with the error, regularization and total loss values is written. The next figure is an example of the output from this task.
The procedure used to carry out the learning process is called training (or learning) strategy. The training strategy is applied to the neural network in order to obtain the best possible performance. General training strategies in Neural Designer are composed of two different algorithms: initialization training algorithm and main training algorithm. Initialization algorithms are used to obtain good sets of initial parameters, in order to facilitate the convergence of more efficient main algorithms.
On output, the "Report training strategy" task writes to the viewer which initialization and main algorithms compose the training strategy. It also writes their training parameters, stopping criteria, etc. The next figure illustrates the results from this task in Neural Viewer.
The procedure used to carry out the learning process is called training (or learning) strategy. The training strategy is applied to the neural network in order to obtain the best possible performance. The type of training is determined by the way in which the adjustment of the parameters in the neural network takes place. A general strategy consists of applying two different training algorithms: (i) an initialization training algorithm and (ii) a main training algorithm.
The "Perform training" task is one of the most important. It trains a neural network, and updates the new parameters in Neural Editor.
On the other hand, it plots different training history charts, which are shown in Neural Viewer. The next figure illustrates the performance history plot.
The "Perform training" task also writes a table with some final neural network, loss measure and training strategy values, see the following figure.
The model selection is applied to find a neural network with a topology that minimizes the error for new data. General model selections are composed of two different classes of algorithms: an order selection algorithm and an inputs selection algorithm. Order selection algorithms are used to get the optimal number of hidden perceptrons in the neural network. Inputs selection algorithms are responsible for finding the optimal subset of inputs.
On output, the "Report model selection" task writes to the viewer the information concerned with the order and inputs selection algorithms composing the model selection. It also writes their parameters, stopping criteria, etc. The next figure illustrates the results from this task in Neural Viewer.
This task trains the neural network and calculates the selection error of the neural network removing each input one by one. These errors show which variables have more influence on the output. Removing an input can improve the performance, or it can produce a worse model.
On output, it shows a chart with the percentage of contribution of each input.
Some data sets have redundant inputs, which affects the loss of the neural network. Inputs selection is used to find the optimal subset of inputs yielding the best loss of the model.
This task modifies the inputs of the neural network to obtain the optimum selection loss.
On output, it plots different loss history charts, and statistics if the genetic algorithm is the selected algorithm, which are shown in Neural Viewer. The next figure illustrates the loss history plot.
The "Perform inputs selection" task also writes a table with some final loss measure and order algorithm values, see the following figure. Finally, it shows the final architecture of the neural network.
The best selection is achieved by using a model whose complexity is the most appropriate to produce an adequate fit of the data. The order selection is responsible for finding the optimal number of hidden perceptrons.
This task modifies the order of the neural network to obtain the optimum selection loss.
On the other hand, it plots different loss history charts, which are shown in Neural Viewer. The next figure illustrates the loss history plot.
The "Perform order selection" task also writes a table with some final loss measure and order algorithm values, see the following figure. Finally, it shows the final architecture of the neural network.
For a binary pattern recognition problem, the threshold of the probabilistic layer can determine the accuracy of the final model. This task modifies the threshold in order to optimize some value that determines the precision of the model. The threshold selection does not train the neural network, so a previous training must be performed.
On output, it plots a graph with the values of the function to optimize for each threshold evaluated. It also shows a table with the optimal values of the threshold and error.
This task is only defined for approximation applications.
A standard method to test the performance of a model is to perform a linear regression analysis between the scaled neural network outputs and the corresponding targets for an independent testing subset. This analysis leads to 3 parameters for each output variable. The first two parameters, a and b, correspond to the y-intercept and the slope of the best linear regression relating scaled outputs and targets. The third parameter, R2, is the correlation coefficient between the scaled outputs and the targets. If we had a perfect fit (outputs exactly equal to targets), the slope would be 1, and the y-intercept would be 0. If the correlation coefficient is equal to 1, then there is perfect correlation between the outputs from the neural network and the targets in the testing subset.
This task performs a linear regression analysis between the testing instances in the data set and the corresponding neural network outputs. It writes to the viewer all the provided parameters. It also draws a plot of the linear regression analysis for each output variable, as follows.
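The three parameters can be sketched as follows; the target and output vectors are hypothetical stand-ins for one output variable on the testing subset:

```python
import numpy as np

# Hypothetical testing targets and corresponding scaled network outputs.
targets = np.array([0.1, 0.4, 0.35, 0.8, 0.95])
outputs = np.array([0.15, 0.38, 0.40, 0.75, 0.90])

# Slope b and y-intercept a of the regression of outputs on targets.
b, a = np.polyfit(targets, outputs, 1)
# Squared correlation coefficient between outputs and targets.
r2 = np.corrcoef(targets, outputs)[0, 1] ** 2

print(f"intercept a = {a:.3f}, slope b = {b:.3f}, R2 = {r2:.3f}")
```

For a perfect fit, a would be 0, b would be 1 and R2 would be 1, matching the criteria in the text.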
This task is only defined for approximation applications.
The error data statistics measure the minimums, maximums, means and standard deviations of the errors between the neural network and the testing instances in the data set. They provide a valuable tool for testing the quality of a model.
On output, it writes a table with the basic statistics on the absolute, relative and percentage error data, as in the following figure.
This task is only defined for function regression applications.
The error data histograms show how the errors from the neural network on the testing instances are distributed. In general, a normal distribution for each output variable is expected here.
The next figure illustrates the error data histogram for one output variable. Here the number of bins is 10.
It is very useful to see which testing instances provide the maximum errors, in order to alert of deficiencies in the model. The following table illustrates the output from this task.
This task is only defined for classification applications.
In the confusion matrix the rows represent the target classes and the columns the output classes for a testing target data set. The diagonal cells in each table show the number of cases that were correctly classified, and the off-diagonal cells show the misclassified cases.
The next figure is an example of a confusion matrix for a binary classification test.
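Building such a matrix amounts to counting target/output pairs. A minimal sketch with hypothetical binary test results (1 = positive, 0 = negative):

```python
import numpy as np

# Hypothetical test targets and network predictions.
targets = np.array([1, 1, 0, 0, 1, 0, 1, 0])
outputs = np.array([1, 0, 0, 0, 1, 1, 1, 0])

# Rows are target classes, columns are output classes.
confusion = np.zeros((2, 2), dtype=int)
for t, o in zip(targets, outputs):
    confusion[t, o] += 1

print(confusion)
# The diagonal holds correctly classified cases; off-diagonal cells are errors.
```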
This task is only defined for classification applications.
The classification accuracy, error rate, sensitivity, specificity, precision, positive likelihood, negative likelihood, F1 score, false positive rate, false discovery rate, false negative rate, negative predictive value, Matthews correlation, informedness and markedness are parameters for testing the performance of a pattern recognition problem with two classes.
The classification accuracy is the ratio of instances correctly classified.
The error rate is the ratio of instances misclassified.
The sensitivity, or true positive rate, is the proportion of actual positive which are predicted positive.
The specificity, or true negative rate, is the proportion of actual negatives which are predicted negative.
The precision is the portion of predicted positives that are actual positives.
The positive likelihood is the likelihood that a predicted positive is an actual positive.
The negative likelihood is the likelihood that a predicted negative is an actual negative.
The F1 score is the harmonic mean of precision and sensitivity.
The false positive rate is the portion of actual negative that are predicted positive.
The false discovery rate is the portion of predicted positives which are actual negatives.
The false negative rate is the portion of actual positive which are predicted negative.
The negative predictive value is the portion of predicted negative which are actual negative.
The Matthews correlation is a correlation between the targets and the outputs.
The Youden's index is the probability that the prediction method will make a correct decision as opposed to guessing.
The markedness is the probability that a condition is marked by the predictor.
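Several of the parameters defined above can be sketched directly from the four confusion matrix counts; the counts below are hypothetical:

```python
# Hypothetical counts from a binary confusion matrix.
tp, fn, fp, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + tn + fp + fn)
error_rate = (fp + fn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
precision = tp / (tp + fp)
f1_score = 2 * precision * sensitivity / (precision + sensitivity)
youden_index = sensitivity + specificity - 1   # informedness

print(f"accuracy={accuracy:.3f} error_rate={error_rate:.3f}")
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"precision={precision:.3f} F1={f1_score:.3f} Youden={youden_index:.3f}")
```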
On output, the binary classification tests are written as a table in Neural Viewer, as is shown in the next figure.
This task is only defined for classification applications.
This method provides a graphical illustration of how well the classifier discriminates between the two different classes. This capacity of discrimination is measured by calculating the area under the curve (AUC). The closer the AUC is to 1, the better the classifier.
On output, this task plots a ROC curve, as the next figure depicts:
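The curve and its AUC can be sketched by sweeping a threshold over the predicted probabilities; the targets and scores below are hypothetical:

```python
import numpy as np

# Hypothetical test targets and predicted positive probabilities.
targets = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7])

def roc_points(targets, scores):
    """(false positive rate, true positive rate) for every threshold."""
    thresholds = np.sort(np.unique(scores))[::-1]
    points = [(0.0, 0.0)]
    positives = targets.sum()
    negatives = len(targets) - positives
    for t in thresholds:
        predicted = scores >= t
        tpr = (predicted & (targets == 1)).sum() / positives
        fpr = (predicted & (targets == 0)).sum() / negatives
        points.append((fpr, tpr))
    return points

points = roc_points(targets, scores)
# Area under the curve by the trapezoidal rule.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random guessing, while values near 1 indicate good discrimination, as described above.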
This task is only defined for classification applications.
This task shows the advantage of using a predictive response model as opposed to randomness. It consists of a cumulative gain curve and a baseline. The baseline represents the results that would be obtained without using a model. The greater the separation between both curves, the better the model.
On output this task plots a cumulative gain curve, as it is shown in the next figure:
This task is only defined for classification applications.
This task provides a visual aid to evaluate a predictive model's performance. It consists of a lift curve and a baseline. The lift curve represents the ratio between the positive events obtained using the model and without using it. The baseline represents randomness.
On output this task plots a lift chart. The next image depicts an example of it:
This task is only defined for classification applications.
Conversion rates measure the percentage of cases that perform a desired action. This value can be optimized by acting directly on the client or by a better choice of the potential consumer.
The first pair of columns represents the rates of the data set. The second pair represents the ratios for the predicted positives of the model. The last columns show the rates for the predicted negatives of the model.
This task is only defined for classification applications.
A classifier is said to be well calibrated when the proportion of positive events that it predicts is equal to the proportion of positive events that actually occur. This task shows how well calibrated a classifier is by plotting a calibration plot.
On output, this task shows a calibration plot:
This task is only defined for classification applications.
In pattern recognition problems, it is useful to know which instances are misclassified in order to find any deficiencies in the prediction model. In the case of binary pattern recognition problems, this task outputs a table showing which instances, being positive, are predicted as negative, and another table showing which instances, being negative, are predicted as positive. In the case of multiple pattern recognition problems, this task outputs several tables showing which instances, belonging to some class, are predicted as belonging to some other class.
The next figure shows an example of that output:
This task is only defined for forecasting applications.
The error autocorrelation function describes how prediction errors are correlated in time. A perfect prediction would mean that the correlation function takes only one nonzero value, at lag zero. The error autocorrelation is calculated for different lag values and shown in a chart where the x-axis represents the lags and the y-axis represents the corresponding autocorrelation value.
The next figure shows an example of that output:
This task is only defined for forecasting applications.
This task calculates the correlation between the inputs and the error, which is the difference between the targets and the outputs of the neural network. For a perfect prediction, the input error cross-correlation should not be significantly different from zero for any lag. In the chart shown by this task, the x-axis represents the lags and the y-axis represents the corresponding cross-correlation value.
The next figure shows an example of that output:
A neural network produces a set of outputs for each set of inputs applied. The outputs depend, in turn, on the values of the parameters.
The next figure shows the inputs dialog opened by the "Calculate outputs" task.
The input and output values are written to the viewer. The following figure shows a table with the output value corresponding to the input values from above.
It is very useful to see how the outputs vary as a function of a single input, when all the others are fixed. This can be seen as the cut of the neural network model along some input direction and through some reference point.
The directional inputs dialog asks for the parameters necessary to compute the directional output data from the neural network in some direction:
The next figure shows the directional inputs dialog.
The next plot shows how a single output varies as a function of a single input, with all the other inputs being fixed.
The Jacobian matrix computes the partial derivatives of the outputs from the last layer with respect to the inputs to the first layer. That is, it computes the inputs-outputs partial derivatives of the neural network.
In order to compute the Jacobian, we need the vector of inputs where the derivatives are to be computed. In the next figure the input dialog has 2 inputs.
On output, a table with the Jacobian elements is written. The next figure shows a Jacobian matrix with two inputs and one output.
Any neural network represents a function of the outputs with respect to the inputs. That function also depends on the parameters. The mathematical expression represented by the neural network can be used to embed it into another software, in the so called production mode.
This task writes the mathematical expression represented by the neural network to the viewer. The next listing illustrates that with an example of a neural network with one input and one output.
scaled_x = 2*(x-0)/(1-0)-1;
y_1_1 = tanh(0.848028-0.636581*scaled_x);
y_1_2 = tanh(-1.31434-0.70441*scaled_x);
scaled_y = (-1.21326+0.419568*y_1_1+1.4419*y_1_2);
(y) = (0.5*(scaled_y+1.0)*(0.908-0.129)+0.129);
Usually, the input values for which predictions are required are stored as rows in a data file.
This task takes a data file as input and writes a CSV file with the outputs of the model in columns.
The mathematical expression represented by the model can be exported to different programming languages. This task exports this expression to a file in the Python or R programming languages.
The next code is an example of a file in R programming language.
expression <- function(x) {
    scaled_x <- 2*(x-0)/(1-0)-1
    y_1_1 <- tanh(0.848028-0.636581*scaled_x)
    y_1_2 <- tanh(-1.31434-0.70441*scaled_x)
    scaled_y <- (-1.21326+0.419568*y_1_1+1.4419*y_1_2)
    outputs <- c(0.5*(scaled_y+1.0)*(0.908-0.129)+0.129)
    outputs
}
PMML is an XML-based language for predictive models. It is a standard to describe and exchange predictive models produced by data mining and machine learning algorithms.
This task creates a PMML file to import the model to other systems.