The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

There are two types of model selection algorithms: order selection algorithms and input selection algorithms.

Two frequent problems in the design of a neural network are underfitting and overfitting. The best generalization is achieved with a model whose complexity is appropriate to produce an adequate fit of the data.

To illustrate underfitting and overfitting, consider the following data set, which consists of points sampled from a sine function to which white noise has been added.
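
A data set of this kind can be generated along the following lines. This is a minimal sketch using NumPy; the sample size and noise level are illustrative assumptions, not values from the original example.

```python
import numpy as np

# Sketch: sample a sine function and add white (Gaussian) noise.
# The sample size and noise standard deviation are illustrative.
rng = np.random.default_rng(0)

n_samples = 100
x = rng.uniform(0.0, 1.0, size=n_samples)      # input values
noise = rng.normal(0.0, 0.1, size=n_samples)   # white noise
y = np.sin(2.0 * np.pi * x) + noise            # noisy targets
```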

In this case, an appropriate complexity is given by a neural network with 1 input (x), 3 hidden neurons, and 1 output (y).

Underfitting is defined as the increase in selection error caused by a model that is too simple. Here we have used 1 hidden neuron.

Conversely, overfitting is defined as the increase in selection error caused by a model that is too complex. In this case, we have used 10 hidden neurons.

The error of a neural network on the training instances of the data set is called the training error. Similarly, the error on the selection instances is called the selection error.

The training error measures the ability of the neural network to fit the data it has seen, whereas the selection error measures its ability to generalize to new data.
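
These two errors can be computed, for instance, with scikit-learn by splitting the data into training and selection (validation) instances. The network size, split ratio, and data below are illustrative assumptions, not the original experiment.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Noisy sine data, as in the example described in the text.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * x).ravel() + rng.normal(0.0, 0.1, size=200)

# Training instances fit the model; selection instances estimate generalization.
x_train, x_sel, y_train, y_sel = train_test_split(
    x, y, test_size=0.25, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
model.fit(x_train, y_train)

training_error = mean_squared_error(y_train, model.predict(x_train))
selection_error = mean_squared_error(y_sel, model.predict(x_sel))
```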

The next figure shows the training (blue) and selection (orange) errors as a function of the neural network complexity, represented by the number of hidden neurons.

As we can see, the greater the number of hidden neurons, the smaller the training error. However, for very small and very large orders the selection error is large: in the first case because of underfitting, and in the second because of overfitting. In our case, the neural network with the best generalization properties has 4 hidden neurons, since the selection error takes its minimum value at that point.

Order selection algorithms are in charge of finding the complexity of the neural network that yields the best generalization properties. Two of the most widely used order selection algorithms are incremental order and decremental order.

Incremental order is the simplest order selection algorithm. This method starts with a small number of neurons and increases the complexity until some stopping criterion is met.

The algorithm returns the neural network with the optimal order obtained.
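
A minimal sketch of incremental order, assuming scikit-learn's MLPRegressor as the model and a simple no-improvement rule as the stopping criterion (the function name, data, and patience value are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def incremental_order(x_train, y_train, x_sel, y_sel, max_order=10):
    """Grow the number of hidden neurons, tracking the best selection error."""
    best_order, best_error, no_improvement = 1, float("inf"), 0
    for order in range(1, max_order + 1):
        model = MLPRegressor(hidden_layer_sizes=(order,),
                             max_iter=1000, random_state=0)
        model.fit(x_train, y_train)
        error = mean_squared_error(y_sel, model.predict(x_sel))
        if error < best_error:
            best_order, best_error, no_improvement = order, error, 0
        else:
            no_improvement += 1
            if no_improvement >= 2:  # stopping criterion: no recent improvement
                break
    return best_order, best_error

# Noisy sine data, split into training and selection instances.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * x).ravel() + rng.normal(0.0, 0.1, size=200)
order, error = incremental_order(x[:150], y[:150], x[150:], y[150:], max_order=6)
```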

A similar order selection algorithm is decremental order. It starts with a large number of neurons and decreases the complexity until a stopping criterion is reached.
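
Decremental order can be sketched in the same hedged way, again assuming scikit-learn's MLPRegressor: start from a large order and shrink it, keeping the order with the smallest selection error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def decremental_order(x_train, y_train, x_sel, y_sel, max_order=6):
    """Shrink the number of hidden neurons, tracking the best selection error."""
    best_order, best_error = max_order, float("inf")
    for order in range(max_order, 0, -1):   # from complex to simple
        model = MLPRegressor(hidden_layer_sizes=(order,),
                             max_iter=1000, random_state=0)
        model.fit(x_train, y_train)
        error = mean_squared_error(y_sel, model.predict(x_sel))
        if error < best_error:
            best_order, best_error = order, error
    return best_order, best_error

# Noisy sine data, split into training and selection instances.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * x).ravel() + rng.normal(0.0, 0.1, size=200)
order, error = decremental_order(x[:150], y[:150], x[150:], y[150:])
```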

Which features should you use to create a predictive model? This is a difficult question that may require deep knowledge of the problem domain.

Input selection algorithms automatically extract those features in the data set that provide the best generalization capabilities. That is, they search for the subset of inputs that minimizes the selection error.

The input selection algorithm stops when a specified stopping criterion is satisfied, for example when the selection error starts to increase or when a maximum number of iterations is reached.

The growing inputs method calculates the correlation of every input with every output in the data set.

Then it starts with a neural network that only contains the most correlated input and calculates the selection error for that model.

It keeps adding the most correlated variables until the selection error increases.

The algorithm returns the neural network with the optimal subset of inputs found.
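
The growing inputs procedure just described can be sketched as follows, assuming scikit-learn's MLPRegressor and a synthetic data set in which only the first two inputs are informative (all names, sizes, and parameters are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def growing_inputs(X_train, y_train, X_sel, y_sel):
    """Add inputs in order of |correlation| until the selection error rises."""
    n_inputs = X_train.shape[1]
    correlation = [abs(np.corrcoef(X_train[:, j], y_train)[0, 1])
                   for j in range(n_inputs)]
    ranked = np.argsort(correlation)[::-1]   # most correlated input first
    selected, best_error = [], float("inf")
    for j in ranked:
        candidate = selected + [j]
        model = MLPRegressor(hidden_layer_sizes=(3,),
                             max_iter=500, random_state=0)
        model.fit(X_train[:, candidate], y_train)
        error = mean_squared_error(y_sel, model.predict(X_sel[:, candidate]))
        if error >= best_error:
            break                            # selection error increased: stop
        selected, best_error = candidate, error
    return selected, best_error

# Synthetic data: inputs 0 and 1 drive the output; inputs 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, size=200)
selected, error = growing_inputs(X[:150], y[:150], X[150:], y[150:])
```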

The pruning inputs method starts with all the inputs in the data set.

It keeps removing the inputs with the smallest correlation with the outputs until the selection error increases.
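
A sketch of pruning inputs under the same assumptions (scikit-learn's MLPRegressor, illustrative synthetic data): start from all inputs and drop the least correlated one at each step while the selection error keeps decreasing.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def pruning_inputs(X_train, y_train, X_sel, y_sel):
    """Remove the least correlated inputs until the selection error rises."""
    n_inputs = X_train.shape[1]
    correlation = [abs(np.corrcoef(X_train[:, j], y_train)[0, 1])
                   for j in range(n_inputs)]
    remaining = list(np.argsort(correlation)[::-1])   # least correlated last

    def selection_error(columns):
        model = MLPRegressor(hidden_layer_sizes=(3,),
                             max_iter=500, random_state=0)
        model.fit(X_train[:, columns], y_train)
        return mean_squared_error(y_sel, model.predict(X_sel[:, columns]))

    best_error = selection_error(remaining)   # start with all the inputs
    while len(remaining) > 1:
        candidate = remaining[:-1]            # drop the least correlated input
        error = selection_error(candidate)
        if error >= best_error:
            break                             # selection error increased: stop
        remaining, best_error = candidate, error
    return remaining, best_error

# Synthetic data: inputs 0 and 1 drive the output; inputs 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, size=200)
remaining, error = pruning_inputs(X[:150], y[:150], X[150:], y[150:])
```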

A different class of input selection methods is the genetic algorithm.

This is a stochastic method based on the mechanics of natural genetics and biological evolution. Genetic algorithms usually include fitness assignment, selection, crossover and mutation operators.
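
As a rough illustration only, a genetic algorithm for input selection could be sketched as below, again assuming scikit-learn's MLPRegressor; the population size, number of generations, and operator rates are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

# Synthetic data: inputs 0 and 1 drive the output; inputs 2 and 3 are noise.
rng = np.random.default_rng(0)
n_inputs = 4
X = rng.normal(size=(200, n_inputs))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, size=200)
X_train, y_train, X_sel, y_sel = X[:150], y[:150], X[150:], y[150:]

def fitness(mask):
    """Fitness assignment: the negated selection error of the input subset."""
    columns = np.flatnonzero(mask)
    model = MLPRegressor(hidden_layer_sizes=(3,), max_iter=300, random_state=0)
    model.fit(X_train[:, columns], y_train)
    return -mean_squared_error(y_sel, model.predict(X_sel[:, columns]))

def repair(mask):
    """Guarantee that every individual keeps at least one input."""
    if not mask.any():
        mask[rng.integers(n_inputs)] = True
    return mask

# Each individual is a boolean mask over the inputs.
population = [repair(rng.random(n_inputs) < 0.5) for _ in range(6)]
for generation in range(4):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:3]                           # selection: keep the fittest
    children = []
    while len(children) < 3:
        a, b = rng.choice(3, size=2, replace=False)
        cross = rng.random(n_inputs) < 0.5         # uniform crossover
        child = np.where(cross, parents[a], parents[b])
        child ^= rng.random(n_inputs) < 0.1        # mutation: flip some bits
        children.append(repair(child))
    population = parents + children

best = max(population, key=fitness)
selected_inputs = sorted(int(j) for j in np.flatnonzero(best))
```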

You can find more information about this topic in the post Genetic algorithms for feature selection on our blog.