Genetic algorithms provide a powerful approach to feature selection in machine learning.
Many real-world machine learning problems — from customer targeting to medical diagnosis — involve datasets with tens or even hundreds of variables.
Not all of these variables are useful. Some add noise or redundancy, which usually reduces the accuracy of a predictive model.
Feature selection methods, such as genetic algorithms, automatically identify the most relevant inputs and discard those that are unnecessary.
By focusing only on the variables that matter, these techniques simplify models and improve their predictive accuracy.
Feature selection
We can view feature selection as a combinatorial optimization problem.
The goal is to find the subset of inputs that gives the best predictive performance.
In practice, this means minimizing the error of the model on the validation data.
The model can either include (1) or exclude (0) each feature, so with N features there are 2^N possible combinations.
With only 30 features, the number of possible subsets already exceeds one billion, so we cannot test them all.
This is why intelligent search strategies, such as genetic algorithms, are used to explore the space and identify the most relevant features efficiently.
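To get a feel for this explosion, the following minimal Python sketch (an illustration only, not tied to any particular library) counts the subsets that an exhaustive search would have to evaluate.

```python
from itertools import combinations

def all_feature_subsets(n_features):
    """Yield every non-empty subset of feature indices (what exhaustive search would test)."""
    for size in range(1, n_features + 1):
        yield from combinations(range(n_features), size)

# The search space doubles with every extra feature: 2^N subsets in total.
for n in (10, 20, 30, 40):
    print(f"{n} features -> {2 ** n:,} possible subsets")

# Exhaustive enumeration is only practical for very small N.
print(sum(1 for _ in all_feature_subsets(10)))  # 1023 non-empty subsets
```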
Genetic algorithms
Genetic algorithms are among the most powerful methods for feature selection.
They are stochastic optimization methods inspired by natural evolution.
Genetic algorithms work with a population of individuals, each representing a candidate solution.
Each individual encodes a neural network as a string of binary genes, where each gene indicates whether a feature is included (1) or excluded (0).
In each generation, the algorithm works like evolution in nature: the fittest individuals survive, mix their genes through crossover, and sometimes mutate.
The result is a new population that gradually improves over the generations.
The following diagram illustrates the genetic algorithm process for feature selection.
Next, we will describe the main operators and parameters in more detail.
1. Initialization operator
The first step is to create and initialize the population of individuals.
Because the genetic algorithm is stochastic, it usually initializes the genes of each individual at random.
For example, consider a neural network with six possible features.
If we generate four individuals, we obtain four different neural networks, each with a random subset of features.
The figure below shows this population: each individual has six binary genes, with 1 meaning inclusion and 0 exclusion.
Alternatively, initialization can use the correlation of features with the target.
In this case, more correlated features have a higher chance of being activated, giving the algorithm a head start.
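As an illustration, here is a minimal NumPy sketch of both initialization strategies; the function name and the correlation values are hypothetical, chosen only to mirror the six-feature example above.

```python
import numpy as np

rng = np.random.default_rng(0)

def initialize_population(population_size, n_features, activation_probability=0.5):
    """Create binary genomes at random; 1 = feature included, 0 = feature excluded."""
    return (rng.random((population_size, n_features)) < activation_probability).astype(int)

# Four individuals, six candidate features, purely random initialization.
population = initialize_population(4, 6)
print(population)

# Correlation-guided variant: features more correlated with the target
# start with a higher activation probability (values here are made up).
correlations = np.array([0.8, 0.1, 0.5, 0.3, 0.9, 0.2])
activation_probabilities = correlations / correlations.max()
population_guided = (rng.random((4, 6)) < activation_probabilities).astype(int)
```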
2. Fitness assignment operator
After initialization, each individual in the population is evaluated by assigning a fitness value.
To do this, we train each neural network on the training data and measure its error on the selection (validation) set.
Individuals with lower selection error have higher fitness and are more likely to undergo recombination.
The most common method is rank-based fitness, where individuals are ranked by error, and their fitness depends only on that rank.
The following table shows, for our example, each individual’s selection error, rank, and probability of being selected.
| Individual | Selection error | Rank | Probability |
|---|---|---|---|
| Individual 1 | 0.9 | 1 | 0.1 |
| Individual 2 | 0.6 | 3 | 0.3 |
| Individual 3 | 0.7 | 2 | 0.2 |
| Individual 4 | 0.5 | 4 | 0.4 |
A simple way to visualize this is with a fitness pie chart, where the area of each slice is proportional to an individual’s fitness.
In the example below, individual #4 is the fittest (40% share), while individual #1 is the weakest (10%).
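The sketch below shows one way to compute rank-based fitness for the four individuals in the table; it reproduces the probabilities above under the assumption that fitness is simply rank divided by the sum of ranks.

```python
import numpy as np

def rank_based_fitness(selection_errors):
    """Rank individuals by error (lowest error -> highest rank) and normalize the ranks."""
    errors = np.asarray(selection_errors, dtype=float)
    order = errors.argsort()                        # indices from best to worst
    ranks = np.empty(len(errors), dtype=float)
    ranks[order] = np.arange(len(errors), 0, -1)    # best gets N, worst gets 1
    return ranks / ranks.sum()                      # selection probabilities

errors = [0.9, 0.6, 0.7, 0.5]                       # individuals 1 to 4 from the table
print(rank_based_fitness(errors))                   # [0.1, 0.3, 0.2, 0.4]
```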
3. Selection operator
Once fitness is assigned, the selection operator chooses which individuals will recombine to form the next generation.
Fitter individuals are more likely to survive, but randomness ensures diversity in the population.
Elitist selection guarantees that the very best always pass on their traits.
The number of elitist individuals is usually small, often one or two.
After elitism, the algorithm applies the roulette wheel method.
Here, each individual occupies a slice of a wheel proportional to its fitness.
Spinning the wheel then selects individuals at random, giving fitter ones a higher chance.
The figure below shows the process in our example.
Individual #4 survives by elitism, while #3 is chosen by roulette, even though #2 has slightly higher fitness.
This happens because the algorithm is stochastic, not strictly deterministic.
Typically, half of the population is selected at this stage.
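A possible implementation of this step, combining elitism with roulette-wheel sampling, is sketched below; keeping half of the population and avoiding duplicate picks are assumptions that match the description above, not fixed rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_parents(population, fitness, elite_count=1):
    """Keep the fittest individuals (elitism), then fill half the population by roulette wheel."""
    fitness = np.asarray(fitness, dtype=float)
    n_selected = len(population) // 2

    selected = list(np.argsort(fitness)[::-1][:elite_count])    # elite survive directly
    probabilities = fitness / fitness.sum()
    while len(selected) < n_selected:
        pick = rng.choice(len(population), p=probabilities)     # spin the wheel
        if pick not in selected:                                # avoid selecting twice
            selected.append(pick)
    return population[selected]

population = np.array([[1, 0, 1, 0, 1, 1],
                       [0, 1, 1, 0, 0, 1],
                       [1, 1, 0, 1, 0, 0],
                       [0, 1, 0, 1, 1, 1]])
fitness = [0.1, 0.3, 0.2, 0.4]
parents = select_parents(population, fitness)
```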
4. Crossover operator
After selection, the crossover operator recombines the chosen individuals to create the next generation.
It works by randomly pairing two parents and combining their features to produce new offspring.
This process repeats until the new population reaches the same size as the old one.
In the uniform crossover method, each feature of the offspring is taken from one parent or the other with equal probability.
The figure below shows an example with two parents generating four offspring.
Some features in each neural network come from one parent, while others come from the other.
The parents are then replaced by their offspring, so the population size remains constant.
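Uniform crossover can be written in a few lines; the sketch below generates offspring from two parents, with each gene drawn from either parent with probability 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_crossover(parent_a, parent_b, n_offspring=2):
    """Each gene of an offspring is taken from parent_a or parent_b with equal probability."""
    offspring = []
    for _ in range(n_offspring):
        from_a = rng.random(len(parent_a)) < 0.5     # True -> gene inherited from parent_a
        offspring.append(np.where(from_a, parent_a, parent_b))
    return np.array(offspring)

parent_a = np.array([1, 0, 1, 0, 1, 1])
parent_b = np.array([0, 1, 1, 0, 0, 1])
print(uniform_crossover(parent_a, parent_b, n_offspring=4))
```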
5. Mutation operator
Crossover alone can create offspring that resemble their parents too closely.
This lack of diversity slows down the search for better solutions.
The mutation operator fixes this by randomly flipping the value of some features in the offspring.
To decide whether a feature mutates, we generate a random number between 0 and 1.
If the number is smaller than the mutation rate, the feature changes from included to excluded, or vice versa.
A common choice for the mutation rate is 1/m, where m is the number of features.
On average, this means one feature of each individual mutates in every generation.
The image below shows an example where the fourth input of the neural network has been mutated.
With mutation applied, the population gains fresh diversity for the next cycle.
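The following sketch applies this rule gene by gene; the default mutation rate of 1/m matches the heuristic mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(offspring, mutation_rate=None):
    """Flip each gene with a probability equal to the mutation rate (default 1/m)."""
    offspring = offspring.copy()
    n_features = offspring.shape[1]
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features            # on average, one flip per individual
    flips = rng.random(offspring.shape) < mutation_rate
    offspring[flips] = 1 - offspring[flips]         # included <-> excluded
    return offspring

children = np.array([[1, 0, 1, 0, 1, 1],
                     [0, 1, 1, 0, 0, 1]])
print(mutate(children))
```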
Process and results
The cycle of fitness assignment, selection, recombination, and mutation is repeated until a stopping criterion is met.
With each generation, the population adapts and improves over the previous one.
The chart below shows the typical evolution of the mean selection error and the best selection error across generations.
As the generations progress, the mean error steadily decreases and converges toward a minimum.
The result is the neural network with the lowest selection error among all candidates.
In our example, we select the neural network below as the best model for this application.
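Putting the pieces together, a minimal version of the full loop could look like the sketch below. It reuses the helper functions from the previous sections, and `evaluate` is a placeholder for training a neural network on the included features and returning its selection error.

```python
def genetic_feature_selection(evaluate, n_features, population_size=20,
                              generations=50, elite_count=1):
    """Minimal GA loop; `evaluate(genome)` must return the selection error of a feature subset."""
    population = initialize_population(population_size, n_features)
    best_genome, best_error = None, float("inf")

    for _ in range(generations):
        errors = [evaluate(genome) for genome in population]
        if min(errors) < best_error:                            # keep the best model seen so far
            best_error = min(errors)
            best_genome = population[int(np.argmin(errors))].copy()

        fitness = rank_based_fitness(errors)
        parents = select_parents(population, fitness, elite_count)

        children = []
        while len(children) < population_size - len(parents):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            children.extend(uniform_crossover(a, b, n_offspring=2))
        children = mutate(np.array(children[:population_size - len(parents)]))

        population = np.vstack([parents, children])             # next generation

    return best_genome, best_error
```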
Conclusions
Feature selection is essential in machine learning, as many datasets include noisy or redundant variables.
Genetic algorithms are one of the most effective methods to address this challenge.
They often outperform traditional selection techniques, scale well to datasets with many features, require no prior knowledge of which inputs matter, and are easy to parallelize.
However, they are computationally demanding since each candidate solution requires training a model, and convergence may take time.
In short, genetic algorithms provide a powerful way to identify the most relevant features, though at a higher computational cost.
Neural Designer offers an optimized genetic algorithm for feature selection, available in the Input Selection tool of the Model Selection panel.