I'm implementing a Genetic Algorithm in C# WinForms to find an optimal solution to a very simple mathematical equation. The word "simple" here means that the equation must have these qualities:
- Consists only of whole positive numbers
- The result of the equation must be a positive number less than 1001
- Between 2 and 7 variables are allowed
- The coefficient of each variable must be between 1 and 10 (lower and upper bound inclusive)
- Variable values range from 0 to 500
- Only addition is allowed
With the equation being so "easy", one would say, I thought I would always reach an optimal solution. However, I only ever get a solution that is very close to optimal. To evaluate whether a solution is optimal, I use the following formula:
f(x) = absolute((sum_of_all_variables) - equation_result)
For example, I have the following equation:
1a + 1b = 12
If a is 2 and b is 3, then the f(x) of that individual is abs(2 + 3 - 12) = 7. My code stops when f(x) reaches 0 or after 50 generations.
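In code terms, the fitness evaluation is essentially the following (a simplified sketch, not my exact classes; it only assumes the list of variable values of an individual and the target value of the equation):
private static int Fitness(List<int> variableValues, int equationResult)
{
    // f(x) = abs((sum of all variable values) - equation_result); 0 means the equation is satisfied exactly.
    return Math.Abs(variableValues.Sum() - equationResult); // Sum() needs System.Linq
}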
My current mutation rate is 50% and my current crossover rate is 25%. The selection method is roulette-wheel selection. The mutation method is random mutation, i.e. I simply pick a random gene in my pool of genes and change its value; up to 50% of the genes are changed per generation.
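The mutation step is roughly this (again a simplified sketch rather than my real code; it assumes a Gene type with a writable Value and uses the 0-500 range from the constraints above):
private static void Mutate(List<Gene> genes, Random seed, decimal mutationRate)
{
    foreach (Gene gene in genes)
    {
        // Each gene has a mutationRate chance of being replaced by a new random value in 0-500.
        if ((decimal)seed.NextDouble() < mutationRate)
        {
            gene.Value = seed.Next(0, 501); // upper bound of Next is exclusive
        }
    }
}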
My expectation: the code yields a solution (individual) with an f(x) value of 0, i.e. one of the optimal solutions available.
Current result: the code yields a solution that is nearly optimal. One common pattern I've found is that this nearly optimal solution is often replicated exactly in the next generation.
My guess at the problem:
I think it has something to do with how I do my crossovers. My crossover procedure is:
- Assign a random value to each individual
- If that random value is less than the crossover rate, save that individual to be used for crossover later
- After gathering all to-be-crossed-over individuals, I randomly select a crossover point for each individual
- I think this is the problem: I cross the first individual in the to-be-crossed-over list with the second, merging their genes at the crossover point. Then I proceed with the second and third individuals, and so on. The last individual is crossed with the first.
However, as I said, this doesn't yield an optimal result. Is it a problem with my logic, or is this the expected behaviour of a Genetic Algorithm on such a simple mathematical equation?
The crossover methods I'm using, for reference (I'm using C# WinForms to display the data in a ListView):
private static void Crossover(List<Chromosome> chromosomes, Random seed)
{
    // Pick the individuals that will take part in crossover.
    List<int> crossoverChromosome = new List<int>();
    for (int i = 0; i < 50; i++)
    {
        decimal randomedValue = RandomizeValue(seed);
        if (randomedValue < Population.CrossoverRate) crossoverChromosome.Add(i);
    }

    // Cross each selected individual with the next one in the list
    // (the last one wraps around to the first).
    for (int i = 0; i < crossoverChromosome.Count; i++)
    {
        int crossoverPoint = seed.Next(0, chromosomes[0].GeneValues.Count);
        chromosomes[crossoverChromosome[i]] = chromosomes[crossoverChromosome[i]].MixChromosome(chromosomes[crossoverChromosome[(i + 1) % crossoverChromosome.Count]], crossoverPoint);
    }
}

private static decimal RandomizeValue(Random seed)
{
    return Math.Round((decimal)seed.NextDouble(), 5);
}

public Chromosome MixChromosome(Chromosome mixture, int crossoverPoint)
{
    // Take genes [0, crossoverPoint) from this chromosome and the rest from the other parent.
    List<Gene> newGenes = new List<Gene>();
    newGenes.AddRange(this.GetGenes(0, crossoverPoint));
    newGenes.AddRange(mixture.GetGenes(crossoverPoint, this.GeneValues.Count));
    return new Chromosome(DesiredValue, OperatorData, newGenes); // Ignore the DesiredValue and OperatorData, they have nothing to do with crossover
}

private List<Gene> GetGenes(int firstIndex, int lastIndex)
{
    List<Gene> slicedGenes = new List<Gene>();
    for (int i = firstIndex; i < lastIndex; i++)
    {
        slicedGenes.Add(Genes[i].CloneGene());
    }
    return slicedGenes;
}
I've only heavily tested equations with two variables.
EDIT
Additional info:
- I do not use elitism
- Initial population size = 50 individuals
- Population size per generation = 50 individuals
10 reruns on the equation 1a + 1b = 10 yield the following best fitness values after 50 generations:
- First rerun: 24
- Second rerun: 11
- Third rerun: 42
- Fourth rerun: 13
- Fifth rerun: 5
- Sixth rerun: 19
- Seventh rerun: 7
- Eighth rerun: 1
- Ninth rerun: 6
- Tenth rerun: 29
- Additional info on each rerun: the chromosome which yields the best fitness value in 50 generations is often repeated in the population. For example, on the eighth rerun, I find many occurrences of the chromosome with values a = 4 and b = 7.
- There are many parameters that play a role in building a genetic algorithm. There is usually significant trial and error involved, and even then, due to randomness, there is no guarantee of finding optimal answers. Your question can be improved if you clearly state all the parameters you are using. For example: how big is the initial population? Do you use any kind of elitism? How many times have you run this, and what is the typical/best/worst fitness of the runs? – tucuxi, Jun 17, 2019 at 11:11
- @tucuxi I've added more information. Please take a look. – Richard, Jun 17, 2019 at 11:27
- You only posted a small part of your code, and it's highly likely that the problem is in how you defined the classes for this specific problem or in how you're calling the methods you posted. You should preferably post a minimal reproducible example, ideally including a detailed high-level description of what your code is doing. Generally these sorts of issues are best solved by debugging to see what your code is doing. But a 50% mutation rate is huge - this should probably be less than 10%, maybe closer to 1%. – Bernhard Barker, Jun 17, 2019 at 12:03
- @Dukeling I tweaked my mutation rate to 50% because all the chromosomes in the final generation (the 50th) of each rerun were heavily repeated (more than 50%). Even with such a high mutation rate, I still get heavy repetition of similar chromosomes that are nearest to f(x) = 0. I think my solutions are converging to a local maximum, not the global maximum. – Richard, Jun 17, 2019 at 14:35
1 Answer
One major problem is (as you assumed) the crossover. The way you describe it (or at least as far as I understand it), you do crossover on every randomly chosen individual to generate the next population.
The basic idea behind a genetic algorithm is Darwin's survival of the fittest, not some random survival (or reproduction).
So I think the problem is that every individual has the same chance of reproducing, which doesn't make much sense in a genetic algorithm. A better solution would be to let the individuals that lead to a result near the correct one reproduce more often than individuals that don't lead to a good result. Otherwise it is quite similar to a random search.
Normally it's still useful to have some randomness in choosing the individuals for reproduction, because this leads to results that are not always the best ones but explore a bigger range of the search space.
One common way to do this is fitness-proportional selection, which chooses the individuals for reproduction randomly but weighted by their fitness values. So individuals with a high fitness (the ones that lead to a result near the correct one in your example) have a higher chance of reproducing.
Another common way is a stochastically distributed selection, which also chooses individuals for reproduction randomly, with a higher chance for better individuals, but which also guarantees that individuals with above-average fitness reproduce at least once.
An example implementation of fitness-proportional selection could look like this (unfortunately it's Java code rather than C#, but the two are quite similar):
import java.util.concurrent.ThreadLocalRandom;

import com.google.common.annotations.VisibleForTesting;

/**
 * A selector that randomly chooses pairs to be selected for reproduction based on their probability to be selected.
 */
public class FitnessProportionalSelector implements Selector {

    /**
     * Select pairs of parents (by index) that are combined to build the next generation.
     *
     * @param selectionProbability
     *        The probability to be selected for every DNA in the current population (sums up to 1).
     *
     * @param numPairs
     *        The number of pairs needed (or the number of individuals needed in the next generation).
     *
     * @return Returns an int-array of size [numPairs * 2] including the pairs that are to be combined to create the next population (a pair is on
     *         position [i, i+1] for i % 2 = 0).
     */
    @Override
    public int[] select(double[] selectionProbability, int numPairs) {
        double[] summedProbabilities = Selector.toSummedProbabilities(selectionProbability);
        int[] selectionPairs = new int[numPairs * 2];
        double chosenProbability;
        for (int i = 0; i < numPairs * 2; i++) {
            chosenProbability = getRandomNumber();
            selectionPairs[i] = Selector.getSelectedIndexByBisectionSearch(summedProbabilities, chosenProbability);
        }
        return selectionPairs;
    }

    @Override
    public String toString() {
        return "FitnessProportionalSelector []";
    }

    @VisibleForTesting
    /*private*/ double getRandomNumber() {
        return ThreadLocalRandom.current().nextDouble();
    }
}
Or a solution for the stochastically distributed selection (also in Java):
import java.util.concurrent.ThreadLocalRandom;

import com.google.common.annotations.VisibleForTesting;

/**
 * A selector that chooses the pairs to be reproduced by a stochastically distributed selection method.
 *
 * The selection probability is proportional to the given probability, but it is ensured that individuals with a probability above average are chosen
 * at least once.
 */
public class StochasticallyDistributedSelector implements Selector {

    /**
     * Select pairs of parents (by index) that are combined to build the next generation.
     *
     * @param selectionProbability
     *        The probability to be selected for every DNA in the current population (sums up to 1).
     *
     * @param numPairs
     *        The number of pairs needed (or the number of individuals needed in the next generation).
     *
     * @return Returns an int-array of size [numPairs * 2] including the pairs that are to be combined to create the next population (a pair is on
     *         position [i, i+1] for i % 2 = 0).
     */
    @Override
    public int[] select(double[] selectionProbability, int numPairs) {
        double[] summedProbability = Selector.toSummedProbabilities(selectionProbability);
        int[] selectedPairs = new int[2 * numPairs];
        double startPoint = getRandomNumber();
        double addedAverage = 1d / (2d * numPairs);
        double stochasticallySelectedProbability;
        for (int i = 0; i < numPairs * 2; i++) {
            // select the pairs stochastically
            stochasticallySelectedProbability = startPoint + i * addedAverage;
            stochasticallySelectedProbability %= 1;
            selectedPairs[i] = Selector.getSelectedIndexByBisectionSearch(summedProbability, stochasticallySelectedProbability);
        }
        // shuffle the pairs to distribute them stochastically
        shuffle(selectedPairs);
        return selectedPairs;
    }

    @VisibleForTesting
    /*private*/ void shuffle(int[] selectedPairs) {
        // shuffle the selected pairs in place
        int swapIndex;
        int tmp;
        for (int i = selectedPairs.length - 1; i > 0; i--) {
            swapIndex = (int) (getRandomNumber() * (i + 1));
            tmp = selectedPairs[i];
            selectedPairs[i] = selectedPairs[swapIndex];
            selectedPairs[swapIndex] = tmp;
        }
    }

    @VisibleForTesting
    /*private*/ double getRandomNumber() {
        return ThreadLocalRandom.current().nextDouble();
    }

    @Override
    public String toString() {
        return "StochasticallyDistributedSelector []";
    }
}
I took these example snippets from a genetic optimizer project I created for my master's thesis. If you want to have a look at it, you can find it on my GitHub account.
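If you would rather stay in C#, a rough self-contained sketch of the fitness-proportional idea (no Selector interface, just a plain method) could look like the code below. Note that your f(x) is "lower is better", so you first have to turn it into a "bigger is better" fitness, for example 1.0 / (1 + f(x)), before feeding it into the wheel:
private static int[] SelectParents(double[] fitness, int count, Random random)
{
    // Roulette-wheel selection: the chance of an index being picked is
    // proportional to its share of the total fitness.
    double total = fitness.Sum(); // Sum() needs System.Linq
    int[] selected = new int[count];
    for (int i = 0; i < count; i++)
    {
        double threshold = random.NextDouble() * total;
        double accumulated = 0;
        int index = 0;
        while (index < fitness.Length - 1 && accumulated + fitness[index] < threshold)
        {
            accumulated += fitness[index];
            index++;
        }
        selected[i] = index;
    }
    return selected;
}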
Some further improvements:
- Try using the mean squared error instead of the absolute value (f(x) = abs((sum_of_all_variables) - equation_result)).
- Using selection pressure is another good way to choose the right individuals for reproduction (and to get the parameters to converge); you can find a solution in the GitHub project in the package genetic_optimizer.selection if you need examples.
- Elitism is quite useful here so you don't lose the best solution you already have (if you don't keep it in the population, at least keep a copy to return after the calculation); a minimal sketch follows below this list.
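A minimal sketch of that kind of elitism in C# (Fitness and Clone() on Chromosome are placeholders here, adapt them to whatever your classes actually expose):
private static void ApplyElitism(List<Chromosome> currentGeneration, List<Chromosome> nextGeneration)
{
    // Remember the best chromosome (lowest f(x)) of the current generation and copy it
    // unchanged into the next one, so crossover/mutation can never lose it.
    Chromosome best = currentGeneration.OrderBy(c => c.Fitness).First(); // OrderBy() needs System.Linq
    int worstIndex = nextGeneration.IndexOf(nextGeneration.OrderByDescending(c => c.Fitness).First());
    nextGeneration[worstIndex] = best.Clone(); // clone so later in-place mutation can't touch the elite copy
}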