For the past few months I have been trying to understand genetic algorithms (GA), and most of the material available on the web was not always easy for me. Then I came across an article by Ahmed Gad, Genetic Algorithm Implementation in Python, which implemented a GA with numpy. Using it as a guide I wrote my first GA in Python with numpy. All of the code was written by me except cal_pop_fitness.
The problem the GA needs to solve is to find the parameters (a, b) in an equation of the form y = a*x1 + b*x2, where x1, x2 and y are given as numpy arrays. The equation I chose to solve is y = 2*x1 + 3*x2. Because there are two parameters to find, I chose two genes per chromosome. In every GA we have to choose a fitness function, and I chose the mean squared error (MSE) for selecting the best parents. MSE was chosen because we already have the real output y: the lower the MSE, the better the selected parents. This is also the main difference between my GA and Ahmed's: he used a maximisation fitness function, while I used a minimisation one. From the selected parents we generate offspring by crossover and mutation. All offspring go through crossover, but only a few of them are mutated.
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
np.random.seed(1)

def generate_data(x_range):
    # Formula='2*x1+3*x2'
    x_range = range(-x_range,x_range)
    x = np.vstack((np.array(x_range),np.array(x_range)*2)).T
    y = [2*i[0]+3*i[1] for i in x]
    return x,np.array(y)

def cal_pop_fitness(equation_inputs, pop):
    fitness = np.sum(pop*equation_inputs, axis=1)
    return fitness

def select_best_parents(total_population,top_parents):
    arr = []
    for i in total_population:
        pred_y = cal_pop_fitness(i,X)
        mse = (np.square(y-pred_y)).mean(axis=None) # Mean squared error
        # Append the mse to the chromosome values
        row = np.append(i,mse).tolist()
        arr.append(row)
    arr = np.array(arr)
    # Sorting the chromosomes with respect to mse
    # The lower the mse, the better the individual
    arr = arr[arr[:,2].argsort()]
    error = arr[:,2]
    arr = arr[:,0:2] # removing mse column
    return arr[0:top_parents,:],error[0:top_parents]

def crossover(sub_population):
    children = []
    for i in range(population_size):
        # Selecting two random parents
        parent1 = np.random.randint(0,sub_population.shape[0])
        parent2 = np.random.randint(0,sub_population.shape[0])
        # A child is created from parent1's first gene and parent2's second gene
        child = [sub_population[parent1][0],sub_population[parent2][1]]
        children.append(child)
    return np.array(children)

def mutation(population):
    for i,row in enumerate(population):
        if np.random.rand() < mutation_rate:
            # Random mutations
            population[i][np.random.randint(genes)] = np.random.uniform(low = -4.0, high = 4.0)
    return population

if __name__ == "__main__":
    # Generate input/output data
    X,y = generate_data(150)
    genes = X.shape[1] # Total genes in a chromosome
    population_size = 50 # Size of the population
    top_parents = int(0.25*population_size) # Select top parents from the total population for mating
    mutation_rate = 0.1 # Mutation rate
    generations = 1000 # number of generations
    # Step1 : Population
    # Total population
    total_population = np.random.uniform(low = -4.0, high = 4.0, size = (population_size,genes))
    for i in range(generations):
        # Step2 : Fitness calculation
        # Step3 : Mating pool
        # Step4 : Parents selection
        # Choosing best parents with their corresponding mse
        sub_population,mse = select_best_parents(total_population,top_parents)
        # Step5 : Mating
        # Crossover
        new_population = crossover(sub_population)
        # Mutation
        new_population = mutation(new_population)
        print("Best Parameters: ",np.round(sub_population[0],3),"Best MSE: ", round(mse[0],3))
        # Next population
        total_population = new_population
    # Real
    x_range=range(-10,10)
    x1 = np.array(x_range)
    x2 = np.array(x_range)*2
    formula='2*x1+3*x2'
    y1=eval(formula)
    # Predicted by the genetic algorithm
    a = 1.943
    b = 3.029
    formula = 'a*x1+b*x2'
    y2=eval(formula)
    print("\nMSE between real and predicted formulas: ",(np.square(y1-y2)).mean(axis=None))
1 Answer
From my understanding of the subject your implementation seems correct, but there are numerous things you could fix or improve.
Specific remarks
- Use actually predicted values for printing, i.e.
  # Predicted by Genetic algorithm
  a, b = sub_population[0]
- Your initialisation of X is dangerous and prevents the algorithm from converging on some values, since the second column is just twice the first (X[:, 1] = 2*X[:, 0]) while the inputs are supposed to be independent variables. Use random values instead.
- Don't use X and y from the global context in select_best_parents; you could have ugly surprises if these are used elsewhere in the program. Add them to the function parameters (a sketch follows this list).
- Your mutation function uses magic numbers; you could make the [-4, 4] interval a parameter of your GA to make it more generic (also shown in the sketch below).
- You could make a function of the main's for loop to separate cleanly what is part of the algorithm and what are parameters.
- You could extract your sorting key (MSE) as a separate function so it's easier to replace.
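As a rough sketch of those two remarks (the signatures and defaults are my own suggestion, keeping the data shapes from the question), select_best_parents could receive X and y explicitly and mutation could take the interval and rate as arguments:

import numpy as np

def select_best_parents(total_population, top_parents, X, y):
    """Return the top_parents chromosomes with the lowest MSE and their MSE values."""
    # Predicted y for every chromosome at once: (pop_size, genes) x (genes, samples)
    predictions = total_population @ X.T
    mse = np.mean(np.square(y - predictions), axis=1)
    order = np.argsort(mse)[:top_parents]  # lower MSE first
    return total_population[order], mse[order]

def mutation(population, mutation_rate=0.1, gene_bounds=(-4.0, 4.0)):
    """Randomly replace one gene of some individuals with a value drawn from gene_bounds."""
    low, high = gene_bounds
    for i in range(population.shape[0]):
        if np.random.rand() < mutation_rate:
            population[i, np.random.randint(population.shape[1])] = np.random.uniform(low, high)
    return population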
General remarks
If you want to validate your approach you need more than a single test value. So, for example, after you make a and b (2 and 3) parameters, you can use a loop or a variety of examples to prove the algorithm converges toward those values.
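For example, assuming the whole GA is wrapped in a run_ga(X, y) helper that returns the best chromosome (a hypothetical name, not part of the posted code), a quick check could look like:

import numpy as np

def check_convergence(run_ga, n_trials=10, tol=0.1):
    """Run the GA against several random coefficient pairs and report how often it converges."""
    rng = np.random.default_rng(0)
    successes = 0
    for _ in range(n_trials):
        a, b = rng.uniform(-3.0, 3.0, size=2)        # random target coefficients
        X = rng.uniform(-10.0, 10.0, size=(300, 2))  # independent inputs, as suggested above
        y = a*X[:, 0] + b*X[:, 1]
        best = run_ga(X, y)                          # hypothetical helper returning the best chromosome
        if np.allclose(best, [a, b], atol=tol):
            successes += 1
    print(f"Converged within {tol} on {successes}/{n_trials} random targets")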
I find it odd and damaging that scientific programmers often pass on object-oriented design. Using classes could be handy here: you could pass the variables from core function to core function through object state rather than extending the parameters of every function as needed, and it would make it easier to offer a catalogue of different GA mutation implementations, for instance. Consider migrating to an object-oriented design if these are later goals.
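A minimal sketch of what that could look like (the class and method names are my own invention, only meant to show where the shared state would live):

import numpy as np

class GeneticRegression:
    """Minimal GA wrapper keeping the data and hyper-parameters as object state."""

    def __init__(self, X, y, population_size=50, top_fraction=0.25,
                 mutation_rate=0.1, gene_bounds=(-4.0, 4.0)):
        self.X, self.y = X, y
        self.population_size = population_size
        self.top_parents = int(top_fraction*population_size)
        self.mutation_rate = mutation_rate
        self.gene_bounds = gene_bounds
        self.genes = X.shape[1]

    def _select(self, population):
        # Lowest-MSE chromosomes first
        mse = np.mean(np.square(self.y - population @ self.X.T), axis=1)
        order = np.argsort(mse)[:self.top_parents]
        return population[order], mse[order]

    def _crossover(self, parents):
        # Each child takes every gene from a randomly chosen parent (works for any number of genes)
        picks = np.random.randint(parents.shape[0], size=(self.population_size, self.genes))
        return parents[picks, np.arange(self.genes)]

    def _mutate(self, population):
        low, high = self.gene_bounds
        for i in np.flatnonzero(np.random.rand(self.population_size) < self.mutation_rate):
            population[i, np.random.randint(self.genes)] = np.random.uniform(low, high)
        return population

    def fit(self, generations=1000):
        low, high = self.gene_bounds
        population = np.random.uniform(low, high, size=(self.population_size, self.genes))
        for _ in range(generations):
            parents, mse = self._select(population)
            population = self._mutate(self._crossover(parents))
        return parents[0], mse[0]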
It could be beneficial to separate the usage example from the core GA functions into two distinct modules and files. They are independent parts that can be improved and extended separately.
I believe you can support more than two dimensions with a bit of tidying and genericising in generate_data, crossover and select_best_parents.
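For instance, generate_data could take an arbitrary coefficient vector (a sketch, which also folds in the earlier suggestion of random, independent inputs):

import numpy as np

def generate_data(coefficients, n_samples=300, x_low=-10.0, x_high=10.0):
    """Build a random input matrix with one column per coefficient and the matching targets."""
    coefficients = np.asarray(coefficients, dtype=float)
    X = np.random.uniform(x_low, x_high, size=(n_samples, coefficients.size))
    y = X @ coefficients
    return X, y

# e.g. X, y = generate_data([2, 3, -1])  # three genes instead of two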
Improve style and formatting a bit by trying to follow PEP8: keyword arguments don't need spaces around =, leave two blank lines between function definitions, use docstrings rather than comments, etc.
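As a small illustration of those style points, the mutation function from the question could look like this (behaviour unchanged, still relying on the module-level mutation_rate and genes from the question just to isolate the style changes):

def mutation(population):
    """Replace one random gene of some individuals with a fresh random value."""
    for i, _ in enumerate(population):
        if np.random.rand() < mutation_rate:
            population[i][np.random.randint(genes)] = np.random.uniform(low=-4.0, high=4.0)
    return population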