I made a genetic search algorithm in Python for the Travelling Salesman Problem for a midterm project. The grade was fine, but I was hoping to get some pointers on style and documentation. Please provie any feedback you have about how I can make my code more readable, consistent, and friendly.
"""
geneticTS.py
By Logan Davis | 10/25/15
Description: A genetic solution to the Traveling Saleman
problem for the AI Midterm.
Python 2.7 | Editor: Emacs | Distro: Ubuntu 15
"""
import random, math
class Citizen(object):
"""
Citizen is small object meant to be used by
an instance of a TravelingSalesmen object.
All the Citizen class can do is hold two
attributes: route (it's planned travelsal
through a series of locations) and fitness
(the distance of that route. The smaller,
the better).
"""
def __init__(self,route = None, fitness = 0):
self.route = route
self.fitness = fitness
class TravelingSalesmen(object):
"""
OVERVIEW:
A genetic algorithm class to generate answers to the
Traveling Salesmen problem.
HOW TO USE:
This simplist way to use this is by creating an instance
of this class, and then call self.generate() where self is the
instance you just created. However this method can be slow
because you have to enter each location in one at a time.
One of the optional arguements is to directly assign an
organized route to the route arguement. An organized is a list
of locations, each location is a list containing a name/identifier
as the first element, an x co-ord as the second, and a y co-ord
as the last. EXAMPLE OF LOCATION:
["Jutland", 45, 2000.3]
After either directly assigning these locations in a list, you still have
call the self.generate() method, but it will skip constructing a route and
just generate an answer. If you don't directly assign a route, after you
input the route through a prompt, it will generate the answer with no
further method calls.
A self.generate call returns an instance of a Citizen object. That returned
instance will contain the best route and its fitness.
OPTIONAL ARGS:
- generations = the amount of gerneations the algorithm will run.
- populationSize = the amount of different possible answers (Citizens)
the algorithm will use in generating an answer.
- retention = the amount of the population that is kept at the culling
phase based on their fitness.
- mutationRate = the chance of mutations occuring within self.population
- route = the collection of locations you wish to traverse
NOTE:
To test this class, a test route has been provided.
It is creatively named testRoute.
---------------------------DOCTTEST-------------------------------
>>> ts = TravelingSalesmen(10,10,0.2,0.01,testRoute)
>>> answer = ts.generate()
>>> print math.floor(answer.fitness) #floor to deal with float rounding
9003.0
>>>
"""
def __init__(self,generations = 1000,populationSize = 1000,retention = 0.2,mutationRate = 0.01,route = []):
"""
All stored values for any instance of this class.
"""
self.population = [] #The entire population of Citizen instances
self.route = route #The collection of all locatons
self.generations = generations #The number of generations/iterations to find an answer
self.retention = retention #The size of the population that lives on each generation
self.populationSize = populationSize #The Total allowed size of self.population
self.mutationRate = mutationRate #The chance of a mutation occuring in self.population
self.bestCitizen = None #The best citizen found, meaning the lowwest (shortest) fitness score.
def __str__(self):
"""
This makes it so that any printing of an instance
of this object will give the best route through the
locations in self.route and the fitness score of that
route.
"""
return "Best route: {} with a fitness of {}.".format(self.bestCitizen.route,self.bestCitizen.fitness)
def generate(self):
"""
generate actually runs the algorithm.
If the instance this is invoked upon
already has a route, it just runs,
otherwise is calls constructRoute
to create a list of locations first.
The results are stored in self.bestCitizen.
Then the instance held in self.bestCtitizen
are returned.
----------------DOCTEST-----------------
>>> ts = TravelingSalesmen(100,100,0.2,0.01,testRoute)
>>> answer = ts.generate()
>>> ts._fitness(answer)
>>> math.floor(answer.fitness) #to deal with rounding of precision
9003.0
>>>
"""
if self.route == []:
self.constructRoute()
self._createPopulation()
for i in xrange(0,self.generations):
self._repopulate()
self._organizePopulation()
self.bestCitizen = self.population[0]
return self.bestCitizen
def constructRoute(self):
"""
constructRoute prompts the user to enter a
series of locations and their x & y co-ordinates.
These locations are stored as a list in
self.route. The function exits if the user
inputs any of the strings held in
possibleStops after entering a location.
"""
done = False
possibleStops = ["y","yes","YES","Yes","STOP","stop","Stop","Fuck off"]
while done != True:
location = []
location.extend([raw_input("What is the location's name?"),\
input("What is the x co-ordinate?"),\
input("What is the y co-ordinate?")])
self.route.append(location)
answer = raw_input("Are you done? (y/n)")
if answer in possibleStops:
done = True
def _createPopulation(self):
"""
_createPopulation generates citizens
each with a randomized list of the
elements of self.route. The number of
these citizens are defined by self.populationSize
and they are all appended to self.population
"""
copyRoute = self.route[:] #makes shallow copy, random.shuffle operates in-place
for i in xrange(0,self.populationSize):
self.population.append(Citizen())
for citizen in self.population:
citizen.route = self.route[:]
random.shuffle(citizen.route)
def _repopulate(self):
"""
_repopulate is a wrapper to call
_organizePopulation, _cullAndRefill,
and _mutate in order.
"""
self._organizePopulation()
self._cullAndRefill()
self._mutate()
def _organizePopulation(self):
"""
_organizePopulation evalutes the fitness of each citizen
and then sort them from most fit (lowwest distance) to
least fit (largest distance).
-----------------------DOCTEST---------------------------
>>> ts = TravelingSalesmen(1,2,0.2,0.01,testRoute)
>>> ts._createPopulation()
>>> ts._organizePopulation()
>>> (ts.population[0].fitness < ts.population[1].fitness) \
or (ts.population[0].fitness == ts.population[1].fitness)
True
>>>
"""
for citizen in self.population:
self._fitness(citizen)
self._rankPopulation()
def _cullAndRefill(self):
"""
_cullAndRefill takes to top % of self.population (% defined in self.retention)
then repopulates the pool of citizens in self.populate based on the routes of
the fit population (those retained after culling).
Crossover method: take two random citizens, if their first and second halves
don't match, take the frist half from and and add it to the
first half of the other. Give that to a new citizen in the
population. If the halves do have some over lap, grab another
random citizen and test the first citizena and this new one until
a non-redundant match is found or the limit of tries is reached.
if the limit is reached, just make a copy of the first random citizen.
------------------------------------DOCTEST------------------------------------------
>>> ts = TravelingSalesmen(2,10,0.2,0.01,testRoute)
>>> ts._createPopulation()
>>> ts._organizePopulation()
>>> comparison = ts.population
>>> ts._cullAndRefill()
>>> (len(ts.population) == len(comparison)) \
and (ts.population != comparison)
True
>>>
"""
middle = len(self.route)/2
fitPopulation = self.population[:int(math.floor(len(self.population)*self.retention))]
newPopulation = fitPopulation[:]
for i in xrange(len(self.population)-len(fitPopulation)):
citizen1 = fitPopulation[random.randint(0,len(fitPopulation)-1)]
for i in xrange(0,(10*len(fitPopulation))):
citizen2 = fitPopulation[random.randint(0,len(fitPopulation)-1)]
if self.matchFinder(citizen1,citizen2,middle) == True:
newPopulation.append(Citizen())
newPopulation[-1].route = citizen1.route[:middle] + citizen2.route[middle:]
self.population = newPopulation[:]
break
elif(i == (10*len(fitPopulation))-1):
newPopulation.append(Citizen())
newPopulation[-1].route = citizen1.route
self.population = newPopulation[:]
def _mutate(self):
"""
_mutate iterates through the entire
self.population. If a random.random()
call returns a value <= self.mutationRate,
then two random locations in a single
citizen's route are flipped.
"""
for i in xrange(0,len(self.population)):
if random.random() <= self.mutationRate:
index1 = random.randint(0,len(self.route)-1)
index2 = random.randint(0,len(self.route)-1)
copy = self.population[i] .route[index1]
self.population[i].route[index1] = self.population[i].route[index2]
self.population[i].route[index2] = copy
def _fitness(self, citizen):
"""
_fitness evaluates the fitness of
citizen. The measure of this
fitness is the distance of that
citizen's route. distance is
calculated using the standard
xy distance formula.
Take 1 arguement:
citizen = some citizen instance
---------------DOCTEST-----------------
>>> citizen = Citizen(testRoute)
>>> ts = TravelingSalesmen()
>>> ts.route = citizen.route
>>> ts._fitness(citizen)
>>> math.floor(citizen.fitness) #floored to deal with rounding
10208.0
>>>
"""
distance = 0
for i in xrange(0,len(self.route)):
if i < (len(self.route) - 1):
xDistance = abs(citizen.route[i][1] - citizen.route[i+1][1])
yDistance = abs(citizen.route[i][2] - citizen.route[i+1][2])
else:
xDistance = abs(citizen.route[i][1] - citizen.route[0][1])
yDistance = abs(citizen.route[i][2] - citizen.route[0][2])
distance += math.sqrt((xDistance**2)+(yDistance**2))
citizen.fitness = distance
def _rankPopulation(self):
"""
sorts self.population, in place, by each citizens fitness.
"""
self.population = sorted(self.population, key=lambda citizen: citizen.fitness)
def matchFinder(self,citizen1,citizen2,cut):
"""
matchFinder takes two citizen instances,
compares slices of thier routes and
returns true if there is no over lap
and false if there is any redundance.
Takes 3 arguements:
citizen1 = some citizen instance
citizen2 = some other citizen instance
cut = where you are slicing their routes
-----------------DOCTEST--------------------
>>> ts = TravelingSalesmen()
>>> citizen1 = Citizen([["jutland",45,2000],\
["flemington",456,2],\
["clinton",3456,234],\
["highland",300,20]])
>>> citizen2 = Citizen([["clinton",3456,234],\
["highland",300,20],\
["jutland",45,2000],\
["flemington",456,2]])
>>> ts.matchFinder(citizen1,citizen2,2) #overlapping route halves
False
>>> citizen1.route = citizen1.route[:2] #non-overlapping
>>> citizen2.route = citizen2.route[:2]
>>> ts.matchFinder(citizen1,citizen2,1)
True
"""
firstHalf = citizen1.route[:cut]
secondHalf = citizen2.route[cut:]
for i in firstHalf:
for x in secondHalf:
if x == i:
return False
return True
#The Test Route provided
testRoute = [["jutland",45,2000],["flemington",456,2],["clinton",3456,234],["highland",300,20]]
if __name__ == "__main__":
import doctest
doctest.testmod()
Performance is not the main concern; I know there are some not-so-efficient statements in there.
3 Answers 3
Code:
It appears to me that your code style is just fine! I have only a few comments:
Whenever you see something like for i in range(len(mylist))
, try to see if you can use a "for-in" loop. An example of this is your function _mutate()
. The whole function can be rewritten as:
for citizen in self.population:
if random.random() <= self.mutationRate:
index1 = random.randint(0,len(self.route)-1)
index2 = random.randint(0,len(self.route)-1)
copy = citizen.route[index1]
citizen.route[index1] = citizen.route[index2]
citizen.route[index2] = copy
There is even possible to rewrite the flip stage as
citizen.route[i1], citizen.route[i2] = citizen.route[i2], citizen.route[i1]
Index is in my opinion self explanatory, so there is no need to use the whole index
word.
Another issue I see is that it is hard to tell if some functions mutate (that means change) the state of the model given the functions name and description. For example _fitness()
. This function changes the state of your class, but neither the function name nor the description explains this. Consider changing the name to _update_fitness()
or something similar.
Comments:
Everyone disagrees when it comes to how to comment code. When that is said, there is some consensus. Self documenting code is the best. Try to write your code as clear as possible. If you can do this, then there is not need for to many comments.
I like that you use block comments on function and classes, but some of the are not necessary in my opinion. The rule here is basically to write code that explains why you do something. It appears to me that you have a tendency to explain the implementation, though the implementation can be seen just fine by the code. An example:
def _mutate(self):
"""
_mutate iterates through the entire
self.population. If a random.random()
call returns a value <= self.mutationRate,
then two random locations in a single
citizen's route are flipped.
"""
for i in xrange(0,len(self.population)):
if random.random() <= self.mutationRate:
index1 = random.randint(0,len(self.route)-1)
index2 = random.randint(0,len(self.route)-1)
copy = self.population[i].route[index1]
self.population[i].route[index1] = self.population[i].route[index2]
self.population[i].route[index2] = copy
In my opinion, this code explains itself very well. I would look into another Stack Overflow answer as well as I feel it explains things well.
There are probably more stuff i could mention, an someone else might also disagree with the things I have pointed out, but at least you have someone elses opinion.
I also know this is for a school project. In a real life project, I would probable prefer less and clearer comments ;)
This is not going to be a complete review, more some random thoughts as I gaze through your code:
- Correct spelling is always good – Your code has a lot of spelling errors, which is unfortunate. Neither of us are perferct, but when there are too much spelling errors it is bad. In one case I delivered a report where I consequently had used the Norwegian word for amnesty (amnesti), in stead of anesthesia (anestesi). Not good. Especially when delivering reports, try to get someone to proofread your work.
- Add one-liners to start of docstrings – You have many good (and long) docstrings, but I miss the one line summary at the top of them. When they are missing, it makes it harder to easily grasp what the methods main task are.
- docstrings normally breaks around 72 characters wide – According to PEP8, docstrings should break at around 72 characters wide. Some of yours breaks after 20-30 character, whilst other are wider. Others have some indentation due to colons,
:
. It would be better if you followed the standard. - Add spacing after commas – Your code would read a lot easier if you added space after commas in parameters. Looking at
_cullAndRefill()
is almost painful as it is only a block of random characters. - Variable and method naming is
snake_case
, notcamelCase
– This would also help blocks like the previously mentioned_cullAndRefill()
as you would refer tonew_population
orfit_population[]
orself.match_finder()
. - Why
fitness
, notdistance
? – You describe fitness many places as the distance, so why not call it the distance? Fitness is not related to distance, unless you count that someone with great fitness usually can travel a given distance using slightly less energy. - Why
Citizen
, notLocation
? – Maybe it's just me, but I don't get how you connect a citizen to a given location, and these citizen's into population. In other words, I don't get why you've used the terms you've used, and to understand your code I kind of have to translate your terms into something anonymous so that I don't get the conation of the terms you've chosen.
Code review of _cullAndRefill()
So far, it's been mainly style reviews and questions related to your choice of names. Lets review some actual code, and lets focus on the _cullAndRefill()
method:
- Why all the copying of populations? – It seems like you are overly fond of copying populations back and forth, which I don't quite see why.
- Get lengths one time, and reuse – In this short snippet, you do
len(self.population)
orlen(fitPopulation)
quite a lot. Do it once at the start, and reuse it - Testing
self.matchFinder(...) == True
is an anti-pattern - Just test against the result directly. And if the opposite test was called for do anif not self.matchFinder():
. And avoid unnessary parentheses like in the `elif`` - Don't reuse the same loop variables! – In your inner loop you reuse the
i
as the loop variable. This is at best dangerous, and usually fatal as it hides the outer loop variable. In your case it might work as you don't seem to use the outer loop variable, but in that case you should usefor _ in xrange(...)
as you are not really interested in it at all - _
xrange(0, something)
can be written asxrange(something)
– No need to specify the default start of 0, as it does that automagically - Strange addition of a new
Citizen
to the population – After appending a newCitizen
, you continue changing the route of the lastly addedCitizen
using...[-1].route = ...
. Why don't you add the route when creating the citizen? Look into the
for ... else
construction – Your double for loops seems to choose two citizen, and if they match you do some magic and break out of the loop. However if the secondfor
loop can't find a match, you check if you're on the last loop. That can better be written as:for i in xrange(10): if i == 12: break else: print("It never happened")
- Add vertical space – The PEP8 style guidelines dictates to use 2 newlines between methods, but it is also recommended to add some newlines in blocks as well. I.e. before
for
loops, orif
statements, I usually add a newline to help readability.
OK, let us try to combine all of this
middle = len(self.route)/2
population_size = len(self.population)
fit_population = self.population[:int(math.floor(population_size * self.retention))]
fit_population_size = len(fit_population)
new_population = fit_population[:]
for _ in xrange(population_size - fit_population_size):
first_citizen = fit_population[random.randint(0, fit_population_size - 1)]
for _ in xrange(10 * fit_population_size):
second_citizen = fit_population[random.randint(0, fit_population_size - 1)]
if self.match_finder(first_citizen, second_citizen, middle):
new_population.append(Citizen(first_citizen.route[:middle] +
second_citizen.route[middle:]))
break
else:
newPopulation.append(Citizen(first_citizen.route))
self.population = newPopulation[:]
This should do the same as your code, but in my world it easier to understand now due to following the guidelines. Not that besides moving the storing of the route into the creation of the Citizen
, I've not done any algorithmic changes.
Most likely, one could do simplify some of your logic using list comprehension, but that is up to someone else to look into!
Nitpicks/Tips to Cut Down on Verbosity
xrange(0,10)
can be xrange(10)
. If only one argument is specified, the zero is implicit. Some people like putting the zero explicitly. I don't mind either way, just want to make sure you know your options. Also, I tend to like range
more than xrange
because I think that's more readable (but less efficient in Python 2).
fitPopulation[random.randint(0,len(fitPopulation)-1)]
can be simplified to random.choice(fitPopulation)
. Batteries included!
self.population = sorted(self.population, key=lambda citizen: citizen.fitness)
can/should be self.population.sort()
. Specifically, sort
reads better and is more efficient: it destructively modifies your list in-place instead of creating a copy--just fine because in the sorted
version you replace the original list anyway. But wait! Wasn't that key
important? The members of that list don't know how to sort themselves! Not yet, but see below--you should implement __cmp__
for the class that describes the population's members.
Reorganization - Uncoupling the Genetic Algorithm from Population Details
I had a good time reading your code. It was pretty understandable. I think you've written all the critical lines of code that need to be written, but your design could be much stronger if you reorganize.
You have two classes with the following methods:
class Citizen(object):
__init__
class TravelingSalesmen(object):
__init__
__str__
generate
constructRoute
_createPopulation
_repopulate
_organizePopulation
_cullAndRefill
_mutate
_fitness
_rankPopulation
matchFinder
Naming is important, and I think you generally did a good job naming variables and functions, but I hate those class names. Citizen
isn't a citizen. It's a member of the population you're testing. It's really a Route
. You can keep the idea that internally it has a member called _route
which is a plain Python list, but the important thing is that it is internal. Your TravelingSalesmen
class, which is really an environment for running your genetic algorithm, shouldn't need to be aware of how the route is implemented.
Big picture, I propose you have the following classes and methods:
class Route(object):
__init__
__cmp__(self, other)
fitness
mutate
crossover(self, other)
class GeneticExperiment(object):
__init__
evolve
_sortPopulation
_cull
_repopulate
_crossover_all
_mutate_all
run
This may seem radically different, but it's not. You just need to move lines of code around. For example, the logic of mutation is sound, but it should be split between the Route
and GeneticExperiment
's _mutate_all
method. _mutate_all
iterates through population and decides which members will be mutated (according to its own internal mutation rate). The Route
implements the actual mutation, but doesn't need to know anything about mutation rate or chance; you tell it to mutate
, and it does.
In a similar way, GeneticExperiment
issues the command to sort the population, but by implementing a __cmp__
function within Route
(which would directly depend on its own fitness
function), the GeneticExperiment
class doesn't need to know anything about its population members.
Why bother decoupling?
Some people will tell you the biggest advantage is code reuse. If you switch to my suggested design, you could use the same GeneticExperiment
class to a different population type, as long as the new type had specifications for mutation, crossover, and sorting. That's cool.
In my opinion, the biggest advantage of decoupling is the ease with which you can build and test small pieces of functionality. I find it much easier to develop, debug, and verify my code if I can instantiate small pieces. For instance, you can verify that your mutation code is working by instantiating a Route
and mutate
ing it directly.
Implications and challenges
One "improvement" I implicitly suggested in my outline of GeneticExperiment
is the ability to step forward a generation (I called that evolve
). The run
command would be "loop until num_generations met." So run
when you just want the answer, but evolve
when you want to step forward generation by generation. If you combine that technique with a beefier __str__
function, you could get nice, abstract details about the state of the experiment:
>>> my_experiment.evolve()
>>> print my_experiment
<Generation 17 of 1000, Best: 47.8, Avg: 188.9>
>>> my_experiment.evolve()
>>> print my_experiment
<Generation 18 of 1000, Best: 45.2, Avg: 188.8>
>>> my_experiment.run()
>>> print my_experiment
<Generation 1000 of 1000, Best: 38.1, Avg: 77.2>
I think the biggest challenge is reworking the way the you get/generate your initial population. constructRoute
and createPopulation
read well but but create a lot of entanglement between algorithm and population type.
Additionally, one other trouble spot might be when you're doing crossover: matchFinder
's logic will largely be moved to Route
's crossover
. I suggest that the experiment selects two members of the population, and returns False
if they cannot be successfully crossed. The experiment can then decide if it wants to reattempt with another randomly selected pair.
Explore related questions
See similar questions with these tags.