This is my implementation of a graph and Kruskal's algorithm in Python. I wanted to design the graph module myself and I would like some feedback on the design. I tried to follow SOLID throughout. I am not sure if the separate Vertex object is wise, but I feel it could be useful as I expand this module.
I had a copy of a flowchart of Kruskal's algorithm from a textbook (not my current course) and I decided to implement it, I am wondering how Pythonic my code is.
I have also programmed Prim's algorithm in the same file but I will split it over two questions.
class Vertex:
def __init__(self, name):
self.name = name
def __str__(self):
return f"Vertex {self.name}"
class Edge:
def __init__(self, start, end, weight):
self.start = start
self.end = end
self.weight = weight
def __str__(self):
return f"{self.start}{self.end}"
class Graph:
def __init__(self, v, e):
self.vertices = v
self.edges = e
def vertex_from_name(self, name):
""" Return vertex object given vertex name. """
return next((v for v in self.vertices if v.name == name), None)
def add_edge(self, start, end, weight):
""" Add an edge connecting two vertices. Arguments can either be vertex name or vertex object. """
if isinstance(start, str):
start = self.vertex_from_name(start)
if isinstance(end, str):
end = self.vertex_from_name(end)
self.edges.append(Edge(start, end, weight))
def edge_on_vertex(self, v):
""" Return edges connected to given vertex v."""
return [e for e in self.edges if (e.start == v) or (e.end == v)]
def connected_vertices(self, v):
""" Return the vertices connected to argument v."""
if isinstance(v, str):
v = self.vertex_from_name(v)
return [e.start for e in self.edges if e.end == v] + [e.end for e in self.edges if e.start == v]
def union(self, lst, e1, e2):
""" Given a list of lists, merges e1 root list with e2 root list and returns merged list."""
xroot, yroot = [], []
# Find roots of both elements
for i in lst:
if e1 in i:
xroot = i
if e2 in i:
yroot = i
# Same root, cannot merge
if xroot == yroot:
return False
xroot += yroot
lst.remove(yroot)
return lst
def is_cycle(self):
""" Return if the graph contains a cycle. """
self.sets = [[v] for v in self.vertices]
self._edges = sorted(self.edges, key=lambda x: x.weight)
for e in self._edges:
_temp = self.union(self.sets, e.start, e.end)
if _temp == False:
return True
else:
self.sets = _temp
return False
def Kruskals(self):
""" Return MST using Kruskal's algorithm. """
self.tree = Graph([], [])
self.tree.vertices = self.vertices
self.sorted_edges = sorted(self.edges, key=lambda x: x.weight)
self.tree.edges.append(self.sorted_edges.pop(0))
for edge in self.sorted_edges:
self.tree.edges.append(edge)
if self.tree.is_cycle():
self.tree.edges.remove(edge)
return self.tree
if __name__ == "__main__":
v = [Vertex(x) for x in ["A", "B", "C", "D", "E", "F"]]
g = Graph(v, [])
g.add_edge("A", "B", 9)
g.add_edge("A", "C", 12)
g.add_edge("A", "D", 9)
g.add_edge("A", "E", 11)
g.add_edge("A", "F", 8)
g.add_edge("B", "C", 10)
g.add_edge("B", "F", 15)
g.add_edge("C", "D", 8)
g.add_edge("D", "E", 14)
g.add_edge("E", "F", 12)
print(g.Kruskals().edges)
2 Answers 2
Type hints
def __init__(self, start, end, weight):
can be
def __init__(self, start: Vertex, end: Vertex, weight: float):
depending on a few things, including the order of declaration of your classes, Vertex
might need to be 'Vertex'
here.
For another example, this
def vertex_from_name(self, name):
can turn into
def vertex_from_name(self, name: str) -> Vertex:
Efficient lookups
To make this more efficient:
return next((v for v in self.vertices if v.name == name), None)
Consider maintaining a string-to-Vertex
dictionary to reduce this lookup from O(n) to O(1) in time.
Premature materialization
These:
return [e for e in self.edges if (e.start == v) or (e.end == v)]
return [e.start for e in self.edges if e.end == v] + [e.end for e in self.edges if e.start == v]
require that the entire results be stored to an in-memory list. To return the generator directly and reduce this memory requirement, the first one can be
return (e for e in self.edges if v in {e.start, e.end})
and the second one can be
yield from (e.start for e in self.edges if e.end == v)
yield from (e.end for e in self.edges if e.start == v)
Set-membership tests
This:
""" Given a list of lists, merges e1 root list with e2 root list and returns merged list."""
is probably better-expressed as accepting a list of set
, not a list of list
s. That will make these two tests:
if e1 in i:
xroot = i
if e2 in i:
yroot = i
faster. This:
self.sets = [[v] for v in self.vertices]
would then become
self.sets = [{v} for v in self.vertices]
Strings as iterables
This
v = [Vertex(x) for x in ["A", "B", "C", "D", "E", "F"]]
can be
v = [Vertex(x) for x in 'ABCDEF']
Convenience functions
Consider making a convenience function to turn this
g.add_edge("A", "B", 9)
g.add_edge("A", "C", 12)
g.add_edge("A", "D", 9)
g.add_edge("A", "E", 11)
g.add_edge("A", "F", 8)
g.add_edge("B", "C", 10)
g.add_edge("B", "F", 15)
g.add_edge("C", "D", 8)
g.add_edge("D", "E", 14)
g.add_edge("E", "F", 12)
into
g.add_edges(
("A", "B", 9),
("A", "C", 12),
("A", "D", 9),
("A", "E", 11),
("A", "F", 8),
("B", "C", 10),
("B", "F", 15),
("C", "D", 8),
("D", "E", 14),
("E", "F", 12),
)
Temporary state in self
is_cycle
leaves sets
and _edges
in self
. Not as a cache or as a result, but as temporary state which then leaks out, that is generally seen as a bad thing.
Kruskals
leaves the tree
in self
, that's a bit more useful, but it could also be considered temporary state in self
.
The Algorithm
union
doesn't look like an implementation of Union-Find to me. "Ask all sets whether the element is in them" is not how Find
normally works. Despite that, it looks like something that reasonably should work, just slower.
The way is_cycle
works means that the disjoint sets are built up from scratch (and the edges are re-sorted) every time is_cycle
is called. That's wasteful, instead of rebuilding them from scratch, the sets could be kept up to date by unioning them as the main algorithm goes along. Calling is_cycle
at all is wasteful: it loops over all edges, but the cycle could have been detected even before creating it by testing Find(edge.start) != Find(edge.end)
in the main algorithm (Kruskals
), which is how the pseudocode on Wikipedia does it.
Overall I think that makes the current algorithm take O(E2V log E) time, instead of O(E log E). Maybe not quite that, I just took the worst cases of all the nested loops, I didn't look at the effect of the number of sets decreasing as the algorithm progresses.
-
\$\begingroup\$ Thanks for the review. I will have to have a deeper look at Union-find and disjoint sets. One question: for the big O, what does V represent? Is it just another variable, independent of E? \$\endgroup\$EugeneProut– EugeneProut2020年04月15日 19:55:33 +00:00Commented Apr 15, 2020 at 19:55
-
1\$\begingroup\$ @EugeneProut V is the number of vertices, E the number of edges. E cannot be greater than V² so they're not totally independent (every E could be replaced with V²), but using both V and E in the complexity analysis gives some idea of how the algorithm reacts to sparse graphs \$\endgroup\$user555045– user5550452020年04月15日 20:06:43 +00:00Commented Apr 15, 2020 at 20:06