Graphs and Kruskal's Algorithm - Python OOP

Question 1

This is my implementation of a graph and Kruskal's algorithm in Python. I wanted to design the graph module myself and I would like some feedback on the design. I tried to follow SOLID throughout. I am not sure if the separate Vertex object is wise, but I feel it could be useful as I expand this module.

I had a copy of a flowchart of Kruskal's algorithm from a textbook (not my current course) and I decided to implement it, I am wondering how Pythonic my code is.

I have also programmed Prim's algorithm in the same file but I will split it over two questions.

class Vertex:
 def __init__(self, name):
 self.name = name
 def __str__(self):
 return f"Vertex {self.name}"
class Edge:
 def __init__(self, start, end, weight):
 self.start = start
 self.end = end
 self.weight = weight
 def __str__(self):
 return f"{self.start}{self.end}"
class Graph:
 def __init__(self, v, e):
 self.vertices = v
 self.edges = e
 def vertex_from_name(self, name):
 """ Return vertex object given vertex name. """
 return next((v for v in self.vertices if v.name == name), None)
 def add_edge(self, start, end, weight):
 """ Add an edge connecting two vertices. Arguments can either be vertex name or vertex object. """
 if isinstance(start, str):
 start = self.vertex_from_name(start)
 if isinstance(end, str):
 end = self.vertex_from_name(end)
 self.edges.append(Edge(start, end, weight))
 def edge_on_vertex(self, v):
 """ Return edges connected to given vertex v."""
 return [e for e in self.edges if (e.start == v) or (e.end == v)]
 def connected_vertices(self, v):
 """ Return the vertices connected to argument v."""
 if isinstance(v, str):
 v = self.vertex_from_name(v)
 return [e.start for e in self.edges if e.end == v] + [e.end for e in self.edges if e.start == v]
 def union(self, lst, e1, e2):
 """ Given a list of lists, merges e1 root list with e2 root list and returns merged list."""
 xroot, yroot = [], []
 # Find roots of both elements
 for i in lst:
 if e1 in i:
 xroot = i
 if e2 in i:
 yroot = i
 # Same root, cannot merge
 if xroot == yroot:
 return False
 xroot += yroot
 lst.remove(yroot)
 return lst
 def is_cycle(self):
 """ Return if the graph contains a cycle. """
 self.sets = [[v] for v in self.vertices]
 self._edges = sorted(self.edges, key=lambda x: x.weight)
 for e in self._edges:
 _temp = self.union(self.sets, e.start, e.end)
 if _temp == False:
 return True
 else:
 self.sets = _temp
 return False
 def Kruskals(self):
 """ Return MST using Kruskal's algorithm. """
 self.tree = Graph([], [])
 self.tree.vertices = self.vertices
 self.sorted_edges = sorted(self.edges, key=lambda x: x.weight)
 self.tree.edges.append(self.sorted_edges.pop(0))
 for edge in self.sorted_edges:
 self.tree.edges.append(edge)
 if self.tree.is_cycle():
 self.tree.edges.remove(edge)
 return self.tree
if __name__ == "__main__":
 v = [Vertex(x) for x in ["A", "B", "C", "D", "E", "F"]]
 g = Graph(v, [])
 g.add_edge("A", "B", 9)
 g.add_edge("A", "C", 12)
 g.add_edge("A", "D", 9)
 g.add_edge("A", "E", 11)
 g.add_edge("A", "F", 8)
 g.add_edge("B", "C", 10)
 g.add_edge("B", "F", 15)
 g.add_edge("C", "D", 8)
 g.add_edge("D", "E", 14)
 g.add_edge("E", "F", 12)
 print(g.Kruskals().edges)

Question 2

Type hints

def __init__(self, start, end, weight):

can be

def __init__(self, start: Vertex, end: Vertex, weight: float):

depending on a few things, including the order of declaration of your classes, Vertex might need to be 'Vertex' here.

For another example, this

def vertex_from_name(self, name):

can turn into

def vertex_from_name(self, name: str) -> Vertex:

Efficient lookups

To make this more efficient:

 return next((v for v in self.vertices if v.name == name), None)

Consider maintaining a string-to-Vertex dictionary to reduce this lookup from O(n) to O(1) in time.

Premature materialization

These:

 return [e for e in self.edges if (e.start == v) or (e.end == v)]
 return [e.start for e in self.edges if e.end == v] + [e.end for e in self.edges if e.start == v]

require that the entire results be stored to an in-memory list. To return the generator directly and reduce this memory requirement, the first one can be

 return (e for e in self.edges if v in {e.start, e.end})

and the second one can be

yield from (e.start for e in self.edges if e.end == v)
yield from (e.end for e in self.edges if e.start == v)

Set-membership tests

This:

""" Given a list of lists, merges e1 root list with e2 root list and returns merged list."""

is probably better-expressed as accepting a list of set, not a list of lists. That will make these two tests:

 if e1 in i:
 xroot = i
 if e2 in i:
 yroot = i

faster. This:

 self.sets = [[v] for v in self.vertices]

would then become

 self.sets = [{v} for v in self.vertices]

Strings as iterables

This

v = [Vertex(x) for x in ["A", "B", "C", "D", "E", "F"]]

can be

v = [Vertex(x) for x in 'ABCDEF']

Convenience functions

Consider making a convenience function to turn this

g.add_edge("A", "B", 9)
g.add_edge("A", "C", 12)
g.add_edge("A", "D", 9)
g.add_edge("A", "E", 11)
g.add_edge("A", "F", 8)
g.add_edge("B", "C", 10)
g.add_edge("B", "F", 15)
g.add_edge("C", "D", 8)
g.add_edge("D", "E", 14)
g.add_edge("E", "F", 12)

into

g.add_edges(
 ("A", "B", 9),
 ("A", "C", 12),
 ("A", "D", 9),
 ("A", "E", 11),
 ("A", "F", 8),
 ("B", "C", 10),
 ("B", "F", 15),
 ("C", "D", 8),
 ("D", "E", 14),
 ("E", "F", 12),
)

Question 3

Temporary state in `self`

is_cycle leaves sets and _edges in self. Not as a cache or as a result, but as temporary state which then leaks out, that is generally seen as a bad thing.

Kruskals leaves the tree in self, that's a bit more useful, but it could also be considered temporary state in self.

The Algorithm

union doesn't look like an implementation of Union-Find to me. "Ask all sets whether the element is in them" is not how Find normally works. Despite that, it looks like something that reasonably should work, just slower.

The way is_cycle works means that the disjoint sets are built up from scratch (and the edges are re-sorted) every time is_cycle is called. That's wasteful, instead of rebuilding them from scratch, the sets could be kept up to date by unioning them as the main algorithm goes along. Calling is_cycle at all is wasteful: it loops over all edges, but the cycle could have been detected even before creating it by testing Find(edge.start) != Find(edge.end) in the main algorithm (Kruskals), which is how the pseudocode on Wikipedia does it.

Overall I think that makes the current algorithm take O(E2V log E) time, instead of O(E log E). Maybe not quite that, I just took the worst cases of all the nested loops, I didn't look at the effect of the number of sets decreasing as the algorithm progresses.

Question 4

Thanks for the review. I will have to have a deeper look at Union-find and disjoint sets. One question: for the big O, what does V represent? Is it just another variable, independent of E?

Question 5

@EugeneProut V is the number of vertices, E the number of edges. E cannot be greater than V² so they're not totally independent (every E could be replaced with V²), but using both V and E in the complexity analysis gives some idea of how the algorithm reacts to sparse graphs

Reinderien Reinderien 71k5 gold badges76 silver badges256 bronze badges · Accepted Answer · 2020-04-11 18:30:34Z

Type hints

def __init__(self, start, end, weight):

can be

def __init__(self, start: Vertex, end: Vertex, weight: float):

depending on a few things, including the order of declaration of your classes, Vertex might need to be 'Vertex' here.

For another example, this

def vertex_from_name(self, name):

can turn into

def vertex_from_name(self, name: str) -> Vertex:

Efficient lookups

To make this more efficient:

 return next((v for v in self.vertices if v.name == name), None)

Consider maintaining a string-to-Vertex dictionary to reduce this lookup from O(n) to O(1) in time.

Premature materialization

These:

 return [e for e in self.edges if (e.start == v) or (e.end == v)]
 return [e.start for e in self.edges if e.end == v] + [e.end for e in self.edges if e.start == v]

require that the entire results be stored to an in-memory list. To return the generator directly and reduce this memory requirement, the first one can be

 return (e for e in self.edges if v in {e.start, e.end})

and the second one can be

yield from (e.start for e in self.edges if e.end == v)
yield from (e.end for e in self.edges if e.start == v)

Set-membership tests

This:

""" Given a list of lists, merges e1 root list with e2 root list and returns merged list."""

is probably better-expressed as accepting a list of set, not a list of lists. That will make these two tests:

 if e1 in i:
 xroot = i
 if e2 in i:
 yroot = i

faster. This:

 self.sets = [[v] for v in self.vertices]

would then become

 self.sets = [{v} for v in self.vertices]

Strings as iterables

This

v = [Vertex(x) for x in ["A", "B", "C", "D", "E", "F"]]

can be

v = [Vertex(x) for x in 'ABCDEF']

Convenience functions

Consider making a convenience function to turn this

g.add_edge("A", "B", 9)
g.add_edge("A", "C", 12)
g.add_edge("A", "D", 9)
g.add_edge("A", "E", 11)
g.add_edge("A", "F", 8)
g.add_edge("B", "C", 10)
g.add_edge("B", "F", 15)
g.add_edge("C", "D", 8)
g.add_edge("D", "E", 14)
g.add_edge("E", "F", 12)

into

g.add_edges(
 ("A", "B", 9),
 ("A", "C", 12),
 ("A", "D", 9),
 ("A", "E", 11),
 ("A", "F", 8),
 ("B", "C", 10),
 ("B", "F", 15),
 ("C", "D", 8),
 ("D", "E", 14),
 ("E", "F", 12),
)

Stack Exchange Network

Graphs and Kruskal's Algorithm - Python OOP

2 Answers 2

Type hints

Efficient lookups

Premature materialization

Set-membership tests

Strings as iterables

Convenience functions

Temporary state in `self`

The Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Graphs and Kruskal's Algorithm - Python OOP

2 Answers 2

Type hints

Efficient lookups

Premature materialization

Set-membership tests

Strings as iterables

Convenience functions

Temporary state in self

The Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Temporary state in `self`