Here is my Graph class that implements a graph and has nice a method to generate its spanning tree using Kruskal's algorithm.
I want to:
- Make it pythonic
- Improve readability
- Improve the abstraction (but not changing the use of outer and inner dicts to represent the graph)
Performance is not a concern.
Code:
class Graph(object):
def __init__(self):
self.g = {}
def add(self, vertex1, vertex2, weight):
if vertex1 not in self.g:
self.g[vertex1] = {}
if vertex2 not in self.g:
self.g[vertex2] = {}
self.g[vertex1][vertex2] = weight
self.g[vertex2][vertex1] = weight
def has_link(self, v1, v2):
return v2 in self[v1] or v1 in self[v2]
def edges(self):
data = []
for from_vertex, destinations in self.g.items():
for to_vertex, weight in destinations.items():
if (to_vertex, from_vertex, weight) not in data:
data.append((from_vertex, to_vertex, weight))
return data
def sorted_by_weight(self, desc=False):
return sorted(self.edges(), key=lambda x: x[2], reverse=desc)
def spanning_tree(self, minimum=True):
mst = Graph()
parent = {}
rank = {}
def find_parent(vertex):
while parent[vertex] != vertex:
vertex = parent[vertex]
return vertex
def union(root1, root2):
if rank[root1] > rank[root2]:
parent[root2] = root1
else:
parent[root2] = root1
if rank[root2] == rank[root1]:
rank[root2] += 1
for vertex in self.g:
parent[vertex] = vertex
rank[vertex] = 0
for v1, v2, weight in self.sorted_by_weight(not minimum):
parent1 = find_parent(v1)
parent2 = find_parent(v2)
if parent1 != parent2:
mst.add(v1, v2, weight)
union(parent1, parent2)
if len(self) == len(mst):
break
return mst
def __len__(self):
return len(self.g.keys())
def __getitem__(self, node):
return self.g[node]
def __iter__(self):
for edge in self.edges():
yield edge
def __str__(self):
return "\n".join('from %s to %s: %d' % edge for edge in self.edges())
graph = Graph()
graph.add('a', 'b', 2)
graph.add('a', 'd', 3)
graph.add('a', 'c', 3)
graph.add('b', 'a', 2)
graph.add('b', 'c', 4)
graph.add('b', 'e', 3)
graph.add('c', 'a', 3)
graph.add('c', 'b', 4)
graph.add('c', 'd', 5)
graph.add('c', 'e', 1)
graph.add('d', 'a', 3)
graph.add('d', 'c', 5)
graph.add('d', 'f', 7)
graph.add('f', 'd', 7)
graph.add('f', 'e', 8)
graph.add('f', 'g', 9)
print(graph.spanning_tree())
print()
print(graph.spanning_tree(False))
2 Answers 2
There are no docstrings. How do I use this class? What arguments should I pass to the methods and what do they return?
There's no way to add a vertex with no edges to a graph.
The test code shows that it is quite laborious to initialize a graph. It would make sense for the constructor to take an iterable of edges.
The attribute
g
is not intended for use outside the class (callers should use the public methods). It is conventional to name internal attributes with names starting with one underscore.The code in
add
could be simplified by making use ofcollections.defaultdict
:def __init__(self, edges=()): """Construct a graph containing the edges from the given iterable. An edge between vertices v1 and v2 with weight w should be specified as a tuple (v1, v2, w). """ self._g = defaultdict(dict) for edge in edges: self.add(*edge) def add(self, v1, v2, w): """Add an edge between vertices v1 and v2 with weight w. If an edge already exists between these vertices, set its weight to w. """ self._g[v1][v2] = self._g[v2][v1] = w
The
has_link
method uses "link", but the rest of the code uses "edge". It would be better to be consistent.The
__getitem__
method uses "node", but the rest of the code uses "vertex". It would be better to be consistent.Since
add
ensures that edges are stored in both directed, it's only necessary to test one direction:def has_edge(self, v1, v2): """Return True if the graph has an edge between vertices v1 and v2.""" return v2 in self._g[v1]
The
edges
method accumulates a list of edges. To avoid an edge appearing twice, the code checks each edge to see if it is already in the list. But lists do not have an efficient membership test, so the runtime is quadratic in the number of edges. It would be more efficient to make a set of edges:def edges(self): """Return the edges in the graph as a set of tuples (v1, v2, w).""" edges = set() for v1, destinations in self._g.items(): for v2, w in destinations.items(): if (v2, v1, w) not in edges: edges.add((v1, v2, w)) return edges
In the
sorted_by_weight
method, it would be better to name the keyword argumentreverse
, for consistency with Python's built-insorted
.In
spanning_tree
, theparent
andrank
dictionaries can be initialized directly:parent = {v: v for v in self._g} rank = {v: 0 for v in self._g}
The
parent
data structure is known as a disjoint-set forest. An important optimization on this data structure is path compression: that is, when you search theparent
data structure for the root of the tree containing a vertex, you update theparent
data structure so that future searches run quickly:def find_root(v): if parent[v] != v: parent[v] = find_root(parent[v]) return parent[v]
In the
union
function, you always attach the tree forroot2
to the tree forroot1
:if rank[root1] > rank[root2]: parent[root2] = root1 else: parent[root2] = root1
It's more efficient to always attach the smaller tree to the bigger tree, so that the length of the searches in
find_root
doesn't increase:if rank[root1] > rank[root2]: parent[root2] = root1 else: parent[root1] = root2
The
__len__
and__getitem__
methods treat the graph as a collection of vertices, but__iter__
treats it as a collection of edges. This seems likely to lead to confusion. It would be better to be consistent and have all special methods treat the graph as a collection in the same way.
-
\$\begingroup\$ great suggestions! In (11) your code looks a bit neater at a cost of two loops. Gabriel has a single loop though ;) \$\endgroup\$Oleg Melnikov– Oleg Melnikov2017年04月13日 15:59:41 +00:00Commented Apr 13, 2017 at 15:59
-
\$\begingroup\$ @OlegMelnikov: The idea behind (11) is to improve the clarity of the code — in my version you can see immediately that every vertex is its own parent, and every rank is zero. The extra loop doesn't affect the asymptotic complexity, so I'm not inclined to worry about it. \$\endgroup\$Gareth Rees– Gareth Rees2017年04月13日 17:30:46 +00:00Commented Apr 13, 2017 at 17:30
-
\$\begingroup\$ @ Gareth: Fair enough :) \$\endgroup\$Oleg Melnikov– Oleg Melnikov2017年04月13日 19:23:52 +00:00Commented Apr 13, 2017 at 19:23
Maybe you can use a class to represent edges, like this:
Since Kruskal would sort the edges, this would be more convenient:
class Edge(object): def __init__(self, u, v, w): self.u = u self.v = v self.weight = w def __repr__(self): return '%s -> %s : %s ' %(self.u, self.v, self.weight)
There's a logic hole in this implementation.
You can try this graph:
g = Graph() g.add('a','b',6) g.add('a','c',8) g.add('a','d',8) g.add('b','c',7) g.add('b','d',9) g.add('c','d',4) print(g.spanning_tree())
The result is:
from a to b: 6
from c to d: 4
It's not right since this is not a valid tree.
-
\$\begingroup\$ I also found this graph code didn't produce a valid tree \$\endgroup\$Thirst for Knowledge– Thirst for Knowledge2019年01月07日 11:29:21 +00:00Commented Jan 7, 2019 at 11:29