(See the next iteration.)
I have this Python (3.4) script for demonstrating the performance of the BFS algorithm, both traditional and bidirectional:
#! /usr/bin/env python3.4
import time
import random
from collections import deque
__author__ = 'Rodion "rodde" Efremov'
class DirectedGraphNode:
def __init__(self, name):
if name is None:
raise ValueError("The name of a new node is None.")
self.name = name
self.children = set()
self.parents = set()
def __hash__(self):
return self.name.__hash__()
def __eq__(self, other):
return self.name == other.name
def __str__(self):
return "[DirectedGraphNode: " + self.name + "]"
def add_child(self, child):
self.children.add(child)
child.parents.add(self)
def has_child(self, child_candidate):
return child_candidate in self.children
def get_children(self):
return self.children
def get_parents(self):
return self.parents
def traceback_path(target, parents):
path = []
current = target
while current:
path.append(current)
current = parents[current]
return list(reversed(path))
def bi_traceback_path(touch_node, parents_a, parents_b):
path = []
current = touch_node
while current:
path.append(current)
current = parents_a[current]
path = list(reversed(path))
current = parents_b[touch_node]
while current:
path.append(current)
current = parents_b[current]
return path
def breadth_first_search(source, target):
queue = deque([source])
parents = {source: None}
while len(queue) > 0:
current = queue.popleft()
if current is target:
return traceback_path(target, parents)
for child in current.get_children():
if child not in parents.keys():
parents[child] = current
queue.append(child)
return []
def bidirectional_breadth_first_search(source, target):
queue_a = deque([source])
queue_b = deque([target])
parents_a = {source: None}
parents_b = {target: None}
distance_a = {source: 0}
distance_b = {target: 0}
# best_cost is ugly
best_cost = 1000000000
touch_node = None
while len(queue_a) > 0 and len(queue_b) > 0:
dist_a = distance_a[queue_a[0]]
dist_b = distance_b[queue_b[0]]
if touch_node and best_cost < dist_a + dist_b:
return bi_traceback_path(touch_node,
parents_a,
parents_b)
# Trivial load balancing
if dist_a < dist_b:
current = queue_a.popleft()
if current in distance_b.keys() and best_cost > dist_a + dist_b:
best_cost = dist_a + dist_b
touch_node = current
for child in current.get_children():
if child not in distance_a.keys():
distance_a[child] = distance_a[current] + 1
parents_a[child] = current
queue_a.append(child)
else:
current = queue_b.popleft()
if current in distance_a.keys() and best_cost > dist_a + dist_b:
best_cost = dist_a + dist_b
touch_node = current
for parent in current.get_parents():
if parent not in distance_b.keys():
distance_b[parent] = distance_b[current] + 1
parents_b[parent] = current
queue_b.append(parent)
return []
def create_directed_graph(nodes, edges):
graph = []
for i in range(0, nodes):
node = DirectedGraphNode("" + str(i))
graph.append(node)
for i in range(0, edges):
j = random.randint(0, nodes - 1)
k = random.randint(0, nodes - 1)
graph[j].add_child(graph[k])
return graph
def main():
graph = create_directed_graph(300000, 1000000)
source = random.choice(graph)
target = random.choice(graph)
print("Source: ", source)
print("Target: ", target)
start_time = time.time()
path1 = breadth_first_search(source, target)
end_time = time.time()
print("BFS path length", len(path1), ":")
for node in path1:
print(node)
print("Time elapsed:", 1000.0 * (end_time - start_time), "milliseconds.")
print()
start_time = time.time()
path2 = bidirectional_breadth_first_search(source, target)
end_time = time.time()
print("BiBFS path length", len(path2), ":")
for node in path2:
print(node)
print("Time elapsed:", 1000.0 * (end_time - start_time), "milliseconds.")
if __name__ == "__main__":
main()
Please tell me anything that comes to mind! Am I doing it the right way?
2 Answers 2
One high-level comment:
- No comments or docstrings! It’s very difficult to evaluate whether this code is correct, because I don’t know what the code is supposed to do. Adding some comments that relate the code back to the original problem will make it easier to read, review and maintain.
Some specific comments:
Your DirectedGraphNode class should have a __repr__(). You should add one of these on every class you write: it makes things much easier for debugging.
def __repr__(self): return '%s(%r)' % (self.__class__.__name__, self.name)
Note the use of
self.__class__.__name__
, which makes this method robust to being subclassed.It’s more Pythonic to access attributes directly than to use getters. (Likewise for setters.) Specifically, the two methods
get_children()
andget_parents()
should be removed from DirectedGraphNode, and any calls to them replaced with direct attribute access.For the traceback path, you could construct it in the right order as you go along. Rather than using
list.append()
, you could uselist.insert()
to put items at the start of the list. This saves you reversing it later, and means that it gets built up in the right order. Consider:def traceback_path(target, parents): path = [] current = target while current: path.insert(0, current) current = parents[current] return path
Don’t repeat yourself in the traceback functions. You construct a path from parent_a, then another path from parent_b. Roughly speaking, it looks something like:
traceback_path(touch_node, parents_a) + reversed(traceback_path(parents_b[touch_node], parents_b)
You should be able to express it in a fairly compact form like this, without repeating the internal logic of
traceback_path()
.Take advantage of Python’s implicit truthiness. In
breadth_first_search()
, the more Pythonic approach to checking if the queue is non-empty is to use:
You can make similar tidy-ups inwhile queue: # do stuff with a non-empty queue
bidirectional_breadth_first_search()
.Don’t repeat yourself in different but similar branches. The load balancing if/else branches in
bidirectional()
are very similar. They’re not quite identical, but you should be able to reduce some of the repetition here.You can tidy up create_directed_graph(). A few quick ones here:
- Since
range()
defaults to starting from 0, you can drop the first argument to reduce visual noise. This is a common anti pattern:
graph = [] for i in range(nodes): graph.append(DirectedGraphNode(str(i))
The better approach is to use a list comprehension:
graph = [DirectedGraphNode(str(i)) for i in range(nodes)]
That’s both cleaner and more Pythonic.
- The variable names sound like they’re a list of nodes and edges, respectively, whereas they’re treated like numbers. Would
node_count
andedge_count
be better names? In the second loop, you don’t use the value of the loop variable
i
. A common approach is to use an underscore (_
) for unused loop variables, like so:
This helps emphasise that the exact value of this variable is unimportant.for _ in range(edge_count): # do stuff
- Since
Do not repeat yourself
In #trivial load balancing
, the two blocks of code are almost identical, just write on top
queue = (queue_a if dist_a < dist_b else queue_b).popleft()
And merge the two blocks replacing queue_a
with queue
in the first block and removing the second.
-
\$\begingroup\$ The two expansion sections are asymmetrical in data structures they manipulate. \$\endgroup\$coderodde– coderodde2016年03月31日 17:16:44 +00:00Commented Mar 31, 2016 at 17:16
Explore related questions
See similar questions with these tags.