Breadth-first search: traditional and bidirectional in Python

Question 1

I have this Python (3.4) script for demonstrating the performance of the BFS algorithm, both traditional and bidirectional:

#! /usr/bin/env python3.4
import time
import random
from collections import deque
__author__ = 'Rodion "rodde" Efremov'
class DirectedGraphNode:
 def __init__(self, name):
 if name is None:
 raise ValueError("The name of a new node is None.")
 self.name = name
 self.children = set()
 self.parents = set()
 def __hash__(self):
 return self.name.__hash__()
 def __eq__(self, other):
 return self.name == other.name
 def __str__(self):
 return "[DirectedGraphNode: " + self.name + "]"
 def add_child(self, child):
 self.children.add(child)
 child.parents.add(self)
 def has_child(self, child_candidate):
 return child_candidate in self.children
 def get_children(self):
 return self.children
 def get_parents(self):
 return self.parents
def traceback_path(target, parents):
 path = []
 current = target
 while current:
 path.append(current)
 current = parents[current]
 return list(reversed(path))
def bi_traceback_path(touch_node, parents_a, parents_b):
 path = []
 current = touch_node
 while current:
 path.append(current)
 current = parents_a[current]
 path = list(reversed(path))
 current = parents_b[touch_node]
 while current:
 path.append(current)
 current = parents_b[current]
 return path
def breadth_first_search(source, target):
 queue = deque([source])
 parents = {source: None}
 while len(queue) > 0:
 current = queue.popleft()
 if current is target:
 return traceback_path(target, parents)
 for child in current.get_children():
 if child not in parents.keys():
 parents[child] = current
 queue.append(child)
 return []
def bidirectional_breadth_first_search(source, target):
 queue_a = deque([source])
 queue_b = deque([target])
 parents_a = {source: None}
 parents_b = {target: None}
 distance_a = {source: 0}
 distance_b = {target: 0}
 # best_cost is ugly
 best_cost = 1000000000
 touch_node = None
 while len(queue_a) > 0 and len(queue_b) > 0:
 dist_a = distance_a[queue_a[0]]
 dist_b = distance_b[queue_b[0]]
 if touch_node and best_cost < dist_a + dist_b:
 return bi_traceback_path(touch_node,
 parents_a,
 parents_b)
 # Trivial load balancing
 if dist_a < dist_b:
 current = queue_a.popleft()
 if current in distance_b.keys() and best_cost > dist_a + dist_b:
 best_cost = dist_a + dist_b
 touch_node = current
 for child in current.get_children():
 if child not in distance_a.keys():
 distance_a[child] = distance_a[current] + 1
 parents_a[child] = current
 queue_a.append(child)
 else:
 current = queue_b.popleft()
 if current in distance_a.keys() and best_cost > dist_a + dist_b:
 best_cost = dist_a + dist_b
 touch_node = current
 for parent in current.get_parents():
 if parent not in distance_b.keys():
 distance_b[parent] = distance_b[current] + 1
 parents_b[parent] = current
 queue_b.append(parent)
 return []
def create_directed_graph(nodes, edges):
 graph = []
 for i in range(0, nodes):
 node = DirectedGraphNode("" + str(i))
 graph.append(node)
 for i in range(0, edges):
 j = random.randint(0, nodes - 1)
 k = random.randint(0, nodes - 1)
 graph[j].add_child(graph[k])
 return graph
def main():
 graph = create_directed_graph(300000, 1000000)
 source = random.choice(graph)
 target = random.choice(graph)
 print("Source: ", source)
 print("Target: ", target)
 start_time = time.time()
 path1 = breadth_first_search(source, target)
 end_time = time.time()
 print("BFS path length", len(path1), ":")
 for node in path1:
 print(node)
 print("Time elapsed:", 1000.0 * (end_time - start_time), "milliseconds.")
 print()
 start_time = time.time()
 path2 = bidirectional_breadth_first_search(source, target)
 end_time = time.time()
 print("BiBFS path length", len(path2), ":")
 for node in path2:
 print(node)
 print("Time elapsed:", 1000.0 * (end_time - start_time), "milliseconds.")
if __name__ == "__main__":
 main()

Please tell me anything that comes to mind! Am I doing it the right way?

Question 2

One high-level comment:

No comments or docstrings! It’s very difficult to evaluate whether this code is correct, because I don’t know what the code is supposed to do. Adding some comments that relate the code back to the original problem will make it easier to read, review and maintain.

Some specific comments:

Your DirectedGraphNode class should have a __repr__(). You should add one of these on every class you write: it makes things much easier for debugging.
```
def __repr__(self):
 return '%s(%r)' % (self.__class__.__name__, self.name)
```
Note the use of self.__class__.__name__, which makes this method robust to being subclassed.
It’s more Pythonic to access attributes directly than to use getters. (Likewise for setters.) Specifically, the two methods get_children() and get_parents() should be removed from DirectedGraphNode, and any calls to them replaced with direct attribute access.
For the traceback path, you could construct it in the right order as you go along. Rather than using list.append(), you could use list.insert() to put items at the start of the list. This saves you reversing it later, and means that it gets built up in the right order. Consider:
```
def traceback_path(target, parents):
 path = []
 current = target
 while current:
 path.insert(0, current)
 current = parents[current]
 return path
```
Don’t repeat yourself in the traceback functions. You construct a path from parent_a, then another path from parent_b. Roughly speaking, it looks something like:
```
traceback_path(touch_node, parents_a) + 
 reversed(traceback_path(parents_b[touch_node], parents_b)
```
You should be able to express it in a fairly compact form like this, without repeating the internal logic of traceback_path().
Take advantage of Python’s implicit truthiness. In breadth_first_search(), the more Pythonic approach to checking if the queue is non-empty is to use:
```
while queue:
 # do stuff with a non-empty queue
```
You can make similar tidy-ups in bidirectional_breadth_first_search().
Don’t repeat yourself in different but similar branches. The load balancing if/else branches in bidirectional() are very similar. They’re not quite identical, but you should be able to reduce some of the repetition here.
You can tidy up create_directed_graph(). A few quick ones here:
- Since range() defaults to starting from 0, you can drop the first argument to reduce visual noise.
- This is a common anti pattern:
```
graph = []
for i in range(nodes):
 graph.append(DirectedGraphNode(str(i))
```
  The better approach is to use a list comprehension:
```
graph = [DirectedGraphNode(str(i)) for i in range(nodes)]
```
  That’s both cleaner and more Pythonic.
- The variable names sound like they’re a list of nodes and edges, respectively, whereas they’re treated like numbers. Would node_count and edge_count be better names?
- In the second loop, you don’t use the value of the loop variable i. A common approach is to use an underscore (_) for unused loop variables, like so:
```
for _ in range(edge_count):
 # do stuff
```
  This helps emphasise that the exact value of this variable is unimportant.

Question 3

Do not repeat yourself

In #trivial load balancing, the two blocks of code are almost identical, just write on top

queue = (queue_a if dist_a < dist_b else queue_b).popleft()

And merge the two blocks replacing queue_a with queue in the first block and removing the second.

Question 4

The two expansion sections are asymmetrical in data structures they manipulate.

alexwlchan alexwlchan 8,6751 gold badge23 silver badges55 bronze badges · Accepted Answer · 2015-11-28 22:59:04Z

One high-level comment:

No comments or docstrings! It’s very difficult to evaluate whether this code is correct, because I don’t know what the code is supposed to do. Adding some comments that relate the code back to the original problem will make it easier to read, review and maintain.

Some specific comments:

Your DirectedGraphNode class should have a __repr__(). You should add one of these on every class you write: it makes things much easier for debugging.
```
def __repr__(self):
 return '%s(%r)' % (self.__class__.__name__, self.name)
```
Note the use of self.__class__.__name__, which makes this method robust to being subclassed.
It’s more Pythonic to access attributes directly than to use getters. (Likewise for setters.) Specifically, the two methods get_children() and get_parents() should be removed from DirectedGraphNode, and any calls to them replaced with direct attribute access.
For the traceback path, you could construct it in the right order as you go along. Rather than using list.append(), you could use list.insert() to put items at the start of the list. This saves you reversing it later, and means that it gets built up in the right order. Consider:
```
def traceback_path(target, parents):
 path = []
 current = target
 while current:
 path.insert(0, current)
 current = parents[current]
 return path
```
Don’t repeat yourself in the traceback functions. You construct a path from parent_a, then another path from parent_b. Roughly speaking, it looks something like:
```
traceback_path(touch_node, parents_a) + 
 reversed(traceback_path(parents_b[touch_node], parents_b)
```
You should be able to express it in a fairly compact form like this, without repeating the internal logic of traceback_path().
Take advantage of Python’s implicit truthiness. In breadth_first_search(), the more Pythonic approach to checking if the queue is non-empty is to use:
```
while queue:
 # do stuff with a non-empty queue
```
You can make similar tidy-ups in bidirectional_breadth_first_search().
Don’t repeat yourself in different but similar branches. The load balancing if/else branches in bidirectional() are very similar. They’re not quite identical, but you should be able to reduce some of the repetition here.
You can tidy up create_directed_graph(). A few quick ones here:
- Since range() defaults to starting from 0, you can drop the first argument to reduce visual noise.
- This is a common anti pattern:
```
graph = []
for i in range(nodes):
 graph.append(DirectedGraphNode(str(i))
```
  The better approach is to use a list comprehension:
```
graph = [DirectedGraphNode(str(i)) for i in range(nodes)]
```
  That’s both cleaner and more Pythonic.
- The variable names sound like they’re a list of nodes and edges, respectively, whereas they’re treated like numbers. Would node_count and edge_count be better names?
- In the second loop, you don’t use the value of the loop variable i. A common approach is to use an underscore (_) for unused loop variables, like so:
```
for _ in range(edge_count):
 # do stuff
```
  This helps emphasise that the exact value of this variable is unimportant.

Stack Exchange Network

Breadth-first search: traditional and bidirectional in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Breadth-first search: traditional and bidirectional in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions