Optimize BFS weighted search in python

Question 1

I have a weighted BFS search in python for networkx that I want to optimize:

def bfs(graph, node, attribute, max = 10000):
 # cProfile shows this taking up the most time. Maybe I could use heapq directly? Or maybe I would get more of a benefit making it multithreaded?
 queue = Queue.PriorityQueue()
 num = 0
 done = set()
 for neighbor, attrs in graph[node].iteritems():
 # The third element is needed because we want to return the ajacent node towards the target, not the target itself.
 queue.put((attrs['weight'], neighbor, neighbor))
 while num <= max and not queue.empty():
 num += 1
 item = queue.get()
 if graph.node[item[1]].get(attribute, False):
 return item[2]
 for neighbor, attrs in graph[item[1]].iteritems():
 if neighbor not in done:
 queue.put((item[0] + attrs['weight'], neighbor, item[2]))
 done.add(neighbor)

Question 2

Definitely use heapq directly when you don't need the thread synchronization that Queue.PriorityQueue provides.

Question 3

Right, but should I go threading or heapq?

Question 4

@fread2281, no reason to use threading here

Question 5

Some things that might improve your code:

Use heapq rather than Queue.PriorityQueue. The latter does synchronization stuff that you don't need, so heapq will be faster.
Don't add neighbor to done in your inner loop. This will find incorrect routes on any graphs where some subpath A->B->C is shorter than A->C. Instead, you want to be be putting item[1] into done when you pop it off the queue. Since this means you will get some duplicated nodes in your queue, you'll also need to check if the newly popped node was already done before processing it. You might be able to save a little processing time by storing the cost of the path to the node in a dictionary, and only adding a new path if it's less than the previous one (but I don't think the improvement is very large).
This is a readability improvement (not a performance or correctness fix), but I suggest unpacking item into several values, rather than indexing it. cost, node, first_step = queue.get(), or something similar.
A terminology suggestion: Rather than being a basic breadth-first search, this is Dijkstra's Algorithm. If all the edge weights are equal, you'll examine nodes (roughly) the same order as in a BFS, but if you have different weights you'll go in a different order. It might be a good idea to use a function name that reflect what you're actually doing in the code.

Question 6

I understand the reason for you to implement your own search routine instead of using what is readily available is because you want to return the node that is the first step toward the target instead of the target itself. Before presenting my code, I would like to raise a couple of suggestions:

Use better descriptive names

Names like max, num are not meaningful. I suggest to use names such as max_visiting_nodes, visit_count instead.

Similarly, item[1], item[2], ... do not mean much. Instead of item = queue.get(), how about something like weigh, target_node, first_step_node = queue.get()

Correctness.

If you have a graph like this:

graph1

The red colors are the weight. If search starts with node 1 and node 2 has the attribute you want, the first step toward the target would be node 5, since the path 1-5-4-3-2 is shorter (weight = 4), instead of 1-2 (weight = 10). Your algorithm returns 2, not 5.

Propose solution

I propose using what networkx offering, meaning bfs_tree() to find a list of nodes, then shortest_path():

def bfs_find_first_step(graph, node, attribute, max_visiting_nodes = 10000):
 for visit_count, visiting_node in enumerate(nx.bfs_tree(graph, node)):
 if visit_count == max_visiting_nodes: break
 if graph.node[visiting_node].get(attribute):
 shortest_path = nx.shortest_path(graph, node, visiting_node, 'weight')
 if len(shortest_path) == 1:
 return node # Only current node qualifies
 else:
 return shortest_path[1] # First step toward the target
 return None

A few notes:

In my solution, I use the visit_count (AKA num) to limit the search time, the same way you would like to.
the nx.shortest_path() function takes in as the fourth parameter, the string name of the edge attribute which used for calculating the weight. I think this is what you have in mind.
If you start search from node 1, and only node 1 has the attribute your want, your function returns 5, whereas mine returns 1. I think mine is more correct, but I am not sure--it might not be what you want. This case is when len(shortest_path) == 1 in my code.

Question 7

Wow. A few things: bfs_tree is slow compared to my method for large networks and small max_visiting_nodes (and maybe also my optimized version from the other answer for all networks). Also your version iterates at least twice (at least once in bfs_tree) and that is unnecessary. When I get some time, I may go into netowrkx and make a version of A* for search. Also bfs_search is not what I am doing, highlighted my the other answer, so bfs_tree is not correct here. Your answer is BFS and does not really use shortest_path for deciding what node to return (it gets first instead).

Question 8

Also your solution will give 1 by 1->2 (it does not matter here, but for more complex networks it will).

Question 9

I just found out that bfs_tree() does not seem to return the correct sequence. I am doing some tests at this moment.

Question 10

BFS is breadth-first-search, so it returns based on depth, not weight. en.wikipedia.org/wiki/Breadth-first_search

Blckknght BlckknghtBlckknght 3161 silver badge4 bronze badges · Accepted Answer · 2013-03-31 18:22:42Z

Some things that might improve your code:

Use heapq rather than Queue.PriorityQueue. The latter does synchronization stuff that you don't need, so heapq will be faster.
Don't add neighbor to done in your inner loop. This will find incorrect routes on any graphs where some subpath A->B->C is shorter than A->C. Instead, you want to be be putting item[1] into done when you pop it off the queue. Since this means you will get some duplicated nodes in your queue, you'll also need to check if the newly popped node was already done before processing it. You might be able to save a little processing time by storing the cost of the path to the node in a dictionary, and only adding a new path if it's less than the previous one (but I don't think the improvement is very large).
This is a readability improvement (not a performance or correctness fix), but I suggest unpacking item into several values, rather than indexing it. cost, node, first_step = queue.get(), or something similar.
A terminology suggestion: Rather than being a basic breadth-first search, this is Dijkstra's Algorithm. If all the edge weights are equal, you'll examine nodes (roughly) the same order as in a BFS, but if you have different weights you'll go in a different order. It might be a good idea to use a function name that reflect what you're actually doing in the code.

Stack Exchange Network

Optimize BFS weighted search in python

2 Answers 2

Use better descriptive names

Correctness.

Propose solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimize BFS weighted search in python

2 Answers 2

Use better descriptive names

Correctness.

Propose solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions