I have a weighted BFS search in python for networkx that I want to optimize:
def bfs(graph, node, attribute, max = 10000):
# cProfile shows this taking up the most time. Maybe I could use heapq directly? Or maybe I would get more of a benefit making it multithreaded?
queue = Queue.PriorityQueue()
num = 0
done = set()
for neighbor, attrs in graph[node].iteritems():
# The third element is needed because we want to return the ajacent node towards the target, not the target itself.
queue.put((attrs['weight'], neighbor, neighbor))
while num <= max and not queue.empty():
num += 1
item = queue.get()
if graph.node[item[1]].get(attribute, False):
return item[2]
for neighbor, attrs in graph[item[1]].iteritems():
if neighbor not in done:
queue.put((item[0] + attrs['weight'], neighbor, item[2]))
done.add(neighbor)
2 Answers 2
Some things that might improve your code:
Use
heapq
rather thanQueue.PriorityQueue
. The latter does synchronization stuff that you don't need, soheapq
will be faster.Don't add
neighbor
todone
in your inner loop. This will find incorrect routes on any graphs where some subpath A->B->C is shorter than A->C. Instead, you want to be be puttingitem[1]
intodone
when you pop it off the queue. Since this means you will get some duplicated nodes in your queue, you'll also need to check if the newly popped node was already done before processing it. You might be able to save a little processing time by storing the cost of the path to the node in a dictionary, and only adding a new path if it's less than the previous one (but I don't think the improvement is very large).This is a readability improvement (not a performance or correctness fix), but I suggest unpacking
item
into several values, rather than indexing it.cost, node, first_step = queue.get()
, or something similar.A terminology suggestion: Rather than being a basic breadth-first search, this is Dijkstra's Algorithm. If all the edge weights are equal, you'll examine nodes (roughly) the same order as in a BFS, but if you have different weights you'll go in a different order. It might be a good idea to use a function name that reflect what you're actually doing in the code.
I understand the reason for you to implement your own search routine instead of using what is readily available is because you want to return the node that is the first step toward the target instead of the target itself. Before presenting my code, I would like to raise a couple of suggestions:
Use better descriptive names
Names like max, num are not meaningful. I suggest to use names such as max_visiting_nodes, visit_count instead.
Similarly, item[1], item[2], ... do not mean much. Instead of item = queue.get()
, how about something like weigh, target_node, first_step_node = queue.get()
Correctness.
If you have a graph like this:
graph1
The red colors are the weight. If search starts with node 1 and node 2 has the attribute you want, the first step toward the target would be node 5, since the path 1-5-4-3-2 is shorter (weight = 4), instead of 1-2 (weight = 10). Your algorithm returns 2, not 5.
Propose solution
I propose using what networkx offering, meaning bfs_tree()
to find a list of nodes, then shortest_path()
:
def bfs_find_first_step(graph, node, attribute, max_visiting_nodes = 10000):
for visit_count, visiting_node in enumerate(nx.bfs_tree(graph, node)):
if visit_count == max_visiting_nodes: break
if graph.node[visiting_node].get(attribute):
shortest_path = nx.shortest_path(graph, node, visiting_node, 'weight')
if len(shortest_path) == 1:
return node # Only current node qualifies
else:
return shortest_path[1] # First step toward the target
return None
A few notes:
- In my solution, I use the visit_count (AKA
num
) to limit the search time, the same way you would like to. - the
nx.shortest_path()
function takes in as the fourth parameter, the string name of the edge attribute which used for calculating the weight. I think this is what you have in mind. - If you start search from node 1, and only node 1 has the attribute your want, your function returns 5, whereas mine returns 1. I think mine is more correct, but I am not sure--it might not be what you want. This case is when
len(shortest_path) == 1
in my code.
-
\$\begingroup\$ Wow. A few things:
bfs_tree
is slow compared to my method for large networks and smallmax_visiting_nodes
(and maybe also my optimized version from the other answer for all networks). Also your version iterates at least twice (at least once inbfs_tree
) and that is unnecessary. When I get some time, I may go into netowrkx and make a version of A* for search. Alsobfs_search
is not what I am doing, highlighted my the other answer, sobfs_tree
is not correct here. Your answer is BFS and does not really useshortest_path
for deciding what node to return (it gets first instead). \$\endgroup\$fread2281– fread22812013年04月01日 04:49:24 +00:00Commented Apr 1, 2013 at 4:49 -
\$\begingroup\$ Also your solution will give 1 by 1->2 (it does not matter here, but for more complex networks it will). \$\endgroup\$fread2281– fread22812013年04月01日 04:51:40 +00:00Commented Apr 1, 2013 at 4:51
-
\$\begingroup\$ I just found out that
bfs_tree()
does not seem to return the correct sequence. I am doing some tests at this moment. \$\endgroup\$Hai Vu– Hai Vu2013年04月01日 16:45:45 +00:00Commented Apr 1, 2013 at 16:45 -
\$\begingroup\$ BFS is breadth-first-search, so it returns based on depth, not weight. en.wikipedia.org/wiki/Breadth-first_search \$\endgroup\$fread2281– fread22812013年04月01日 23:47:54 +00:00Commented Apr 1, 2013 at 23:47
heapq
directly when you don't need the thread synchronization thatQueue.PriorityQueue
provides. \$\endgroup\$