I have a graph which we can think of is representing a network of train tracks. Each node is a station and each edge is a piece of track connecting two stations. Each station node may be connected by more than one track and all tracks are one way (i.e the graph is directed).
My job is to find as many routes as possible connecting one node with a set of destination nodes. Routes may contain cycles to the train track analogy breaks down here, but whatever. I have Python code to perform this task, but it is to slow so I would love to get some help in making it faster. Below is my traversal function:
from collections import deque
from itertools import product
def travel(graph, start, dests, count):
queue = deque([[(start, None)]])
paths = []
while queue:
path = queue.popleft()
at = path[-1][0]
if at in dests:
pp = [p[1] for p in path[1:]]
for real_path in product(*pp):
paths.append(''.join(real_path))
count -= 1
if count == 0:
return paths
adj = graph.get(at, {})
for item in adj.items():
queue.append(path + [item])
return paths
Called with a sample graph:
G = {
1 : {2 : 'mqr', 3 : 't'},
2 : {3 : 'q'},
3 : {1 : 'S', 4 : '_'},
4 : {3 : 't', 1 : 'xUI', 5 : '=+/'},
5 : {4 : 'q', 6 : '1'},
6 : {1 : '0'}
}
print(len(set(travel(G, 1, [6], 1000))))
1000 unique strings representing paths through the graph will be generated. Ideas on how to improve the travel
function would be greatly appreciated. Also on the format I'm using for representing the graph. It is usually very efficient to store a graph as a dict whose values are themselves dicts, but perhaps there is a better format.
1 Answer 1
There's no docstring. What does
travel
do? What arguments does it take? What does it return?The meaning of the data structure
queue
is far from clear. It seems to be a deque whose elements represent paths, each path being represented by a list of pairs(v, e)
wherev
is the vertex on the path, ande
isNone
if this is the first vertex on the path, or an iterable of edges by whichv
could be reached from the previous vertex of the path. This kind of complex data structure is hard to understand, and would be clearer if it had its own class, for example usingcollections.namedtuple
.It would be a good idea to generate the paths one at a time using the
yield
statement, so that if a consumer is processing the paths one at a time then the whole list of paths does not have to be stored in memory.Generating the paths would also allow us to omit the
count
argument and keep generating paths forever. If the caller wants at most 1000 paths, they can useitertools.islice
:islice(travel(graph, start, dests), 1000)
The test
if at in dests:
takes time proportional to the number of destinations. This would run in constant time ifdests
were converted to a set.It would be reasonable to insist on the graph having a key for every vertex (whose corresponding value is an empty dictionary if the vertex is a sink); this would allow you to write
graph[at]
instead ofgraph.get(at, {})
which creates and throws away an empty dictionary.The implementation stores
path + [item]
in the queue, taking time and space proportional to the length of the path, on each iteration. This is unnecessary: many paths will share the same initial prefix, so a data structure which stored only the last item on the path, together with a reference to the previous data structure on the path, would take constant time on each iteration. The whole path could be reconstructed when the destination is reached by following the references.
2. Revised code
from collections import deque, namedtuple
# Node in the breadth-first search with fields:
# vertex -- the current vertex in the search
# edges -- iterable of edges from previous vertex (or None)
# prev -- previous Search node (or None)
SearchNode = namedtuple('SearchNode', 'vertex edges prev')
def travel(graph, start, dests):
"""Generate paths in graph beginning at the vertex start and ending at
any of the vertices in dests.
The graph must be represented as a mapping from vertex to a
mapping from neighbouring vertex to an iterable of edges.
"""
dests = set(dests)
queue = deque([SearchNode(start, None, None)])
while queue:
at = queue.popleft()
if at.vertex in dests:
edges = []
prev = at
while prev.edges is not None:
edges.append(prev.edges)
prev = prev.prev
for path in product(*reversed(edges)):
yield ''.join(path)
for v, e in graph[at.vertex].items():
queue.append(SearchNode(v, e, at))
-
\$\begingroup\$ Thanks for your suggestions. In this case, my main performance problem turned out that the graph had "looping train tracks going to nowhere." After pruning redundant nodes, my algorithm became much faster. \$\endgroup\$Gaslight Deceive Subvert– Gaslight Deceive Subvert2018年02月04日 18:19:06 +00:00Commented Feb 4, 2018 at 18:19
Explore related questions
See similar questions with these tags.
t_=1
,qq_=1
occur, but notqqqqt_=1
. Going round and round cycles is allowed and a clever solution could definitely exploit that. However, I can't say whether cycle exploitation would be easy to implement or efficient. In the example..tStStStSt_=1
is such a cycle. \$\endgroup\$tSt_=1
,tStSt_=1
,tStStSt_=1
, ...? I'm trying to get you to explain the constraints on the solution here: there's some choice about which routes to output, so in order to review your code, we need to know which outputs are acceptable and which are not. If asked to return 10 routes, will any 10 routes do? \$\endgroup\$