Finding many paths in a graph

Question 1

I have a graph which we can think of is representing a network of train tracks. Each node is a station and each edge is a piece of track connecting two stations. Each station node may be connected by more than one track and all tracks are one way (i.e the graph is directed).

My job is to find as many routes as possible connecting one node with a set of destination nodes. Routes may contain cycles to the train track analogy breaks down here, but whatever. I have Python code to perform this task, but it is to slow so I would love to get some help in making it faster. Below is my traversal function:

from collections import deque
from itertools import product
def travel(graph, start, dests, count):
 queue = deque([[(start, None)]])
 paths = []
 while queue:
 path = queue.popleft()
 at = path[-1][0]
 if at in dests:
 pp = [p[1] for p in path[1:]]
 for real_path in product(*pp):
 paths.append(''.join(real_path))
 count -= 1
 if count == 0:
 return paths
 adj = graph.get(at, {})
 for item in adj.items():
 queue.append(path + [item])
 return paths

Called with a sample graph:

G = {
 1 : {2 : 'mqr', 3 : 't'},
 2 : {3 : 'q'},
 3 : {1 : 'S', 4 : '_'},
 4 : {3 : 't', 1 : 'xUI', 5 : '=+/'},
 5 : {4 : 'q', 6 : '1'},
 6 : {1 : '0'}
}
print(len(set(travel(G, 1, [6], 1000))))

1000 unique strings representing paths through the graph will be generated. Ideas on how to improve the travel function would be greatly appreciated. Also on the format I'm using for representing the graph. It is usually very efficient to store a graph as a dict whose values are themselves dicts, but perhaps there is a better format.

Question 2

t_=1, qq_=1 occur, but not qqqqt_=1. Going round and round cycles is allowed and a clever solution could definitely exploit that. However, I can't say whether cycle exploitation would be easy to implement or efficient. In the example ..tStStStSt_=1 is such a cycle.

Question 3

Ok, what about some other cycle then: would you be happy with tSt_=1, tStSt_=1, tStStSt_=1, ...? I'm trying to get you to explain the constraints on the solution here: there's some choice about which routes to output, so in order to review your code, we need to know which outputs are acceptable and which are not. If asked to return 10 routes, will any 10 routes do?

Question 4

Yes, it is allowed. Anything is allowed as long as it keeps the code's behavior. But please do not make it specific to the sample graph I gave.

Question 5

There's no docstring. What does travel do? What arguments does it take? What does it return?
The meaning of the data structure queue is far from clear. It seems to be a deque whose elements represent paths, each path being represented by a list of pairs (v, e) where v is the vertex on the path, and e is None if this is the first vertex on the path, or an iterable of edges by which v could be reached from the previous vertex of the path. This kind of complex data structure is hard to understand, and would be clearer if it had its own class, for example using collections.namedtuple.
It would be a good idea to generate the paths one at a time using the yield statement, so that if a consumer is processing the paths one at a time then the whole list of paths does not have to be stored in memory.
Generating the paths would also allow us to omit the count argument and keep generating paths forever. If the caller wants at most 1000 paths, they can use itertools.islice:
```
islice(travel(graph, start, dests), 1000)
```
The test if at in dests: takes time proportional to the number of destinations. This would run in constant time if dests were converted to a set.
It would be reasonable to insist on the graph having a key for every vertex (whose corresponding value is an empty dictionary if the vertex is a sink); this would allow you to write graph[at] instead of graph.get(at, {}) which creates and throws away an empty dictionary.
The implementation stores path + [item] in the queue, taking time and space proportional to the length of the path, on each iteration. This is unnecessary: many paths will share the same initial prefix, so a data structure which stored only the last item on the path, together with a reference to the previous data structure on the path, would take constant time on each iteration. The whole path could be reconstructed when the destination is reached by following the references.

2. Revised code

from collections import deque, namedtuple
# Node in the breadth-first search with fields:
# vertex -- the current vertex in the search
# edges -- iterable of edges from previous vertex (or None)
# prev -- previous Search node (or None)
SearchNode = namedtuple('SearchNode', 'vertex edges prev')
def travel(graph, start, dests):
 """Generate paths in graph beginning at the vertex start and ending at
 any of the vertices in dests.
 The graph must be represented as a mapping from vertex to a
 mapping from neighbouring vertex to an iterable of edges.
 """
 dests = set(dests)
 queue = deque([SearchNode(start, None, None)])
 while queue:
 at = queue.popleft()
 if at.vertex in dests:
 edges = []
 prev = at
 while prev.edges is not None:
 edges.append(prev.edges)
 prev = prev.prev
 for path in product(*reversed(edges)):
 yield ''.join(path)
 for v, e in graph[at.vertex].items():
 queue.append(SearchNode(v, e, at))

Question 6

Thanks for your suggestions. In this case, my main performance problem turned out that the graph had "looping train tracks going to nowhere." After pruning redundant nodes, my algorithm became much faster.

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2018-02-03 10:23:30Z

There's no docstring. What does travel do? What arguments does it take? What does it return?
The meaning of the data structure queue is far from clear. It seems to be a deque whose elements represent paths, each path being represented by a list of pairs (v, e) where v is the vertex on the path, and e is None if this is the first vertex on the path, or an iterable of edges by which v could be reached from the previous vertex of the path. This kind of complex data structure is hard to understand, and would be clearer if it had its own class, for example using collections.namedtuple.
It would be a good idea to generate the paths one at a time using the yield statement, so that if a consumer is processing the paths one at a time then the whole list of paths does not have to be stored in memory.
Generating the paths would also allow us to omit the count argument and keep generating paths forever. If the caller wants at most 1000 paths, they can use itertools.islice:
```
islice(travel(graph, start, dests), 1000)
```
The test if at in dests: takes time proportional to the number of destinations. This would run in constant time if dests were converted to a set.
It would be reasonable to insist on the graph having a key for every vertex (whose corresponding value is an empty dictionary if the vertex is a sink); this would allow you to write graph[at] instead of graph.get(at, {}) which creates and throws away an empty dictionary.
The implementation stores path + [item] in the queue, taking time and space proportional to the length of the path, on each iteration. This is unnecessary: many paths will share the same initial prefix, so a data structure which stored only the last item on the path, together with a reference to the previous data structure on the path, would take constant time on each iteration. The whole path could be reconstructed when the destination is reached by following the references.

2. Revised code

from collections import deque, namedtuple
# Node in the breadth-first search with fields:
# vertex -- the current vertex in the search
# edges -- iterable of edges from previous vertex (or None)
# prev -- previous Search node (or None)
SearchNode = namedtuple('SearchNode', 'vertex edges prev')
def travel(graph, start, dests):
 """Generate paths in graph beginning at the vertex start and ending at
 any of the vertices in dests.
 The graph must be represented as a mapping from vertex to a
 mapping from neighbouring vertex to an iterable of edges.
 """
 dests = set(dests)
 queue = deque([SearchNode(start, None, None)])
 while queue:
 at = queue.popleft()
 if at.vertex in dests:
 edges = []
 prev = at
 while prev.edges is not None:
 edges.append(prev.edges)
 prev = prev.prev
 for path in product(*reversed(edges)):
 yield ''.join(path)
 for v, e in graph[at.vertex].items():
 queue.append(SearchNode(v, e, at))

Thanks for your suggestions. In this case, my main performance problem turned out that the graph had "looping train tracks going to nowhere." After pruning redundant nodes, my algorithm became much faster.

Stack Exchange Network

Finding many paths in a graph

1 Answer 1

2. Revised code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding many paths in a graph

1 Answer 1

2. Revised code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions