Find all paths in a graph

Question 1

My code does a DFS on the given graph and prints all possible paths for a given start and end node. My question is how can this be improved?

Can some form of memoization be used here? There are cases where a path from a certain source - destination is already found and will be an intermediate path for a different source - destination but it is still computed again.
As it is a generator will that make memoization different?
Is there an already existing text book algorithm that I am missing? Does not have to be specifically DFS.
Pythonic / Idiomatic code suggestions are also welcome.

This is my code

from itertools import product
def find_path(g, src, dst):
 """Prints all possible path for a graph `g` for all pairs in `src` and `dst`
 
 Args:
 g (list): 2d list, graph
 src (list): list of nodes that will be the source
 dst (list): list of nodes that will be the destination
 """
 graph = {}
 # constructing a graph
 for from_, to_ in g:
 graph.setdefault(from_, set()).add(to_)
 graph.setdefault(to_, set()).add(from_) # undirected
 def dfs(src, dst, path, seen):
 """Recursive generator that yields all paths from `src` to `dst`
 
 Args:
 src (int): source node
 dst (int): destination node
 path (list): path so far
 seen (set): set of visited nodes
 
 Yields:
 list: all paths from `src` to `dst`
 """
 if src in seen:
 return
 if src == dst:
 yield path + [src]
 return
 seen.add(src)
 for i in graph.get(src, []):
 yield from dfs(i, dst, path + [src], seen)
 seen.remove(src)
 # start finding path for all `src` `dst` pairs
 for source, destination in product(src, dst):
 print(source, destination, list(dfs(source, destination, [], set())))
g = list(product(range(1, 5), repeat=2)) # a graph for testing purpose
src = dst = range(1, 5) # source and destination, in actual code this will be a list
find_path(g, src, dst)

Question 2

Is there any additional details needed?

Question 3

I don't think memoization will be useful here. Any correct algorithm is going to be O(n!), nothing you do can reduce that fundamentally because you still have to output all the paths.
Trying to cache stuff you already output is just going to make the space-costs worse, it won't make the processing-time any better.
The advantages of using a generator are that the caller doesn't have to wait for the whole computation to finish before handling the early outputs, and that we never have to store the results of the whole computation at once. This is a good choice for the task at hand!
It doesn't affect the point in (1.) though.
You seem to have already found the textbook algorithm.
There are various things to make the code prettier, but you'd have to do some testing to see if they make it work better.
- Types are nice. They do nothing at runtime, but they let MyPy check for many kinds of mistakes.
- I don't like the redundancy of path and seen, and I don't like that you're mutating seen the way you are, but I suspect the way you've written it is actually the most efficient for larger graphs.
- The expression path + [src] makes two calls to the list constructor, and it's happening inside the for loop. That can be improved!
- Since you never mutate a value of path, it can be a tuple instead of a list.
- The way you have it set up now, your adjacencies dict will always contain src, so you don't need to use get.

I'm ignoring your outer wrapper, and I've made no effort to test this:

from typing import Iterable, Mapping, MutableSet, Sequence, TypeVar
N = TypeVar('N')
def dfs(source: N, # The current node
 destination: N, # The destination node
 path: Sequence[N], # The path taken to the current node
 seen: MutableSet[N], # The nodes visited on the path
 graph: Mapping[N, Iterable[N]] # The graph being explored, as an adjacency dict
) -> Iterable[Sequence[N]]:
 new_path = (*path, source)
 if source == destination:
 yield new_path
 else:
 seen.add(source)
 for n in graph[source]:
 if n not in seen:
 yield from dfs(n, destination, new_path, seen, graph)
 seen.remove(source)

Question 4

thank you for answering this, "don't like that you're mutating seen the way you are", can you suggest any other ways this can be done?

Question 5

The other options would be to make a new set for the recursive seen, like you you're doing with path. Mutating values is often efficient, but opens up its own class of bugs. But as I understand it's somewhat expensive to spin up a new set instance (relative to other basic structures), so for this purpose your way is probably better.

Question 6

I have one other question, assume a path exists like so '1 -> 2 -> 3 -> 4 -> 5 -> 6' and before that computation, I have already computed '2 -> 3 -> 4 -> 5', that was what I meant when I said how memoization can be used, as you can see, the second path is part of the result of the first path, can I not take advantage of this? I will clarify if you need more details

Question 7

also, I will give the bounty to you if I dont get any other even more well written answer, just want to clarify things before :)

Question 8

I understand what you mean by memoization, and I don't think it's going to help here. Suppose you've cached (memoized) the 2...5 path. Probably you've also cached all the other paths from 2. Great, but three problems: 1) You still have to iterate over all those 2... paths and output them; I don't think you can do that fundamentally faster than just rebuilding them all as-needed. 2) You have to store them all somewhere; that could be costly and/or slow. 3) You can't just iterate over all of them; any containing a 1 must be excluded, so there's still extra computation to do.

ShapeOfMatter ShapeOfMatter 4,4377 silver badges25 bronze badges · Accepted Answer · 2021-05-17 15:59:12Z

I don't think memoization will be useful here. Any correct algorithm is going to be O(n!), nothing you do can reduce that fundamentally because you still have to output all the paths.
Trying to cache stuff you already output is just going to make the space-costs worse, it won't make the processing-time any better.
The advantages of using a generator are that the caller doesn't have to wait for the whole computation to finish before handling the early outputs, and that we never have to store the results of the whole computation at once. This is a good choice for the task at hand!
It doesn't affect the point in (1.) though.
You seem to have already found the textbook algorithm.
There are various things to make the code prettier, but you'd have to do some testing to see if they make it work better.
- Types are nice. They do nothing at runtime, but they let MyPy check for many kinds of mistakes.
- I don't like the redundancy of path and seen, and I don't like that you're mutating seen the way you are, but I suspect the way you've written it is actually the most efficient for larger graphs.
- The expression path + [src] makes two calls to the list constructor, and it's happening inside the for loop. That can be improved!
- Since you never mutate a value of path, it can be a tuple instead of a list.
- The way you have it set up now, your adjacencies dict will always contain src, so you don't need to use get.

I'm ignoring your outer wrapper, and I've made no effort to test this:

from typing import Iterable, Mapping, MutableSet, Sequence, TypeVar
N = TypeVar('N')
def dfs(source: N, # The current node
 destination: N, # The destination node
 path: Sequence[N], # The path taken to the current node
 seen: MutableSet[N], # The nodes visited on the path
 graph: Mapping[N, Iterable[N]] # The graph being explored, as an adjacency dict
) -> Iterable[Sequence[N]]:
 new_path = (*path, source)
 if source == destination:
 yield new_path
 else:
 seen.add(source)
 for n in graph[source]:
 if n not in seen:
 yield from dfs(n, destination, new_path, seen, graph)
 seen.remove(source)

thank you for answering this, "don't like that you're mutating seen the way you are", can you suggest any other ways this can be done?
The other options would be to make a new set for the recursive seen, like you you're doing with path. Mutating values is often efficient, but opens up its own class of bugs. But as I understand it's somewhat expensive to spin up a new set instance (relative to other basic structures), so for this purpose your way is probably better.
I have one other question, assume a path exists like so '1 -> 2 -> 3 -> 4 -> 5 -> 6' and before that computation, I have already computed '2 -> 3 -> 4 -> 5', that was what I meant when I said how memoization can be used, as you can see, the second path is part of the result of the first path, can I not take advantage of this? I will clarify if you need more details
also, I will give the bounty to you if I dont get any other even more well written answer, just want to clarify things before :)
I understand what you mean by memoization, and I don't think it's going to help here. Suppose you've cached (memoized) the 2...5 path. Probably you've also cached all the other paths from 2. Great, but three problems: 1) You still have to iterate over all those 2... paths and output them; I don't think you can do that fundamentally faster than just rebuilding them all as-needed. 2) You have to store them all somewhere; that could be costly and/or slow. 3) You can't just iterate over all of them; any containing a 1 must be excluded, so there's still extra computation to do.

Stack Exchange Network

Find all paths in a graph

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Find all paths in a graph

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions