\$\begingroup\$
\$\endgroup\$
1
My code does a DFS on the given graph and prints all possible paths for a given start and end node. My question is how can this be improved?
- Can some form of memoization be used here? There are cases where a path from a certain source - destination is already found and will be an intermediate path for a different source - destination but it is still computed again.
- As it is a generator will that make memoization different?
- Is there an already existing text book algorithm that I am missing? Does not have to be specifically DFS.
- Pythonic / Idiomatic code suggestions are also welcome.
This is my code
from itertools import product
def find_path(g, src, dst):
"""Prints all possible path for a graph `g` for all pairs in `src` and `dst`
Args:
g (list): 2d list, graph
src (list): list of nodes that will be the source
dst (list): list of nodes that will be the destination
"""
graph = {}
# constructing a graph
for from_, to_ in g:
graph.setdefault(from_, set()).add(to_)
graph.setdefault(to_, set()).add(from_) # undirected
def dfs(src, dst, path, seen):
"""Recursive generator that yields all paths from `src` to `dst`
Args:
src (int): source node
dst (int): destination node
path (list): path so far
seen (set): set of visited nodes
Yields:
list: all paths from `src` to `dst`
"""
if src in seen:
return
if src == dst:
yield path + [src]
return
seen.add(src)
for i in graph.get(src, []):
yield from dfs(i, dst, path + [src], seen)
seen.remove(src)
# start finding path for all `src` `dst` pairs
for source, destination in product(src, dst):
print(source, destination, list(dfs(source, destination, [], set())))
g = list(product(range(1, 5), repeat=2)) # a graph for testing purpose
src = dst = range(1, 5) # source and destination, in actual code this will be a list
find_path(g, src, dst)
python_userpython_user
-
\$\begingroup\$ Is there any additional details needed? \$\endgroup\$python_user– python_user2021年05月15日 10:46:37 +00:00Commented May 15, 2021 at 10:46
1 Answer 1
\$\begingroup\$
\$\endgroup\$
5
- I don't think memoization will be useful here. Any correct algorithm is going to be O(n!), nothing you do can reduce that fundamentally because you still have to output all the paths.
Trying to cache stuff you already output is just going to make the space-costs worse, it won't make the processing-time any better. - The advantages of using a generator are that the caller doesn't have to wait for the whole computation to finish before handling the early outputs, and that we never have to store the results of the whole computation at once. This is a good choice for the task at hand!
It doesn't affect the point in (1.) though. - You seem to have already found the textbook algorithm.
- There are various things to make the code prettier, but you'd have to do some testing to see if they make it work better.
- Types are nice. They do nothing at runtime, but they let MyPy check for many kinds of mistakes.
- I don't like the redundancy of
path
andseen
, and I don't like that you're mutatingseen
the way you are, but I suspect the way you've written it is actually the most efficient for larger graphs. - The expression
path + [src]
makes two calls to thelist
constructor, and it's happening inside thefor
loop. That can be improved! - Since you never mutate a value of
path
, it can be a tuple instead of a list. - The way you have it set up now, your adjacencies dict will always contain
src
, so you don't need to useget
.
I'm ignoring your outer wrapper, and I've made no effort to test this:
from typing import Iterable, Mapping, MutableSet, Sequence, TypeVar
N = TypeVar('N')
def dfs(source: N, # The current node
destination: N, # The destination node
path: Sequence[N], # The path taken to the current node
seen: MutableSet[N], # The nodes visited on the path
graph: Mapping[N, Iterable[N]] # The graph being explored, as an adjacency dict
) -> Iterable[Sequence[N]]:
new_path = (*path, source)
if source == destination:
yield new_path
else:
seen.add(source)
for n in graph[source]:
if n not in seen:
yield from dfs(n, destination, new_path, seen, graph)
seen.remove(source)
answered May 17, 2021 at 15:59
-
\$\begingroup\$ thank you for answering this, "don't like that you're mutating seen the way you are", can you suggest any other ways this can be done? \$\endgroup\$python_user– python_user2021年05月18日 00:33:41 +00:00Commented May 18, 2021 at 0:33
-
\$\begingroup\$ The other options would be to make a new set for the recursive
seen
, like you you're doing withpath
. Mutating values is often efficient, but opens up its own class of bugs. But as I understand it's somewhat expensive to spin up a newset
instance (relative to other basic structures), so for this purpose your way is probably better. \$\endgroup\$ShapeOfMatter– ShapeOfMatter2021年05月18日 20:12:30 +00:00Commented May 18, 2021 at 20:12 -
\$\begingroup\$ I have one other question, assume a path exists like so
'1 -> 2 -> 3 -> 4 -> 5 -> 6'
and before that computation, I have already computed'2 -> 3 -> 4 -> 5'
, that was what I meant when I said how memoization can be used, as you can see, the second path is part of the result of the first path, can I not take advantage of this? I will clarify if you need more details \$\endgroup\$python_user– python_user2021年05月19日 01:01:29 +00:00Commented May 19, 2021 at 1:01 -
\$\begingroup\$ also, I will give the bounty to you if I dont get any other even more well written answer, just want to clarify things before :) \$\endgroup\$python_user– python_user2021年05月19日 01:01:47 +00:00Commented May 19, 2021 at 1:01
-
\$\begingroup\$ I understand what you mean by memoization, and I don't think it's going to help here. Suppose you've cached (memoized) the
2...5
path. Probably you've also cached all the other paths from 2. Great, but three problems: 1) You still have to iterate over all those2...
paths and output them; I don't think you can do that fundamentally faster than just rebuilding them all as-needed. 2) You have to store them all somewhere; that could be costly and/or slow. 3) You can't just iterate over all of them; any containing a1
must be excluded, so there's still extra computation to do. \$\endgroup\$ShapeOfMatter– ShapeOfMatter2021年05月19日 02:13:56 +00:00Commented May 19, 2021 at 2:13
Explore related questions
See similar questions with these tags.
lang-py