Can someone help me optimize my DFS implementation in Python?

Question 1

def dfs(graph, node):
"""Run DFS through the graph from the starting 
node and return the nodes in order of finishing time.
"""
stack = [[node, True]]
while True in [x[1] for x in stack]:
 i = 0
 for x in xrange(len(stack)):
 if stack[x][1] == True:
 i = x
 break
 stack[i][1] = False
 stack = [[x, True] for x in graph[stack[i][0]] \
 if (([x, True] not in stack) and ([x, False] not in stack))] + stack
return [x[0] for x in stack]

Input: Graph as adjacency list Eg: {1:[2, 3], 2:[1], 3:[]}
Node is an integer value from where depth first search begins Eg: 1
Output: The list of nodes in order of their finishing times.
Eg: [3, 2, 1]

For large graphs, the above code is taking too long to execute. Is there a better way to do this?
Edit: I could do better time by implementing the following code. Is it possible to do even better?

def dfs(graph, node):
"""Run DFS through the graph from the starting 
node and return the nodes in order of finishing time.
"""
seen = defaultdict(bool)
stack = [node]
while True:
 flag = True
 for x in stack:
 if not seen[x]:
 stack = [a for a in graph[x] if not seen[a]] + stack
 seen[x] = True
 flag = False
 break
 if flag:
 break
return stack

Edit: I've changed seen and stack variables to sets. But I'm not getting the required performance boost. What am I doing wrong?

def dfs(graph, node):
 """Run DFS through the graph from the starting 
 node and return the nodes in order of finishing time.
 """
 seen = set()
 stack = {node}
 while True:
 flag = True
 for x in stack:
 if x not in seen:
 stack = stack.union(set(graph[x]))
 seen = seen.union({x})
 flag = False
 break
 if flag:
 break
 return list(stack)

Question 2

This does not do depth-first search, it does breadth-first search. Which do you want? I'll write an answer later today when I have time. Btw: The third function doesn't work because sets aren't ordered (unless all you want to do is to find all nodes that are reachable from the starting node).

Question 3

Winston has already answered it better than I could, so nevermind.

Question 4

Some hints.

Set type can be used for seen variable.

That stack = [...] + stack is a monster, both + operation and list comprehension.

Is it possible to use set there (what type graph[x] is?). Maybe, its more efficient to have stack in a list of lists form?

UPDATE: loop over the whole stack again and again can be another point of optimization.

UPDATE2: in case you have not solved the problem, one possible way is to eliminate the inefficient filtering of the stack against seen values. This can be achieved by having a set of not yet visited nodes and using set operations to get the next level from the current level. This is why I've advised to keep levels in the stack (list of sets - as particular order inside a level is not important, is it?). Operating on relatively small number of nodes in the latest level is more efficient than looping over the whole stack. Actually, already seen nodes can also be subtracted from "next level", so if the speed is ok, set of unvisited nodes can be redundant.

UPDATE3:

I think, the following produces result, which is stated in the docstring. Finishing time is a time when the node has no more children to process and is popped from the stack. Reverse order can also be easily achieved by changing .append(x) to insert(0, x).

def dfs(graph, node):
 """Run DFS through the graph from the starting
 node and return the nodes in order of finishing time.
 """
 seen = {node}
 stack = [node]
 result = []
 while stack:
 x = stack[-1]
 nxt = set(graph.get(x, [])) - seen
 if nxt:
 stack.extend(list(nxt))
 seen |= nxt
 else:
 result.append(x)
 stack.pop()
 return result

Remark: the algorithm does not preserve the order of children. Minor modification may be needed to preserve the order of graph.get(x, []), in if nxt:

stack.extend(nx for nx in graph[x] if nx in nxt)

Question 5

I tried using sets, I've edited the question description to include the edited code. The running time is not very different from my second attempt. Can you tell me where I'm going wrong?

Question 6

@Roman The type of graph[x] is a list: (quoting vaishaks) "Input: Graph as adjacency list Eg: {1:[2, 3], 2:[1], 3:[]}"

Question 7

Those can be easily converted to sets, of course. (if the order of traversal is not important)

Question 8

@Roman Update3 is achieving exactly what I needed and it's running in good time. Thanks a lot Roman!

Question 9

def dfs(graph, node):

This isn't a depth first search

 """Run DFS through the graph from the starting 
 node and return the nodes in order of finishing time.
 """
 seen = set()
 stack = {node}

You aren't using this as a stack, don't call it that

 while True:
 flag = True

avoid flag variables, they confound code. Also that's a very undescriptive name.

 for x in stack:
 if x not in seen:

Seen these are sets, you should be probably use: for x in stack - seen: which should be more efficient

 stack = stack.union(set(graph[x]))

Using this approach will make things rather slow, you just keep making stack bigger and bigger.

 seen = seen.union({x})

Use seen.add to add things to the set. Using union like this will be much slower

 flag = False
 break
 if flag:
 break
 return list(stack)

The set gives you no guarantees as to the order of the elements. I'm guessing that's now what you wanted. As it stands,this will end up giving you all the elements in your graph in a random order.

My approach

def breadth_first_search(graph, start):
 seen = set()
 current = {start}
 history = []
 while current:
 history.extend(current) # record when we found all these elements
 seen.update(current) # add them to the list of elements we've already seen
 neighbours = set() # generate a set of all elements that we can reach from these ones
 for node in current:
 neighbours.update(graph[node])
 current = neighbours - seen # the next round of elements is this one, except those we've already seen
 return history

Question 10

That is a brilliant answer Winston. I learnt a lot from that. But my application requires me to return the nodes in order of finishing times the way dfs would finish. I agree my iterative code looks a lot like bfs. And your code works perfectly well if I needed the nodes to be returned in order of finishing times the way bfs would. But the code I wrote gives me the right answer for small graphs. It's not at all random(because of using sets) as one would expect it to be. I find that very weird now that I know sets are un-ordered. Can you please tell me how you would do it for dfs?

Question 11

@vaishaks, your code working for small graphs is greatly mysterious. Any chance you can get me the data you ran it against?

Question 12

Sorry the output of the set code makes no sense actually. My bad. I've now reverted to function 2. The one with the defaultdict. Graph: {1: [3], 2: [1], 3: [2], 4: [3, 5, 6], 6: [8], 7: [6], 8: [7]} Start: 8, Your approach Output: [8, 7, 6] Required output: [6, 7, 8]. Start: 4, Your approach Output: [4, 3, 5, 6, 8, 2, 1, 7], Required output: [7, 8, 1, 2, 3, 5, 6, 4]

Question 13

@vaishaks, ok. why do you need this in dfs order?

Question 14

@vaishaks: You're right that it's not bfs (I was mistaken earlier), but it's not dfs either. (This may or may not be what you want.) E.g., in your last example, dfs would start at 4, go to one of the successors of 4, (let's say 3). Then dfs would take one of the children of 3, and so on. The other successor of 4 would only be traversed later. Also, is it important that the output be in "reversed" order (starting point at the end)?

Roman Susi Roman Susi 9738 silver badges11 bronze badges · Accepted Answer · 2013-08-05 19:30:45Z

Some hints.

Set type can be used for seen variable.

That stack = [...] + stack is a monster, both + operation and list comprehension.

Is it possible to use set there (what type graph[x] is?). Maybe, its more efficient to have stack in a list of lists form?

UPDATE: loop over the whole stack again and again can be another point of optimization.

UPDATE2: in case you have not solved the problem, one possible way is to eliminate the inefficient filtering of the stack against seen values. This can be achieved by having a set of not yet visited nodes and using set operations to get the next level from the current level. This is why I've advised to keep levels in the stack (list of sets - as particular order inside a level is not important, is it?). Operating on relatively small number of nodes in the latest level is more efficient than looping over the whole stack. Actually, already seen nodes can also be subtracted from "next level", so if the speed is ok, set of unvisited nodes can be redundant.

UPDATE3:

I think, the following produces result, which is stated in the docstring. Finishing time is a time when the node has no more children to process and is popped from the stack. Reverse order can also be easily achieved by changing .append(x) to insert(0, x).

def dfs(graph, node):
 """Run DFS through the graph from the starting
 node and return the nodes in order of finishing time.
 """
 seen = {node}
 stack = [node]
 result = []
 while stack:
 x = stack[-1]
 nxt = set(graph.get(x, [])) - seen
 if nxt:
 stack.extend(list(nxt))
 seen |= nxt
 else:
 result.append(x)
 stack.pop()
 return result

Remark: the algorithm does not preserve the order of children. Minor modification may be needed to preserve the order of graph.get(x, []), in if nxt:

stack.extend(nx for nx in graph[x] if nx in nxt)

I tried using sets, I've edited the question description to include the edited code. The running time is not very different from my second attempt. Can you tell me where I'm going wrong?
@Roman The type of graph[x] is a list: (quoting vaishaks) "Input: Graph as adjacency list Eg: {1:[2, 3], 2:[1], 3:[]}"
Those can be easily converted to sets, of course. (if the order of traversal is not important)
@Roman Update3 is achieving exactly what I needed and it's running in good time. Thanks a lot Roman!

Stack Exchange Network

Can someone help me optimize my DFS implementation in Python?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Can someone help me optimize my DFS implementation in Python?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions