Topological sort

Question 1

The algorithm works in steps:

Produce the graph in the form of an adjacency list

e.g. this graph
```
0 - - 2
 \
 1 -- 3
```
produces this adjacency list
```
{0: [], 1: [0], 2: [0], 3: [1, 2]}
```
0 depends on nothing, 1 depends on 0 etc..
Iterate through the graph and find nodes that does not have any dependencies

Questions:

What should the correct running time be for a well implemented topological sort. I am seeing different opinions:

Wikipedia says: \$O(log^2(n)\$)

Geeksforgeeks says: \$O(V+E)\$
My implementation is running at \$O(V*E)\$. Because at worst, I will need to loop through the graph V times and each time I will need to check E items. How do I make my implementation into linear time.

def produce_graph(prerequisites):
 adj = {}
 for course in prerequisites:
 if course[0] in adj:
 # append prequisites
 adj[course[0]].append(course[1])
 else:
 adj[course[0]] = [course[1]]
 # ensure that prerequisites are also in the graph
 if course[1] not in adj:
 adj[course[1]] = []
 return adj
def toposort(graph):
 sorted_courses = []
 while graph:
 # we mark this as False
 # In acyclic graph, we should be able to resolve at least
 # one node in each cycle
 acyclic = False
 for node, predecessors in graph.items():
 # here, we check whether this node has predecessors
 # if a node has no predecessors, it is already resolved,
 # we can jump straight to adding the node into sorted
 # else, mark resolved as False
 resolved = len(predecessors) == 0
 for predecessor in predecessors:
 # this node has predecessor that is not yet resolved
 if predecessor in graph:
 resolved = False
 break
 else:
 # this particular predecessor is resolved
 resolved = True
 # all the predecessor of this node has been resolved
 # therefore this node is also resolved
 if resolved:
 # since we are able to resolve this node
 # We mark this to be acyclic
 acyclic = True
 del graph[node]
 sorted_courses.append(node)
 # if we go through the graph, and found that we could not resolve
 # any node. Then that means this graph is cyclic
 if not acyclic:
 # if not acyclic then there is no order
 # return empty list
 return []
 return sorted_courses
graph = produce_graph([[1,0],[2,0],[3,1],[3,2]])
print toposort(graph)

Question 2

Is your indentation messed up in produce_graph? I think the second if test should be in the for loop.

Question 3

Please read the whole sentence from Wikipedia: "... it can be computed in O(log^2 n) time on a parallel computer using a polynomial number O(n^k) of processors, for some constant k."

Question 4

Please don't cross-post - it doesn't seem like you actually want a review in terms of the scope of this site.

Question 5

Documentation

Please add documentation to tell what the function is supposed to do and what input it is expecting.

Avoid repetition

In produce_graph, you repeat course[0] many times which smells like something wrong.

It can be rewritten :

def produce_graph(prerequisites):
 adj = {}
 for course in prerequisites:
 first, second = course[0], course[1]
 if first in adj:
 # append prequisites
 adj[first].append(second)
 else:
 adj[first] = [second]
 # ensure that prerequisites are also in the graph
 if second not in adj:
 adj[second] = []
 return adj

Using the right tool

What you are doing in produce_graph can be easily achieved using the set_default function :

adj = {}
for course in prerequisites:
 first, second = course[0], course[1]
 adj.setdefault(first, []).append(second)

(One might also suggest using defaultdict, I'll let you pick your favorite).

Using the right tool is also choosing the right data structure. It would probably make sense for produce_graph to take a list of tuples. Also, if you know that each elements contains 2 elements, you can use tuple unpacking like this :

for course in prerequisites:
 first, second = course

or

for first, second in prerequisites:
 adj.setdefault(first, []).append(second)

Avoid premature optimisation/keep it simple

Regarding :

 resolved = len(predecessors) == 0
 for predecessor in predecessors:
 # this node has predecessor that is not yet resolved
 if predecessor in graph:
 resolved = False
 break
 else:
 # this particular predecessor is resolved
 resolved = True

At the beginning, you iterate resolved to True if predecessors is True, False otherwise. I think this does the same as :

 resolved = True
 for predecessor in predecessors:
 # this node has predecessor that is not yet resolved
 if predecessor in graph:
 resolved = False
 break

As resolved will be False if and only if an element verifies the property in graph.

Also, this can be rewritten using builtin all/any :

 if all(p not in graph for p in predecessors):

SylvainD SylvainD 29.7k1 gold badge49 silver badges93 bronze badges · Answer 1 · 2015-06-25 11:58:35Z

Documentation

Please add documentation to tell what the function is supposed to do and what input it is expecting.

Avoid repetition

In produce_graph, you repeat course[0] many times which smells like something wrong.

It can be rewritten :

def produce_graph(prerequisites):
 adj = {}
 for course in prerequisites:
 first, second = course[0], course[1]
 if first in adj:
 # append prequisites
 adj[first].append(second)
 else:
 adj[first] = [second]
 # ensure that prerequisites are also in the graph
 if second not in adj:
 adj[second] = []
 return adj

Using the right tool

What you are doing in produce_graph can be easily achieved using the set_default function :

adj = {}
for course in prerequisites:
 first, second = course[0], course[1]
 adj.setdefault(first, []).append(second)

(One might also suggest using defaultdict, I'll let you pick your favorite).

Using the right tool is also choosing the right data structure. It would probably make sense for produce_graph to take a list of tuples. Also, if you know that each elements contains 2 elements, you can use tuple unpacking like this :

for course in prerequisites:
 first, second = course

or

for first, second in prerequisites:
 adj.setdefault(first, []).append(second)

Avoid premature optimisation/keep it simple

Regarding :

 resolved = len(predecessors) == 0
 for predecessor in predecessors:
 # this node has predecessor that is not yet resolved
 if predecessor in graph:
 resolved = False
 break
 else:
 # this particular predecessor is resolved
 resolved = True

At the beginning, you iterate resolved to True if predecessors is True, False otherwise. I think this does the same as :

 resolved = True
 for predecessor in predecessors:
 # this node has predecessor that is not yet resolved
 if predecessor in graph:
 resolved = False
 break

As resolved will be False if and only if an element verifies the property in graph.

Also, this can be rewritten using builtin all/any :

 if all(p not in graph for p in predecessors):

Stack Exchange Network

Topological sort

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Topological sort

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions