Below is my attempt at the challenge found here. The challenge is "Find a build order given a list of projects and dependencies." Already given are the classes Node and Graph (which I can add if anyone needs them to help review my code), and also
class Dependency(object):
def __init__(self, node_key_before, node_key_after):
self.node_key_before = node_key_before
self.node_key_after = node_key_after
I came up with the below:
class BuildOrder(object):
def __init__(self, dependencies):
self.dependencies = dependencies
self.graph = Graph()
self._build_graph()
def _build_graph(self):
for dependency in self.dependencies:
self.graph.add_edge(dependency.node_key_before,
dependency.node_key_after)
def find_build_order(self):
processed_nodes = [n for n in self.graph.nodes.values() if n.incoming_edges == 0]
if not processed_nodes:
return None
num_processed = len(processed_nodes)
num_nodes = len(self.graph.nodes)
while num_processed < num_nodes:
for node in [n for n in processed_nodes if n.visit_state == State.unvisited]:
node.visit_state = State.visited
for neighbor in list(node.adj_nodes.values()):
node.remove_neighbor(neighbor)
processed_nodes += [n for n in self.graph.nodes.values() if n.incoming_edges == 0 and
n.visit_state == State.unvisited]
if len(processed_nodes) == num_processed:
return None
else:
num_processed = len(processed_nodes)
return processed_nodes
When I timed it against the solution given here, mine was ~4.5x faster. This was surprising, as throughout completing the challenges in the repo, I usually have approx same runtime as the given solution, or worse.
I'd like to know if the above is pythonic, is it readable and how would you improve it? Also, I have deliberately left out comments - which lines in this should probably have a comment to help others understand it? Thanks!
2 Answers 2
"Pythonicness"
The code is easy to read and understandable - a good start.
- Remove
(object)
when declaring classes, it is not needed :) - Use type annotations!
from typing import List
def __init__(self, node_key_before: Node, node_key_after: Node):
def __init__(self, dependencies: List[Dependency]):
def find_build_order(self) -> List[Node]:
- The statement:
if len(processed_nodes) == num_processed:
return None
else:
num_processed = len(processed_nodes)
Can be rewritten as:
if len(processed_nodes) == num_processed:
return None
num_processed = len(processed_nodes)
That is it for this section (for now - I might edit later).
Algorithm and runtime
Let's talk complexity. This algorithm runs in O(V * V + E)
, it can be seen from the example of the following graph:
a -> b -> c -> d -> ... (V times)
You will iterate over the V
nodes and for each of them you will search (twice in fact) over all the V
nodes. The reason we add + E
is since you will also delete all edges of the graph.
Now the correct algorithm can do this in O(V + E)
, the analysis a just a little bit more complex but you can do it yourself.
Why is my algorithm faster in tests than?
I cannot answer for sure but 2 explanations are possible:
The graph in the test has many many edges and
E
is close toV * V
, therefore the fact that the time complexity is worst is not relevant in practice.After a quick look at the code in the answer you can see it has more functions and more dictionaries, this is probably the reason your code is faster in with smaller graphs, also cache probably plays a factor here.
I might add some more stuff. This is it for now.
-
\$\begingroup\$ Am I right in thinking that I can remove the 'else' since the 'if' results in a return statement? I have seen a lot of code that just does 'if' several times, without returns in the bodies of the 'if's. I guess I am asking, is 'else' (and even 'elif') actually necessary ever? \$\endgroup\$msm1089– msm10892021年07月17日 17:49:59 +00:00Commented Jul 17, 2021 at 17:49
-
\$\begingroup\$ Would you be able to give an example of how to do the type annotations please? \$\endgroup\$msm1089– msm10892021年07月17日 17:56:48 +00:00Commented Jul 17, 2021 at 17:56
-
\$\begingroup\$ Yes you are right, this is also point 3 in the first section. I will edit to add a type annotation, no problem. \$\endgroup\$Yonlif– Yonlif2021年07月17日 18:49:31 +00:00Commented Jul 17, 2021 at 18:49
I decided to implement it following Kahn's algorithm found here, which is 0(V + E). Below is my direct translation into python, except for the final if statement, where I just compare length of L and the number of nodes in the graph rather than iterating thru the nodes and counting incoming edges.
def find_build_order(self):
L = []
S = {n for n in self.graph.nodes.values() if n.incoming_edges == 0}
while S:
n = S.pop()
L.append(n)
for m in list(n.adj_nodes.values()):
n.remove_neighbor(m)
if m.incoming_edges == 0:
S.add(m)
if len(L) != len(self.graph.nodes):
return None
return L
This works, but again I am puzzled on the runtime. The times are laid out below.
Kahn's algo: 3.69 μs ± 42 ns per loop
Given solution: 4.36 μs ± 78.5 ns per loop
Algo in my OP (with the minor changes suggested): 986 ns ± 11.2 ns per loop
Kahn's algo seems a lot more efficient than my first attempt, and it is compact unlike the given solution. Maybe I have not implemented it correctly.