I am practicing my coding at HackerRank. The exact problem statement can be found here, but to summarize, it is finding the distances of the shortest path from a starting node to every other node in a non-directional graph where every edge is weighted 6. If a node is unreachable, the distance is -1. The nodes are named 1 to n. If you want an example, please see the provided one in the link.
To solve this, I googled an explanation of Dijkstra's Algorithm and tried my best to implement it (I am new to graph problems). My program passes 6/7 of the test cases, but it times out on the 7th. I improved some small inefficiencies, but it wasn't enough. Please advise me on how I may make the following faster:
def shortest (start, nodes, adj):
unique = [] #this part is pretty standard for this Algorithm I think
distance = [float('inf')]*nodes
for i in range(1, nodes+1):
if (len(adj[i-1])>0):
unique.append(i)
distance [start-1] = 0
while (len(unique)>0):
minDist = float('inf')
for i in unique:
if (distance[i-1] <= minDist):
minDist = distance[i-1]
distIndex = i
current = distIndex #sets the current node as the closest unvisited node
unique.remove(current)
for i in adj[current-1]: #updates the distance of each neighbour of the current node
neigh = i
temp = 6 + distance[current-1] #edges are 6
if temp < distance[neigh-1]:
distance[neigh-1] = temp
return distance
q = int(input()) #each test case contains multiple problems, so just consider one iteration
for i in range(q):
a = [] #for input purposes
b = [] #for output purposes
node = 0
edge = 0
#start of input
node, edge = list(map(int, input().strip().split())) #the input begins with the number of nodes and edges in the graph
for j in range(edge):
a.append(list(map(int, input().strip().split()))) #reading the edges, which are all inputted next
starting = int(input()) #reading the starting node
#end of input
adj = [[] for x in range(node)] #the adjacency matrix
for j in a: #making the adjacency matrix from the inputted edges
adj[j[0]-1].append(j[1])
adj[j[1]-1].append(j[0])
for j in adj:
j = list(set(j)) #someone warned that the problematic test case gives the same edge many times, so this is to remove duplicates
b = shortest(starting, node, adj)
#formatting the output to what the question required
b.remove(0)
for i in range(len(b)):
if b[i] == float('inf'):
b[i] = -1
print (" ".join(str(e) for e in b))
If it helps, I 'purchased' the test case which is timing out.
2 Answers 2
Those comments are very difficult to read.
j = list(set(j)) #someone warned that the problematic test case gives the same edge many times, so this is to remove duplicates
Format this properly:
# The problematic test case gives the same edge many times,
# so this is to remove duplicates
j = list(set(j))
Then remove fluff:
# Remove duplicates. Helps some test cases.
j = list(set(j))
In this case just remove the first comment, because list(set(...))
is already known to mean deduplicate. And remove the second since it's implied by the fact you've written the code.
j = list(set(j))
Another pair is
a = [] # for input purposes
b = [] # for output purposes
Don't do this. Just call them
descriptive_name_1 = []
descriptive_name_2 = []
so you don't need the comment in the first place.
Then there's q = ...
. So I wrack my brain thinking about what word contains q
. But it turns out you just meant to write num_test_cases
. What does that have to do with the letter q
?
Worse than that is when you outright lie. What would you expect the variable node
to contain? A node, right? Nope - it contains the total number of nodes.
And you write
node = 0
so we expect the total number of nodes to be 0
at that point, right? Nope, you immediately change your mind and write
node, edge = list(map(int, input().strip().split()))
But OK, fine. At least we know what nodes
contains, then: multiple node
s, so a list (or other collection) of lengths of nodes.
Nope, it actually means exactly the same thing as node
.
A similar comment goes for edge
, which should be num_edges
.
Then there's adj
, which you label "adjacency matrix". Why not just call it adjacency_matrix
then? But it's not even a dense matrix like that would imply, it's actually a list of per-node adjacencies, so call it adjacencies
.
Then there's starting
and start
, which are the same thing named differently. Call them start_node
or at least be consistent.
Going back to b
, you have
b = []
but again this is a lie, because actually
b = shortest(start_node, num_nodes, adjacencies)
So just write that.
distances = shortest(start_node, num_nodes, adjacencies)
For a
, initialize it as close to usage as possible and call it edges
.
If you write
for j in range(num_edges):
you're misleading me into thinking you're going to use the index. When you're not going to, write
for _ in range(num_edges):
And what's up with that, followed by
for j in edges:
followed by
for j in adjacencies:
Is j
just your general "this is a thing" variable? Most people use x
, elem
and item
for that. Instead, write
for _ in range(num_edges):
for edge in edges:
for adjacent in adjacencies:
The middle can be
for start, end in edges:
The comment
#edges are 6
Isn't coherent. Try
# Edges are length 6
Instead, though, don't put this information in shortest
- use 1
and multiply out when using the results.
Don't call variables temp
. It's a poor name.
You've double-indented the part where you update the neighbours.
Note that
num_nodes, num_edges = list(map(int, input().strip().split()))
is just
num_nodes, num_edges = map(int, input().strip().split())
Why do you do this?
for i in adjacencies[current-1]:
neigh = i
Just write
for neighbour in adjacencies[current-1]:
And the content can be
for neighbour in adjacencies[current-1]:
distance[neighbour-1] = min(distance[neighbour-1], distance[current-1] + 1)
Do you not think this whole -1
thing is getting a bit silly? When reading, just decrement the edge values and starting edge correctly:
edges = []
for _ in range(num_edges):
start, end = map(int, input().strip().split())
edges.append((start - 1, end - 1))
start_node = int(input()) - 1
while
loops and if
statements don't need their condition parenthesized, and operators should be spaced properly:
while (len(unique)>0): # before
while len(unique) > 0: # after
Then this is better as just while unique
.
Don't put spaces before the parentheses in function calls or indexing.
minDist
should be min_dist
. You can also start with it set to num_nodes
, and the same for distance
's initialization.
unique
is a horrible name, and has nothing to do with the variable's purpose. Try unvisited
instead. It can be initialized as
unvisited = [i for i, adj in enumerate(adjacencies) if adj]
distance
, unsurprisingly, does not mean distance
but instead distances
. Fix that.
This code:
min_dist = num_nodes
for i in unvisited:
if distances[i] <= min_dist:
min_dist = distances[i]
distIndex = i
current = distIndex
is just
min_dist = num_nodes
for i in unvisited:
if distances[i] <= min_dist:
min_dist = distances[i]
current = i
and current
should be called closest
. It is simplifiable to
closest = min(unvisited, key=lambda x: distances[x])
This is faster if you write
closest = min(unvisited, key=distances.__getitem__)
but the difference isn't vital.
One last thing with shortest
- rename it to shortest_distances
.
Put the rest of the code in a main
function.
If you write
def read_pair(decrement):
x, _, y = input().partition(" ")
return int(x) - decrement, int(y) - decrement
then you can initialize num_nodes
, num_edges
and edges
with
num_nodes, num_edges = read_pair(0)
edges = [read_pair(1) for i in range(num_edges)]
Note my use of partition
instead of split
as IMO it's a better description of the operation here.
Then
for adjacent in adjacencies:
adjacent = list(set(adjacent))
doesn't actually work! adjacent =
only affects the local scope! Instead, you want
adjacencies = [list(set(adj)) for adj in adjacencies]
which is even better as
adjacencies = [set() for _ in range(num_nodes)]
for start, end in edges:
adjacencies[start].add(end)
adjacencies[end].add(start)
as you never have the large intermediates then, and there's no real point converting back from a set.
This stuff:
distances.remove(0)
for i in range(len(distances)):
if distances[i] == num_nodes:
distances[i] = -1
else:
distances[i] *= 6
print(" ".join(str(e) for e in distances))
is just
print(*(-1 if i == num_nodes else i * 6 for i in distances if i))
Nice, that seems to have made it fast enough, but more important the code is more readable.
We can improve speed further by avoiding input
, using sys.stdin
directly to get buffering.
def main(lines):
def read_pair(decrement):
x, _, y = next(lines).partition(" ")
return int(x) - decrement, int(y) - decrement
...
main(sys.stdin)
Note that this change is sufficient on its own to beat the time limit: you can apply it to your original code to pass the test. Note also that you shouldn't mix input
with next
; at least on Python 2 this throws an error saying
ValueError: Mixing iteration and read methods would lose data
Though this error is gone in Python 3, that doesn't leave me feeling good about it.
I'd finish off with
from __future__ import print_function
to make it Python 2 compatible.
from __future__ import print_function
import sys
def shortest_distances(start_node, num_nodes, adjacencies):
distances = [num_nodes] * num_nodes
unvisited = [i for i, adj in enumerate(adjacencies) if adj]
distances[start_node] = 0
while unvisited:
closest = min(unvisited, key=distances.__getitem__)
unvisited.remove(closest)
# Update the distances of each neighbour
for neighbour in adjacencies[closest]:
distances[neighbour] = min(distances[neighbour], distances[closest] + 1)
return distances
def main(lines):
def read_pair(decrement):
x, _, y = next(lines).partition(" ")
return int(x) - decrement, int(y) - decrement
num_test_cases = int(next(lines))
for i in range(num_test_cases):
num_nodes, num_edges = read_pair(0)
edges = [read_pair(1) for i in range(num_edges)]
start_node = int(next(lines)) - 1
adjacencies = [set() for _ in range(num_nodes)]
for start, end in edges:
adjacencies[start].add(end)
adjacencies[end].add(start)
distances = shortest_distances(start_node, num_nodes, adjacencies)
print(*(-1 if i == num_nodes else i * 6 for i in distances if i))
main(sys.stdin)
-
\$\begingroup\$ Wow, thank you very much. This is very thorough and helpful. \$\endgroup\$Yifan– Yifan2016年08月18日 13:51:43 +00:00Commented Aug 18, 2016 at 13:51
Veedrac's answer has a lot of good points about the quality of the code, so in this answer I'm just going to discuss the performance.
Let's figure out where the time goes by profiling the program using Python's built-in profiler:
$ python -m cProfile -s tottime cr138989.py < testcase.txt
...
1492053 function calls in 3.915 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.767 1.767 3.915 3.915 cr138989.py:1(<module>)
246223 1.346 0.000 1.348 0.000 {built-in method builtins.input}
6 0.343 0.057 0.364 0.061 cr138989.py:1(shortest)
246216 0.221 0.000 0.221 0.000 {method 'split' of 'str' objects}
741011 0.106 0.000 0.106 0.000 {method 'append' of 'list' objects}
246216 0.062 0.000 0.062 0.000 {method 'strip' of 'str' objects}
6 0.043 0.007 0.043 0.007 {built-in method builtins.print}
2387 0.020 0.000 0.020 0.000 {method 'remove' of 'list' objects}
3403 0.002 0.000 0.002 0.000 cr138989.py:53(<genexpr>)
6 0.001 0.000 0.001 0.000 cr138989.py:39(<listcomp>)
258 0.001 0.000 0.001 0.000 {built-in method _codecs.ascii_decode}
6 0.001 0.000 0.002 0.000 {method 'join' of 'str' objects}
5796 0.001 0.000 0.001 0.000 {built-in method builtins.len}
258 0.001 0.000 0.002 0.000 ascii.py:25(decode)
258 0.000 0.000 0.000 0.000 codecs.py:280(getstate)
1 0.000 0.000 3.915 3.915 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
It takes some practice to interpret the results, but basically this is telling you that the total runtime was 3.9 seconds, of which 1.7 seconds was spent in the code at the top level of the module, 1.3 seconds was spent in input
, 0.3 in shortest
, and a small amount in other functions.
So the obvious performance problem here is input
. If you read the documentation, you'll see that this function has several features that you don't use (prompting, interactivity, newline stripping, EOF handling), and all this can easily be avoided by replacing input
with sys.stdin.readline
:
import sys
input = sys.stdin.readline
Now the profile results look like this:
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.500 1.500 2.387 2.387 cr138989.py:1(<module>)
6 0.357 0.059 0.378 0.063 cr138989.py:1(shortest)
246216 0.151 0.000 0.151 0.000 {method 'split' of 'str' objects}
246223 0.140 0.000 0.142 0.000 {method 'readline' of '_io.TextIOWrapper' objects}
741011 0.103 0.000 0.103 0.000 {method 'append' of 'list' objects}
246216 0.058 0.000 0.058 0.000 {method 'strip' of 'str' objects}
6 0.052 0.009 0.052 0.009 {built-in method builtins.print}
2387 0.020 0.000 0.020 0.000 {method 'remove' of 'list' objects}
This is already fast enough to pass the HackerRank time limit. But you can save another 0.6 seconds or so by making these changes:
Drop the calls to
strip
: there's no need for them, becausesplit
with no arguments has the same effect.There's no need to construct the list
a
of edges: it's simpler just to construct the adjacency matrix directly from the input.There's no need to deduplicate the edges (duplicate edges don't affect the correctness of Dikjstra's algorithm), but if you are going to do it, it's simpler to make
adj
into a list of sets, and call theadd
method instead ofappend
so that the deduplication happens as you go along.
Explore related questions
See similar questions with these tags.
NameError: name 'node' is not defined
. \$\endgroup\$node, edge
line is not valid syntax elsewhere. When I change it to read to a list first and then assign node and edge, the next error is for mixing spaces and tabs (it didn't matter in the HackerRank terminal again, sorry). I have to go right now, but I will try to clean it as soon as I get back. 2. The first link to HackerRank has a Sample Input, Sample Output, and Explanation on the page. The second link is just the input of the problematic test case. \$\endgroup\$node
andedge
first (not what I originally thought), and I used a text editor to change the indentation to be consistent with spaces. Please try again. \$\endgroup\$adj
is length 0 on the first iteration. \$\endgroup\$adj =
and thenode, edge =
lines. \$\endgroup\$