Here is the Hackerrank problem which is related to dense ranking and below is my solution which didn't pass the time limit cases. Any better way to optimize this?
def climbingLeaderboard(ranked, player):
ranked_set = sorted(list(set(ranked)), reverse=True)
result = []
print(ranked_set)
for p in player:
appended = False
for i in range(len(ranked_set)):
if p >= ranked_set[i]:
result.append(i+1)
appended = True
break
if appended == False:
result.append(len(ranked_set)+1)
return result
1 Answer 1
My approach, described below, should definitely be an improvement. I have named my version climbing_leaderboard
so as to not conflict with your name (I have included your version in mine in a benchmark to measure the performances of each), but also to be more in conformance with the PEP 8 Style Guide standard form naming functions.
General Suggestions for Improvement
It's always a good idea to provide a docstring describing what the function is supposed to be doing. Either use typing to describe the function's arguments and return value or provide that information in the docstring itself. Enough said.
Algorithmic Improvement
For each element p
of the player argument, you are searching sequentially through your ranked_set
list to find the new ranking for p
. For small values of p
you might be searching almost every element of ranked_set
to accomplish that since ranked_set
is sorted in descending order. Also, ranked_set
is not the best name for a variable that references a list.
A better approach would be to use a binary search to accomplish this. To discover any given ranking should not require examining more than approximately Log2(N) elements where N is the number of elements in ranked_set
.
In the following code I have included your version (less print
statements) and mine and print out the times for my desktop.
from typing import List
import time
timer = time.perf_counter
def climbingLeaderboard(ranked, player):
"""Solve the problem descibed at:
https://www.hackerrank.com/challenges/climbing-the-leaderboard/problem?isFullScreen=true"""
# Remove duplicate scores:
ranked_list = sorted(list(set(ranked)), reverse=True)
result = []
last_index = len(ranked_list) - 1
for p in player:
start = 0
end = last_index
while start <= end:
mid = (start + end) // 2
score = ranked_list[mid]
if p > score:
end = mid - 1
elif p < score:
start = mid + 1
else:
break
if start > end:
# No exact hit
rank = mid + 1 if p > score else mid + 2
else:
rank = mid + 1
result.append(rank)
return result
ranked = list(range(100_000, 1, -3))
player = list(range(10, 100_030, 20))
t0 = timer()
result1 = climbingLeaderboard(ranked, player)
t = timer() - t0
print(t)
t0 = timer()
result2 = climbing_leaderboard(ranked, player)
t = timer() - t0
print(t)
assert result1 == result2
Prints:
3.8725043
0.01452220000000004
Update
The following is a benchmark of 3 methods of initializing the ranked_list
variable from the input so as to remove duplicate values when they exist. For the first run there are 30,000 distinct values in descending order and for the second run there are 15,000 distinct values each duplicated once.
- Method 1: As in the above code.
- Method 2: The input is stepped through element by element and if the element is not the same value as the previous element, it is appended to the initially empty
ranked_list
. itertools.groupby
is used as suggested by vnp.
The code:
import itertools
import time
timer = time.perf_counter
def benchmark():
N = 1_000
for run in (1, 2):
if run == 1:
# No duplicate values
ranked = list(range(30_000, 0, -1))
else:
# Two of each value:
ranked = []
for n in range(15_000, 0, -1):
ranked.append(n)
ranked.append(n)
t = timer()
for _ in range(N):
ranked_list = sorted(list(set(ranked)), reverse=True)
t1 = timer() - t
t = timer()
for _ in range(N):
ranked_list = []
last = None
for r in ranked:
if r != last:
last = r
ranked_list.append(r)
t2 = timer() - t
t = timer()
for _ in range(N):
ranked_list = [k for k, _ in itertools.groupby(ranked)]
t3 = timer() - t
print(t1, t2, t3)
benchmark()
The output:
1.9442110000000001 2.5163936 3.0017806
0.7499568000000005 2.2753618000000007 1.654442699999998
Conclusion
Method 1 is the fastest in both cases. With no duplicates in the input, method 3 is the slowest. When there are duplicates, method 2 is the slowest.
-
1\$\begingroup\$
ranked_list = sorted(list(set(ranked)), reverse=True)
is somewhat suboptimal. It ignores the fact thatranked
is already sorted.. Considetrranked_list = [k for k, _ in itertools.groupby(ranked)]
. \$\endgroup\$vnp– vnp2023年11月24日 21:27:53 +00:00Commented Nov 24, 2023 at 21:27 -
\$\begingroup\$ @vnp You are probably correct. I say that because in my original code I looped successively through all the elements of the ranked list argument and if it was not equal to the previous element it was added to the initially empty
ranked_list
. At least for the input I used in the benchmark above there seemed to be no appreciable difference in timings for that method, your method usingitertools.groupby
and the OP's original code, which I decided to retain. I would have to do a proper benchmark on just this code fragment executing many repetitions to know for sure. \$\endgroup\$Booboo– Booboo2023年11月24日 21:56:18 +00:00Commented Nov 24, 2023 at 21:56 -
\$\begingroup\$ @vpn I added a benchmark to see which of 3 methods of initializing
ranked_list
is the fastest (at least for the input I have chosen). The method used in the code, which involves creating a set to remove duplicates and then re-sorting, is the fastest whether there are duplicate values in the input or not. Have a look. \$\endgroup\$Booboo– Booboo2023年11月25日 00:07:02 +00:00Commented Nov 25, 2023 at 0:07 -
\$\begingroup\$ It's a good idea not to go through the whole ranked_list to rank a player but to start from the middle. \$\endgroup\$peternish– peternish2023年11月25日 08:27:17 +00:00Commented Nov 25, 2023 at 8:27
-
\$\begingroup\$ @peternish That is what a binary search does. That is what my code does. I am not sure what your point is. \$\endgroup\$Booboo– Booboo2023年11月25日 10:25:58 +00:00Commented Nov 25, 2023 at 10:25
Explore related questions
See similar questions with these tags.