HackerRank Algorithm Problem: Climbing the Leaderboard (Python)

Question 1

Here is the Hackerrank problem which is related to dense ranking and below is my solution which didn't pass the time limit cases. Any better way to optimize this?

def climbingLeaderboard(ranked, player):
 ranked_set = sorted(list(set(ranked)), reverse=True)
 result = []
 print(ranked_set)
 for p in player:
 appended = False
 for i in range(len(ranked_set)):
 if p >= ranked_set[i]:
 result.append(i+1)
 appended = True
 break
 if appended == False:
 result.append(len(ranked_set)+1)
 return result

Question 2

My approach, described below, should definitely be an improvement. I have named my version climbing_leaderboard so as to not conflict with your name (I have included your version in mine in a benchmark to measure the performances of each), but also to be more in conformance with the PEP 8 Style Guide standard form naming functions.

General Suggestions for Improvement

It's always a good idea to provide a docstring describing what the function is supposed to be doing. Either use typing to describe the function's arguments and return value or provide that information in the docstring itself. Enough said.

Algorithmic Improvement

For each element p of the player argument, you are searching sequentially through your ranked_set list to find the new ranking for p. For small values of p you might be searching almost every element of ranked_set to accomplish that since ranked_set is sorted in descending order. Also, ranked_set is not the best name for a variable that references a list.

A better approach would be to use a binary search to accomplish this. To discover any given ranking should not require examining more than approximately Log₂(N) elements where N is the number of elements in ranked_set.

In the following code I have included your version (less print statements) and mine and print out the times for my desktop.

from typing import List
import time
timer = time.perf_counter
def climbingLeaderboard(ranked, player):
 """Solve the problem descibed at:
 https://www.hackerrank.com/challenges/climbing-the-leaderboard/problem?isFullScreen=true"""
 # Remove duplicate scores:
 ranked_list = sorted(list(set(ranked)), reverse=True)
 result = []
 last_index = len(ranked_list) - 1
 for p in player:
 start = 0
 end = last_index
 while start <= end:
 mid = (start + end) // 2
 score = ranked_list[mid]
 if p > score:
 end = mid - 1
 elif p < score:
 start = mid + 1
 else:
 break
 if start > end:
 # No exact hit
 rank = mid + 1 if p > score else mid + 2
 else:
 rank = mid + 1
 result.append(rank)
 return result
ranked = list(range(100_000, 1, -3))
player = list(range(10, 100_030, 20))
t0 = timer()
result1 = climbingLeaderboard(ranked, player)
t = timer() - t0
print(t)
t0 = timer()
result2 = climbing_leaderboard(ranked, player)
t = timer() - t0
print(t)
assert result1 == result2

Prints:

3.8725043
0.01452220000000004

Update

The following is a benchmark of 3 methods of initializing the ranked_list variable from the input so as to remove duplicate values when they exist. For the first run there are 30,000 distinct values in descending order and for the second run there are 15,000 distinct values each duplicated once.

Method 1: As in the above code.
Method 2: The input is stepped through element by element and if the element is not the same value as the previous element, it is appended to the initially empty ranked_list.
itertools.groupby is used as suggested by vnp.

The code:

import itertools
import time
timer = time.perf_counter
def benchmark():
 N = 1_000
 for run in (1, 2):
 if run == 1:
 # No duplicate values
 ranked = list(range(30_000, 0, -1))
 else:
 # Two of each value:
 ranked = []
 for n in range(15_000, 0, -1):
 ranked.append(n)
 ranked.append(n)
 t = timer()
 for _ in range(N):
 ranked_list = sorted(list(set(ranked)), reverse=True)
 t1 = timer() - t
 t = timer()
 for _ in range(N):
 ranked_list = []
 last = None
 for r in ranked:
 if r != last:
 last = r
 ranked_list.append(r)
 t2 = timer() - t
 t = timer()
 for _ in range(N):
 ranked_list = [k for k, _ in itertools.groupby(ranked)]
 t3 = timer() - t
 print(t1, t2, t3)
benchmark()

The output:

1.9442110000000001 2.5163936 3.0017806
0.7499568000000005 2.2753618000000007 1.654442699999998

Conclusion

Method 1 is the fastest in both cases. With no duplicates in the input, method 3 is the slowest. When there are duplicates, method 2 is the slowest.

Question 3

ranked_list = sorted(list(set(ranked)), reverse=True) is somewhat suboptimal. It ignores the fact that ranked is already sorted.. Considetrranked_list = [k for k, _ in itertools.groupby(ranked)].

Question 4

@vnp You are probably correct. I say that because in my original code I looped successively through all the elements of the ranked list argument and if it was not equal to the previous element it was added to the initially empty ranked_list. At least for the input I used in the benchmark above there seemed to be no appreciable difference in timings for that method, your method using itertools.groupby and the OP's original code, which I decided to retain. I would have to do a proper benchmark on just this code fragment executing many repetitions to know for sure.

Question 5

@vpn I added a benchmark to see which of 3 methods of initializing ranked_list is the fastest (at least for the input I have chosen). The method used in the code, which involves creating a set to remove duplicates and then re-sorting, is the fastest whether there are duplicate values in the input or not. Have a look.

Question 6

It's a good idea not to go through the whole ranked_list to rank a player but to start from the middle.

Question 7

@peternish That is what a binary search does. That is what my code does. I am not sure what your point is.

Booboo Booboo 2,9713 silver badges13 bronze badges · Answer 1 · 2023-11-24 15:54:04Z

My approach, described below, should definitely be an improvement. I have named my version climbing_leaderboard so as to not conflict with your name (I have included your version in mine in a benchmark to measure the performances of each), but also to be more in conformance with the PEP 8 Style Guide standard form naming functions.

General Suggestions for Improvement

It's always a good idea to provide a docstring describing what the function is supposed to be doing. Either use typing to describe the function's arguments and return value or provide that information in the docstring itself. Enough said.

Algorithmic Improvement

For each element p of the player argument, you are searching sequentially through your ranked_set list to find the new ranking for p. For small values of p you might be searching almost every element of ranked_set to accomplish that since ranked_set is sorted in descending order. Also, ranked_set is not the best name for a variable that references a list.

A better approach would be to use a binary search to accomplish this. To discover any given ranking should not require examining more than approximately Log₂(N) elements where N is the number of elements in ranked_set.

In the following code I have included your version (less print statements) and mine and print out the times for my desktop.

from typing import List
import time
timer = time.perf_counter
def climbingLeaderboard(ranked, player):
 """Solve the problem descibed at:
 https://www.hackerrank.com/challenges/climbing-the-leaderboard/problem?isFullScreen=true"""
 # Remove duplicate scores:
 ranked_list = sorted(list(set(ranked)), reverse=True)
 result = []
 last_index = len(ranked_list) - 1
 for p in player:
 start = 0
 end = last_index
 while start <= end:
 mid = (start + end) // 2
 score = ranked_list[mid]
 if p > score:
 end = mid - 1
 elif p < score:
 start = mid + 1
 else:
 break
 if start > end:
 # No exact hit
 rank = mid + 1 if p > score else mid + 2
 else:
 rank = mid + 1
 result.append(rank)
 return result
ranked = list(range(100_000, 1, -3))
player = list(range(10, 100_030, 20))
t0 = timer()
result1 = climbingLeaderboard(ranked, player)
t = timer() - t0
print(t)
t0 = timer()
result2 = climbing_leaderboard(ranked, player)
t = timer() - t0
print(t)
assert result1 == result2

Prints:

3.8725043
0.01452220000000004

Update

The following is a benchmark of 3 methods of initializing the ranked_list variable from the input so as to remove duplicate values when they exist. For the first run there are 30,000 distinct values in descending order and for the second run there are 15,000 distinct values each duplicated once.

Method 1: As in the above code.
Method 2: The input is stepped through element by element and if the element is not the same value as the previous element, it is appended to the initially empty ranked_list.
itertools.groupby is used as suggested by vnp.

The code:

import itertools
import time
timer = time.perf_counter
def benchmark():
 N = 1_000
 for run in (1, 2):
 if run == 1:
 # No duplicate values
 ranked = list(range(30_000, 0, -1))
 else:
 # Two of each value:
 ranked = []
 for n in range(15_000, 0, -1):
 ranked.append(n)
 ranked.append(n)
 t = timer()
 for _ in range(N):
 ranked_list = sorted(list(set(ranked)), reverse=True)
 t1 = timer() - t
 t = timer()
 for _ in range(N):
 ranked_list = []
 last = None
 for r in ranked:
 if r != last:
 last = r
 ranked_list.append(r)
 t2 = timer() - t
 t = timer()
 for _ in range(N):
 ranked_list = [k for k, _ in itertools.groupby(ranked)]
 t3 = timer() - t
 print(t1, t2, t3)
benchmark()

The output:

1.9442110000000001 2.5163936 3.0017806
0.7499568000000005 2.2753618000000007 1.654442699999998

Conclusion

Method 1 is the fastest in both cases. With no duplicates in the input, method 3 is the slowest. When there are duplicates, method 2 is the slowest.

ranked_list = sorted(list(set(ranked)), reverse=True) is somewhat suboptimal. It ignores the fact that ranked is already sorted.. Considetrranked_list = [k for k, _ in itertools.groupby(ranked)].
@vnp You are probably correct. I say that because in my original code I looped successively through all the elements of the ranked list argument and if it was not equal to the previous element it was added to the initially empty ranked_list. At least for the input I used in the benchmark above there seemed to be no appreciable difference in timings for that method, your method using itertools.groupby and the OP's original code, which I decided to retain. I would have to do a proper benchmark on just this code fragment executing many repetitions to know for sure.
@vpn I added a benchmark to see which of 3 methods of initializing ranked_list is the fastest (at least for the input I have chosen). The method used in the code, which involves creating a set to remove duplicates and then re-sorting, is the fastest whether there are duplicate values in the input or not. Have a look.
It's a good idea not to go through the whole ranked_list to rank a player but to start from the middle.
@peternish That is what a binary search does. That is what my code does. I am not sure what your point is.

Stack Exchange Network

HackerRank Algorithm Problem: Climbing the Leaderboard (Python)

1 Answer 1

General Suggestions for Improvement

Algorithmic Improvement

Update

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

HackerRank Algorithm Problem: Climbing the Leaderboard (Python)

1 Answer 1

General Suggestions for Improvement

Algorithmic Improvement

Update

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions