14
\$\begingroup\$

I don't really know how to explain what I'm looking for in a way that makes sense, but here goes:

Say I have a list $$L=(4,7,9,10,6,11,3)$$ What I want to produce is a corresponding list $$ K = (1,3,4,5,2,6,0)$$

where element K[i] holds the rank of the element at the corresponding position in L: the higher the number, the higher the rank, starting from rank 0 for the smallest number in L.

The code I have written for this is:

x = [4,7,9,10,6,11,3]
index = [0]*len(x)
for i in range(len(x)):
 index[x.index(min(x))] = i
 x[x.index(min(x))] = max(x)+1

and it works, but I just feel it looks horrible, and was wondering if there might exist a more aesthetic way.

200_success
asked Oct 7, 2014 at 22:37
\$\endgroup\$
3
  • 2
    \$\begingroup\$ Do we assume all list elements are distinct and hence you don't care about breaking ties? \$\endgroup\$ Commented Sep 10, 2018 at 8:58
  • \$\begingroup\$ If you can use numpy, this is implemented: np.array([4,7,9,10,6,11,3]).argsort() gives array([6, 0, 4, 1, 2, 3, 5]), which are the indices that sort the array in increasing order, not the ranks themselves. Applying argsort() a second time to that result gives the ranks, array([1, 3, 4, 5, 2, 6, 0]). NumPy's sorting is implemented in C and is typically faster than pure Python on large inputs. \$\endgroup\$ Commented Apr 19, 2024 at 0:59
  • \$\begingroup\$ Also implemented in SciPy (scipy.stats.rankdata) and in pandas (rank()); note that both number ranks from 1 by default. \$\endgroup\$ Commented Apr 19, 2024 at 1:03

3 Answers

11
\$\begingroup\$

I believe your solution is \$O(n^2)\$, and we can reduce that to \$O(n \log n)\$. I'm not well-versed in Python, but this is the general idea:

  • The output is a permutation of the indices \$0, 1, \ldots, n - 1\$, where \$n\$ is the length of the input.
  • We sort the indices based on the element at the specified index. In your example, we get \$6, 0, 4, 1, 2, 3, 5\$, i.e. the \$6\$th element (\$3\$) is smallest, so \$6\$ is first.
  • Seeing that \$6\$ is at index \$0\$, we know that the \$6\$th element of the output is \$0\$, and so on. This step is easier to explain in code than in words, sorry about that.

In code,

indices = list(range(len(input)))
indices.sort(key=lambda x: input[x])
output = [0] * len(indices)
for i, x in enumerate(indices):
 output[x] = i

Or the more terse

output = [0] * len(input)
for i, x in enumerate(sorted(range(len(input)), key=lambda y: input[y])):
 output[x] = i

I used the timeit module to compare the running times of this version and the original for different input sizes. Each function was called 1,000 times on randomly shuffled input of size \$n\$. Here are some of the results:

   n   this version (s)   original version (s)
  10               0.02                   0.04
 100               0.17                   1.40
1000               1.81                 133.35
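
The exact timing harness is not shown, so the following is only a sketch of how such a comparison might be set up. The function names `rank_original`, `rank_sorted` and `time_both` are made up for illustration, and the input generation (distinct random integers) is an assumption; absolute timings will differ from the table depending on the machine.

```python
import random
import timeit


def rank_original(values):
    """Quadratic approach from the question, run on a copy so the input survives."""
    x = list(values)
    index = [0] * len(x)
    for i in range(len(x)):
        smallest = x.index(min(x))
        index[smallest] = i
        x[smallest] = max(x) + 1  # push the used element above everything else
    return index


def rank_sorted(values):
    """Sort-based approach from this answer."""
    output = [0] * len(values)
    for i, x in enumerate(sorted(range(len(values)), key=lambda y: values[y])):
        output[x] = i
    return output


def time_both(n, calls=1000):
    """Return (sort-based seconds, original seconds) for `calls` runs on size n."""
    data = random.sample(range(10 * n), n)  # n distinct values in random order
    t_sorted = timeit.timeit(lambda: rank_sorted(data), number=calls)
    t_original = timeit.timeit(lambda: rank_original(data), number=calls)
    return t_sorted, t_original
```

Calling `time_both(100)` returns the two totals in seconds for 1,000 calls each; larger sizes such as 1000 can take minutes for the quadratic version.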

This is asymptotically optimal for a comparison-based approach: if it were not, we could use the ranks to build a sub-linearithmic comparison sort:

sorted = [0] * len(input)
for i, x in enumerate(output):
 sorted[x] = input[i]
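
As a quick check of that argument, here is a small self-contained sketch showing that the ranks are enough to sort the list in one extra linear pass. The helper names `ranks` and `sort_via_ranks` are illustrative only, and the values are assumed to be distinct, as in the question.

```python
def ranks(values):
    """Rank of each element, 0 for the smallest (sort-based approach)."""
    output = [0] * len(values)
    order = sorted(range(len(values)), key=lambda i: values[i])
    for rank, position in enumerate(order):
        output[position] = rank
    return output


def sort_via_ranks(values):
    """Place each element in the slot named by its rank."""
    result = [None] * len(values)
    for position, rank in enumerate(ranks(values)):
        result[rank] = values[position]
    return result


data = [4, 7, 9, 10, 6, 11, 3]
print(ranks(data))           # [1, 3, 4, 5, 2, 6, 0]
print(sort_via_ranks(data))  # [3, 4, 6, 7, 9, 10, 11]
```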
answered Oct 8, 2014 at 0:34
\$\endgroup\$
10
\$\begingroup\$

If all of the numbers in x are unique, this works:

x = [4,7,9,10,6,11,3]
seq = sorted(x)
index = [seq.index(v) for v in x]

The technique is to sort the input list, then look up the position of each value from the original list in the sorted one, storing the results in a list via list comprehension.

It behaves differently if the numbers in x are non-unique: when the list is sorted, identical numbers end up next to each other, and index() always finds the first one. This might be beneficial, as technically the numbers do share the same rank, but it also leaves a "hole" in the ranking order. For example, if two numbers are tied for third, the next number is ranked as the 5th entry of the sorted list (index 4 when counting from zero) rather than the 4th.

It also involves creating a sorted copy of the original list, so it may take up extra memory if the list is large.
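
To make the tie behaviour concrete, here is a hedged sketch, not part of the original answer. It replaces the repeated `index()` calls (each a linear scan) with a dictionary lookup, which assumes the values are hashable, and it shows a "dense" variant without holes for comparison. The function names are made up for illustration.

```python
def rank_with_gaps(values):
    """Tied values share the position of their first occurrence in sorted order."""
    first_position = {}
    for position, value in enumerate(sorted(values)):
        first_position.setdefault(value, position)  # keep the first position only
    return [first_position[v] for v in values]


def rank_dense(values):
    """Tied values share a rank; the next distinct value gets the next integer."""
    dense = {value: rank for rank, value in enumerate(sorted(set(values)))}
    return [dense[v] for v in values]


print(rank_with_gaps([4, 7, 9, 10, 6, 11, 3]))  # [1, 3, 4, 5, 2, 6, 0]
print(rank_with_gaps([10, 20, 30, 30, 40]))     # [0, 1, 2, 2, 4]
print(rank_dense([10, 20, 30, 30, 40]))         # [0, 1, 2, 2, 3]
```

`rank_with_gaps` gives the same results as the `seq.index(v)` version, including the hole after a tie.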

answered Oct 7, 2014 at 22:56
\$\endgroup\$
0
3
\$\begingroup\$

You can do this easily (in Python) by sorting twice: first sort the elements together with their original indices (i.e. an argsort), then enumerate the result and sort it back by original index.

This solution has the same complexity as the sorting algorithm you use, so you can make it \$O(n\log n)\$, or even \$O(n)\$ if you have small integers and use, for example, radix sort.

In this case, I use the built-in sorted, which is \$O(n\log n)\$, and zip to keep only the list of ranks.

Here's an example

L = [4, 7, 9, 10, 6, 11, 3]
K = (1, 3, 4, 5, 2, 6, 0)
g1 = lambda e: e[1]
g10 = lambda e: e[1][0]
ranks, _ = zip(*sorted(enumerate(sorted(enumerate(L), key=g1)), key=g10))
print(ranks == K) # True

Here's what is happening:

s1 = sorted(enumerate(L), key=lambda e: e[1])
print(s1)
# [(6, 3), (0, 4), (4, 6), (1, 7), (2, 9), (3, 10), (5, 11)]
s2 = sorted(enumerate(s1), key=lambda e: e[1][0])
print(s2)
# [(1, (0, 4)), (3, (1, 7)), (4, (2, 9)), (5, (3, 10)), (2, (4, 6)), (6, (5, 11)), (0, (6, 3))]
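
The linear-time remark can be illustrated with a counting pass, used here as a simpler stand-in for radix sort. This is only a sketch under the assumption that all values are non-negative integers no larger than a known `max_value`; the function name `rank_small_ints` is hypothetical.

```python
def rank_small_ints(values, max_value):
    """Rank non-negative integers in O(n + max_value) using a counting pass.

    Ties are broken by original position, as a stable sort would do.
    """
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1

    # next_rank[v] starts as the number of elements strictly smaller than v.
    next_rank = [0] * (max_value + 1)
    running = 0
    for v in range(max_value + 1):
        next_rank[v] = running
        running += counts[v]

    ranks = []
    for v in values:
        ranks.append(next_rank[v])
        next_rank[v] += 1  # the next equal value gets the following rank
    return ranks


print(rank_small_ints([4, 7, 9, 10, 6, 11, 3], max_value=11))
# [1, 3, 4, 5, 2, 6, 0]
```

This only pays off when `max_value` is of the same order as the list length; otherwise the sort-based version is the safer default.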
answered Jul 13, 2018 at 12:24
\$\endgroup\$
