9
\$\begingroup\$

I am doing a code challenge of codesignal.com which ask for following things: Given an array of integers, sort its elements by the difference of their largest and smallest digits. In the case of a tie, that with the larger index in the array should come first.

Example

For a = [152, 23, 7, 887, 243], the output should be
digitDifferenceSort(a) = [7, 887, 23, 243, 152].

Here are the differences of all the numbers:

152: difference = 5 - 1 = 4;
23: difference = 3 - 2 = 1;
7: difference = 7 - 7 = 0;
887: difference = 8 - 7 = 1;
243: difference = 4 - 2 = 2.

23 and 887 have the same difference, but 887 goes after 23 in a, so in the sorted array it comes first. I wrote following code using python3 and it passes all normal tests but it cannot pass execution time tests. How can I improve my code to decrease it's execution time for list with considerable amount of elements?

def digitDifferenceSort(a):
 diff = []
 for i in a:
 i = list(str(i))
 diff.append(i)
 for i in range(len(diff)):
 for j in range(i+1, len(diff)):
 if int(max(diff[i])) - int(min(diff[i])) > int(max(diff[j])) - int(min(diff[j])):
 diff[j], diff[i] = diff[i], diff[j]
 elif int(max(diff[i])) - int(min(diff[i])) == int(max(diff[j])) - int(min(diff[j])):
 diff[i], diff[j] = diff[j], diff[i] 
 new_list = [] 
 for i in diff:
 b = ''
 for j in i:
 b = b + j
 new_list.append(int(b))
 return new_list
Vogel612
25.5k7 gold badges59 silver badges141 bronze badges
asked Mar 11, 2019 at 10:01
\$\endgroup\$

2 Answers 2

14
\$\begingroup\$

Python has a built-in sorted function, you should use it. What it needs to sort according to some special criteria is a key function:

def max_digit_diff(n):
 n_str = str(n)
 return int(max(n_str)) - int(min(n_str))

This uses the fact that "0" < "1" < ... < "9".

However, the sorted function uses a stable sorting algorithm, so if two elements compare equal, the original order is preserved. But here we want the opposite order (later elements come first), so we just reverse the list first:

def digit_difference_sort(a):
 return sorted(reversed(a), key=max_digit_diff)

This should be vastly easier to read than your convoluted function. Note that the function names also follow Python's official style-guide, PEP8.

Like all (good) sorting functions, this is \$\mathcal{O}(n \log n)\$. Here is a timing comparison to your function with arrays up to length 10k (at which point your function takes more than a minute...).

enter image description here


Here is an implementation of the radix sort suggested by @JollyJoker in their answer:

from itertools import chain
def radix_sort(a):
 sub_a = [[] for _ in range(10)]
 for x in a:
 sub_a[max_digit_diff(x)].append(x)
 return list(chain.from_iterable(reversed(x) for x in sub_a))

This seems to have the same complexity as my approach, probably the implementation of max_digit_diff actually dominates this:

enter image description here

answered Mar 11, 2019 at 10:30
\$\endgroup\$
4
  • \$\begingroup\$ Thanks a lot. Would you please give more clarification about passing max_digit_diff method as a parameter to sorted function and how this part works? \$\endgroup\$ Commented Mar 11, 2019 at 10:57
  • 1
    \$\begingroup\$ @IbrahimRahimi: sorted calls the function you specify as a key exactly once for each input and sorts according to that. Have a look at the official Sorting HOW TO for more information. \$\endgroup\$ Commented Mar 11, 2019 at 11:00
  • 1
    \$\begingroup\$ Nice answer. Just curious, How did you generate the comparison graph? is there a package for doing it? \$\endgroup\$ Commented Mar 11, 2019 at 12:30
  • 1
    \$\begingroup\$ @gustavovelascoh: It is basically done with the code in my question here, with some input from the answers. One of these days I will finally make it look pretty and upload it to github... \$\endgroup\$ Commented Mar 11, 2019 at 13:56
3
\$\begingroup\$

You can do better than \$\mathcal{O}(n \log n)\$ using a Radix sort.

The differences can only have values 0-9, so you can sort the original array into a list of 10 lists while just going through the array once. Then, for each list 0-9, pop() the values into an output list until the list is empty.

answered Mar 11, 2019 at 14:11
\$\endgroup\$
6
  • \$\begingroup\$ Added an implementation of this to my answer and included it in the timings. Interestingly it is exactly the same as using the built-in sorted. Probably due to the fact that both use max_digit_diff. \$\endgroup\$ Commented Mar 11, 2019 at 14:30
  • \$\begingroup\$ @Graipher The scaling seems odd. Could you add some timing check before the return row just to check there's nothing slow in the last line? Then again, maybe sorted just is that good. \$\endgroup\$ Commented Mar 11, 2019 at 15:02
  • \$\begingroup\$ Weirdly, it looks exactly the same when directly returning sub_a. Also, max_digit_diff is basically constant time, obviously, since even the longest numbers have only a few digits (less than a hundred). \$\endgroup\$ Commented Mar 11, 2019 at 15:18
  • \$\begingroup\$ I also tried arrays with up to 10k elements, still the same (without the OPs algorithm, obviously). \$\endgroup\$ Commented Mar 11, 2019 at 15:27
  • 1
    \$\begingroup\$ @Graipher You're probably right on Timsort. BTW, good job on turning my answer into actual code :) I wasn't at all certain my text was clear enough. \$\endgroup\$ Commented Mar 11, 2019 at 15:48

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.