
I just thought this up while walking after typing up a knapsack with repetition implementation in Python.

I want to teach computer science, so I talk to myself a lot, pretending to lecture to students.

So in my imaginary lecture on merge sort, I used the example of a shuffled deck of cards.

But then it hit me: with cards, I already know what high and low are, so I can just lay out a 4 × 13 grid, with spades as the first row, then hearts, then clubs, then diamonds.

That would take exactly O(n), where n = 52, to sort the pile: I pick up each card and put it in its place, and each pick-and-place operation takes O(1) + O(1).

So then I thought: why can't I do this to sort any array, with the caveat that all values must be distinct?

I can find the minimum and maximum in O(n), remember their indices, and shrink the input array by 2 by deleting them, producing an array called rem (for "remaining").

Then I make an array of size (maximum - minimum + 1) as the grid, with the minimum at index 0 and the maximum at the last index, and initialize every slot to some sentinel value outside the range minimum to maximum.

Then I remember what minimum and maximum are for the following operations.

Iterate rem from index 0, and for each element, calculate: index = element_value - minimum

Then put the element in the grid at position index.

Repeat for the entire rem array.

Each calculation and each insertion takes constant O(1) time, repeated n times, for a total of O(n).

Now make a new array final that has length of input array.

Now traverse the grid and copy into final every valid entry, i.e. every value not equal to the sentinel.
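The whole scheme above can be sketched in Python. This is a minimal illustration, not a polished implementation: the names grid_sort and SENTINEL are mine, and for simplicity it places the minimum and maximum in the same pass as everything else instead of splitting off a separate rem array (the cost is the same either way):

```python
def grid_sort(arr):
    # Assumes all values are distinct integers.
    lo, hi = min(arr), max(arr)        # O(n) to find min and max
    SENTINEL = object()                # guaranteed outside the value range
    grid = [SENTINEL] * (hi - lo + 1)  # one slot per possible value
    for x in arr:                      # n insertions, O(1) each
        grid[x - lo] = x               # index = element_value - minimum
    # final pass over the grid: keep only the valid (non-sentinel) entries
    return [x for x in grid if x is not SENTINEL]

grid_sort([7, 3, 9, 0, 5])  # → [0, 3, 5, 7, 9]
```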

This final pass could take a very long time if the difference between maximum and minimum far exceeds the input size n. So after computing maximum - minimum, I'd set some upper bound on that difference, beyond which I fall back to an O(n log n) merge sort, or the even faster Timsort built into Python.

But say the initial range is min = 0, max = 20, and n = 10. That final pass would take O(2n), which is still O(n). If the range is min = 0, max = 100, and n = 10, it would take O(10n), which still reduces to O(n)?

Calculating n log n for the last example, I get:

10 * log_2(10) ≈ 33

Which means that as long as max - min <= 33, I can match the speed of merge sort?
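A quick sanity check of that arithmetic in Python. The function counting_sort_wins and its threshold rule are just my illustrative shorthand for the n + k versus n log n comparison, not a rigorous bound:

```python
import math

def counting_sort_wins(n, lo, hi):
    # Rough heuristic: the grid approach costs about n + k operations
    # (k = hi - lo), while mergesort costs about n * log2(n).
    k = hi - lo
    return n + k <= n * math.log2(n)

print(10 * math.log2(10))              # ≈ 33.2
print(counting_sort_wins(10, 0, 20))   # n + k = 30 <= 33.2 → True
print(counting_sort_wins(10, 0, 100))  # n + k = 110 > 33.2 → False
```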

Edit:

Even if there are repeated values in the input array, I could make the grid a 2D array (or, in the case of Python, a list of tuples) and keep appending to each cell.

tup = tup + (repeat_val,)

And for an input array with consecutive values, this would be way faster, wouldn't it?
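A minimal sketch of that variant, assuming integer input; I've used per-cell Python lists rather than tuples, since appending to a list is O(1) amortized while tuple concatenation copies the tuple (bucket_grid_sort is just an illustrative name):

```python
def bucket_grid_sort(arr):
    # Each grid cell holds a list, so repeated values can pile up.
    lo, hi = min(arr), max(arr)
    grid = [[] for _ in range(hi - lo + 1)]
    for x in arr:
        grid[x - lo].append(x)
    # flatten the buckets in order
    return [x for cell in grid for x in cell]

bucket_grid_sort([3, 1, 3, 2, 1])  # → [1, 1, 2, 3, 3]
```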

asked Apr 25, 2021 at 8:25

1 Answer


It looks like the algorithm you are describing is called counting sort.

This algorithm works in $O(n+k)$ where $n$ is the number of items in the input, and $k$ would be their range: $k=max - min$.

This algorithm indeed has better computational time complexity than mergesort when you restrict $k$ to be small enough. For your examples, though, you "fixed" values for $n$ and $k$, which big-O notation cannot meaningfully be computed on: big-O describes the algorithm's behaviour as the input grows, and is an upper bound for any "fixed" values you plug in.

As a side note, this algorithm is not a comparison-based sort (one that sorts only by comparing elements), and it can only sort natural numbers. Comparison-based sorting algorithms (like mergesort) must take at least $\Omega(n\log(n))$ time, but are capable of sorting any list, not just natural numbers.
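For reference, the textbook form of counting sort counts occurrences rather than storing the elements themselves; a minimal Python sketch, assuming integer input:

```python
def counting_sort(arr):
    # Classic counting sort: O(n + k) time, where k = max - min.
    lo, hi = min(arr), max(arr)
    counts = [0] * (hi - lo + 1)
    for x in arr:                 # tally how often each value occurs
        counts[x - lo] += 1
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit each value `c` times
    return out

counting_sort([4, 2, 2, 8, 3])  # → [2, 2, 3, 4, 8]
```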

Juho
answered Apr 25, 2021 at 9:27
  • Thanks for an awesome answer. Yep, the big-O I calculated was not worst-case at all. Is it really restricted to natural numbers, even with the element - min index calculation? Commented Apr 25, 2021 at 15:59
  • Unless you have some mapping $\phi$ to the natural numbers (computable in $O(1)$ time), this (specific) algorithm won't work. I think there are algorithms with different assumptions that can work with other types of data. But for a general type of data, with no assumptions on the input, the best you can hope for is $\Theta(n\log(n))$. Commented Apr 25, 2021 at 16:13
  • So basically, with restricted input, I can at least run a check to see whether counting sort would indeed be faster than merge sort for that input. In O(n) I can check whether the input is all natural numbers. Commented Apr 25, 2021 at 21:47
