
I just thought this up while walking after typing up a knapsack with repetition implementation in Python.

I want to teach computer science, so I talk to myself a lot, pretending to lecture to students.

So in my imaginary lecture on merge sort, I used the example of a shuffled deck of cards.

But then it hit me: with cards, I already know what high and low are, so I can just lay out a 4 × 13 grid, with spades as the first row, then hearts, then clubs, then diamonds.

That would take exactly O(n), where n = 52, to sort the pile: I pick up each card and put it in its place, and each pick-and-place operation takes O(1) + O(1).

So then I thought: why can't I do this to sort any array, with the caveat that all values must be distinct?

I can find the minimum and maximum in O(n), remember their indices, and shrink the input array by 2 by deleting them, producing an array called rem (for "remaining").

Then I make an array of size (maximum - minimum + 1) as the grid, with the minimum at index 0 and the maximum at the last index, and initialize every slot to some sentinel value outside the range minimum to maximum.

Then I remember what minimum and maximum are for the following operations.

Iterate rem from index 0, and for each element, calculate: index = element_value - minimum

Then put the element in the grid at position index.

Repeat for the entire rem array.

Each calculation and each insertion takes constant O(1) time, repeated n times, for a total of O(n).

Now make a new array final that has length of input array.

Now traverse the grid and copy into final every valid entry, i.e. every value not equal to the sentinel.
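The whole scheme above can be sketched in Python. This is a minimal illustration, not a polished implementation: the names grid_sort and SENTINEL are mine, and for simplicity it places the minimum and maximum in the same pass as everything else instead of splitting off a separate rem array (the cost is the same either way):

```python
def grid_sort(arr):
    # Assumes all values are distinct integers.
    lo, hi = min(arr), max(arr)        # O(n) to find min and max
    SENTINEL = object()                # guaranteed outside the value range
    grid = [SENTINEL] * (hi - lo + 1)  # one slot per possible value
    for x in arr:                      # n insertions, O(1) each
        grid[x - lo] = x               # index = element_value - minimum
    # final pass over the grid: keep only the valid (non-sentinel) entries
    return [x for x in grid if x is not SENTINEL]

grid_sort([7, 3, 9, 0, 5])  # → [0, 3, 5, 7, 9]
```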

This final pass could take a very long time if the difference between maximum and minimum far exceeds the input size n. So after computing maximum - minimum, I'd set some upper bound on that difference, beyond which I fall back to an O(n log n) merge sort, or the even faster Timsort built into Python.

But say the initial range is min = 0, max = 20, and n = 10. That final pass would take O(2n), which is still O(n). If the range is min = 0, max = 100, and n = 10, it would take O(10n), which still reduces to O(n)?

Calculating n log n for the last example, I get:

10 * log_2(10) ≈ 33

Which means that as long as max - min <= 33, I can match the speed of merge sort?
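A quick sanity check of that arithmetic in Python. The function counting_sort_wins and its threshold rule are just my illustrative shorthand for the n + k versus n log n comparison, not a rigorous bound:

```python
import math

def counting_sort_wins(n, lo, hi):
    # Rough heuristic: the grid approach costs about n + k operations
    # (k = hi - lo), while mergesort costs about n * log2(n).
    k = hi - lo
    return n + k <= n * math.log2(n)

print(10 * math.log2(10))              # ≈ 33.2
print(counting_sort_wins(10, 0, 20))   # n + k = 30 <= 33.2 → True
print(counting_sort_wins(10, 0, 100))  # n + k = 110 > 33.2 → False
```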

Edit:

Even if there are repeated values in the input array, I could make the grid a 2D array (or, in the case of Python, a list of tuples) and keep appending to each cell.

tup = tup + (repeat_val,)

And for an input array with consecutive values, this would be way faster, wouldn't it?
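A minimal sketch of that variant, assuming integer input; I've used per-cell Python lists rather than tuples, since appending to a list is O(1) amortized while tuple concatenation copies the tuple (bucket_grid_sort is just an illustrative name):

```python
def bucket_grid_sort(arr):
    # Each grid cell holds a list, so repeated values can pile up.
    lo, hi = min(arr), max(arr)
    grid = [[] for _ in range(hi - lo + 1)]
    for x in arr:
        grid[x - lo].append(x)
    # flatten the buckets in order
    return [x for cell in grid for x in cell]

bucket_grid_sort([3, 1, 3, 2, 1])  # → [1, 1, 2, 3, 3]
```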

asked Apr 25, 2021 at 8:25

1 Answer


It looks like the algorithm you are describing is called counting sort.

This algorithm works in $O(n+k)$ where $n$ is the number of items in the input, and $k$ would be their range: $k=max - min$.

This algorithm indeed has better computational time complexity than mergesort when you restrict $k$ to be small enough. For your examples, though, you "fixed" values for $n$ and $k$, which big-O notation cannot meaningfully be computed on: big-O describes the algorithm's behaviour as the input grows, and is an upper bound for any "fixed" values you plug in.

As a side note, this algorithm is not a comparison-based sort (one that sorts only by comparing elements), and it can only sort natural numbers. Comparison-based sorting algorithms (like mergesort) must take at least $\Omega(n\log(n))$ time, but are capable of sorting any list, not just natural numbers.
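For reference, the textbook form of counting sort counts occurrences rather than storing the elements themselves; a minimal Python sketch, assuming integer input:

```python
def counting_sort(arr):
    # Classic counting sort: O(n + k) time, where k = max - min.
    lo, hi = min(arr), max(arr)
    counts = [0] * (hi - lo + 1)
    for x in arr:                 # tally how often each value occurs
        counts[x - lo] += 1
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit each value `c` times
    return out

counting_sort([4, 2, 2, 8, 3])  # → [2, 2, 3, 4, 8]
```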

Juho
answered Apr 25, 2021 at 9:27
  • Thanks for an awesome answer. Yep, the big-O I calculated was not worst-case at all. Is it really restricted to natural numbers, even with the element - min index calculation? Commented Apr 25, 2021 at 15:59
  • Unless you have some mapping $\phi$ to the natural numbers (computable in $O(1)$ time), this (specific) algorithm won't work. I think there are algorithms with different assumptions that can work with other types of data. But for a general type of data, with no assumptions on the input, the best you can hope for is $\Theta(n\log(n))$. Commented Apr 25, 2021 at 16:13
  • So basically, with restricted input, I can at least run a check to see whether counting sort would indeed be faster than merge sort for that input. In O(n) I can check whether the input is all natural numbers. Commented Apr 25, 2021 at 21:47
