CLRS Exercise 2.3-4: Recursive Insertion Sort

Question 1

I'm going through CLRS 3e and executing exercises to obtain a better understanding of CS fundamentals.

Below is my solution to the following exercise from chapter 2:

We can express insertion sort as a recursive procedure as follows. In order to sort \$A[*1..n*]\,ドル we recursively sort \$A[*1..n-1*]\$ and then insert \$A[*n*]\$ into the sorted array \$A[*1..n-1*]\$. Write a recurrence for the running time of this recursive version of insertion sort.

I was able to come up with a recurrence of:

T(N) = O(1) if N <= 1; T(N-1) + O(N) if N > 1

NOTE: The \$O\$ is supposed to represent theta notation.

import random
def recursive_insertion_sort(A, n):
 if n < 1:
 return A
 recursive_insertion_sort(A, n - 1)
 key = A[n]
 while n > 0 and key < A[n - 1]:
 A[n] = A[n - 1]
 n -= 1
 A[n] = key
 return A
A = [3, 1, 0, 2, 3]
B = random.sample(range(1, 11), 10)
C = random.sample(range(1, 11), 10)
D = random.sample(range(1, 11), 10)
E = [1]
F = []
assert recursive_insertion_sort(A, len(A) - 1) == sorted(A)
assert recursive_insertion_sort(B, len(B) - 1) == sorted(B)
assert recursive_insertion_sort(C, len(C) - 1) == sorted(C)
assert recursive_insertion_sort(D, len(D) - 1) == sorted(D)
assert recursive_insertion_sort(E, len(E) - 1) == sorted(E)
assert recursive_insertion_sort(F, len(F) - 1) == sorted(F)

I struggle with translating recursive algorithms to code. By my understanding this code is a correct recursive insertion sort due to the invariant that the array at \$A[0.. n - 1]\$ is a sorted array of the first \$n - 1\$ elements of the entire array \$A[0...n]\$.

I do not have a formal CS background so I'm unsure if this invariant holds up to more rigorous analysis. Furthermore if anyone has suggestions to how I can make this code cleaner that would be appreciated. As an aside I am trying to right this code w/o the use of built in functions aside from very primitive (ex. len()) functions.

Question 2

The implementation looks fine to me, but writing the recurrence with a big theta isn't right. The problem is that the time taken to insert item \$n\$ into the right place among items 1 to \$n-1\$ depends on the values of the items — if item \$n\$ is bigger than all of items 1 to \$n-1\$ then it takes constant time to "insert" it. What we can say is that in the worst case it takes time proportional to \$n\$ to insert the item.

Let's go back and try again from first principles. If you look at the analysis of the iterative version of the insertion sort algorithm (on pages 25–6 of CLRS) you'll see that the approach the authors take is to annotate each line of code with a cost and number of times it is executed. So let's do that for the code in the post:

 COST TIMES
-------------------------------------- ------ ------
def recursive_insertion_sort(A, n):
 if n < 1: c1 1
 return A
 recursive_insertion_sort(A, n - 1) T(n-1) 1
 key = A[n] c2 1
 while n > 0 and key < A[n - 1]: c3 t(n) + 1
 A[n] = A[n - 1] c4 t(n)
 n -= 1 c5 t(n)
 A[n] = key c6 1
 return A c7 1

Here \$t(n)\$ is the number of times that the while loop needs to be iterated when the function is called with the value \$n\$. This depends on the input A, so we might need to make separate worst-case and best-case analyses (as you can see the authors do for the iterative version on page 26).

So we have $$ T(n) = T(n-1) + c_1 + c_2 + c_3 + c_6 + c_7 + (c_3 + c_4 + c_5)t(n) $$ Now, in the best case, the input is already sorted, and so the while loop is never executed, and so \$t(n) = 0\,ドル hence $$ T(n) \ge T(n-1) + c_1 + c_2 + c_3 + c_6 + c_7 $$ and following similar analysis to page 26, we find $$ T(n) \ge (c_1 + c_2 + c_3 + c_6 + c_7)n $$ and so \$T(n) = \Omega(n)\$.

In the worst case, the input is sorted in reverse order, and so \$t(n) = n-1\,ドル hence $$ T(n) \le T(n-1) + c_1 + c_2 + c_3 + c_6 + c_7 + (n-1)(c_3 + c_4 + c_5) $$ and again following similar analysis to page 27, we find $$ \eqalign{ T(n) &\le (c_1 + c_2 + c_3 + c_6 + c_7)n + {c_3 + c_4 + c_5\over2}n(n-1) \\ &= {c_3 + c_4 + c_5\over2}n^2 + \left(c_1 + c_2 + c_3 + c_6 + c_7 - {c_3\over2} - {c_4\over2} - {c_5\over2}\right)n} $$ and so \$T(n) = O(n^2)\$.

So this is an algorithm for which we can't find a big-Theta expression for the runtime — it's bounded below by \$ \Omega(n) \$ and above by \$O(n^2)\$ and these do not meet in the middle.

(There are various ways to take shortcuts in this kind of analysis, especially if you are only interested in the asymptotic result, but when you are starting out, it's good to go through all the details so that you have confidence you're not missing anything.)

score 3 · Accepted Answer · 2017-07-11 10:49:14Z

The implementation looks fine to me, but writing the recurrence with a big theta isn't right. The problem is that the time taken to insert item \$n\$ into the right place among items 1 to \$n-1\$ depends on the values of the items — if item \$n\$ is bigger than all of items 1 to \$n-1\$ then it takes constant time to "insert" it. What we can say is that in the worst case it takes time proportional to \$n\$ to insert the item.

Let's go back and try again from first principles. If you look at the analysis of the iterative version of the insertion sort algorithm (on pages 25–6 of CLRS) you'll see that the approach the authors take is to annotate each line of code with a cost and number of times it is executed. So let's do that for the code in the post:

 COST TIMES
-------------------------------------- ------ ------
def recursive_insertion_sort(A, n):
 if n < 1: c1 1
 return A
 recursive_insertion_sort(A, n - 1) T(n-1) 1
 key = A[n] c2 1
 while n > 0 and key < A[n - 1]: c3 t(n) + 1
 A[n] = A[n - 1] c4 t(n)
 n -= 1 c5 t(n)
 A[n] = key c6 1
 return A c7 1

Here \$t(n)\$ is the number of times that the while loop needs to be iterated when the function is called with the value \$n\$. This depends on the input A, so we might need to make separate worst-case and best-case analyses (as you can see the authors do for the iterative version on page 26).

So we have $$ T(n) = T(n-1) + c_1 + c_2 + c_3 + c_6 + c_7 + (c_3 + c_4 + c_5)t(n) $$ Now, in the best case, the input is already sorted, and so the while loop is never executed, and so \$t(n) = 0\,ドル hence $$ T(n) \ge T(n-1) + c_1 + c_2 + c_3 + c_6 + c_7 $$ and following similar analysis to page 26, we find $$ T(n) \ge (c_1 + c_2 + c_3 + c_6 + c_7)n $$ and so \$T(n) = \Omega(n)\$.

In the worst case, the input is sorted in reverse order, and so \$t(n) = n-1\,ドル hence $$ T(n) \le T(n-1) + c_1 + c_2 + c_3 + c_6 + c_7 + (n-1)(c_3 + c_4 + c_5) $$ and again following similar analysis to page 27, we find $$ \eqalign{ T(n) &\le (c_1 + c_2 + c_3 + c_6 + c_7)n + {c_3 + c_4 + c_5\over2}n(n-1) \\ &= {c_3 + c_4 + c_5\over2}n^2 + \left(c_1 + c_2 + c_3 + c_6 + c_7 - {c_3\over2} - {c_4\over2} - {c_5\over2}\right)n} $$ and so \$T(n) = O(n^2)\$.

So this is an algorithm for which we can't find a big-Theta expression for the runtime — it's bounded below by \$ \Omega(n) \$ and above by \$O(n^2)\$ and these do not meet in the middle.

(There are various ways to take shortcuts in this kind of analysis, especially if you are only interested in the asymptotic result, but when you are starting out, it's good to go through all the details so that you have confidence you're not missing anything.)

Stack Exchange Network

CLRS Exercise 2.3-4: Recursive Insertion Sort

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CLRS Exercise 2.3-4: Recursive Insertion Sort

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions