Insertion Sort in C++

Question 1

I have created an Insertion Sort code that sorts numbers (obviously).

#include <iostream>
#include <cstdlib>
void insertion_sort(int arr[], int length);
void print_array(int array[],int size);
int main()
{
 int array[] = {4,6,3,7,5,9,2,8,1,10};
 insertion_sort(array,10);
 print_array(array,10);
 return 0;
}
void insertion_sort(int arr[], int length)
{
 int i,j,tmp;
 for (i = 1; i < length; i++) {
 j = i;
 while (j > 0 && arr[j - 1] > arr[j]) {
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 j--;
 }
 }
}
void print_array(int array[], int size)
{
 int j;
 for (j = 0; j < size; j++)
 {
 std::cout << array[j] << std::endl;
 }
}

Is there a way to make it shorter or better?

Question 2

Aside from the use of std::cout and std::endl this code is plain old C.

Question 3

@PeteBecker It is?

Question 4

Detailed descriptions could be found at Insertion Sort in C++

Question 5

A couple of things:

int i,j,tmp;

can be formatted better:

int i, j, tmp;

Even that would be discouraged, in two ways:

Declare variables where you need them, not before.
Multiple declarations on one line isn't very readable.

and actually isn't even necessary, as is explained later.

int j;
for (j = 0; j < size; j++)
{
 std::cout << array[j] << std::endl;
}

C++ conventions have braces on the same line as the statement:

int j;
for (j = 0; j < size; j++) {
 std::cout << array[j] << std::endl;
}

In fact, the first line can be merged into the loop:

for (int j = 0; j < size; j++) {
 std::cout << array[j] << std::endl;
}

j = i;
while (j > 0 && arr[j - 1] > arr[j]) {
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 j--;
}

can be a for loop, as explained:

 j = i; // Initialization
 while (j > 0 && arr[j - 1] > arr[j]) { // Condition
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 j--; // Decrement
 }

which turns into:

for (j = i; j > 0 && arr[j - 1] > arr[j]; j--) {
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
}

Now, why is the first line not necessary, and even discouraged?

Well, because you should declare variable right where you need them, and not before:

void insertion_sort(int arr[], int length)
{
 for (int i = 1; i < length; i++) {
 for (int j = i; j > 0 && arr[j - 1] > arr[j]; j--) {
 int tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 }
 }
}

Question 6

With C++ conventions I guess you mean the Google style doc? Having braces on the same line of the statement or not is an opinion, no need to address that.

Question 7

I guess you meant: int tmp = arr[j]; inside the loop there?

Question 8

@CompuChip Good catch, will fix.

Question 9

Even int i, j, tmp; is considered bad practice. One variable per line please and naming them better than i and j would have been better help.

Question 10

@Deduplicator: Nope. One letter variables are archaic hang over from Fortran and C. They go against self documenting code. They also make maintenance a pain the arse. Try finding all occurrences of the variable i in a function that has any comments. Creating named variables costs you nothing and in the long run makes the code easier to read (no down side) while short variable names have plenty of downsides (maintenance).

Question 11

This is not C++, yet. Just using std::cout does not C++ make. But do not fret, we'll get there.

Safe interface

Whenever designing an API (or interface), you should strive to make it as safe as possible.

Your interface here is not safe: having two un-linked variables to correctly pass is asking for troubles; it is just too easy to mix the variables up and pass a size that does not correspond to the array to be sorted (or printed).

Obviously, in your toy example it does not show up, however as soon as you manipulate multiple arrays in the same function you'll be very likely to make this mistake. And even with a single array, passing the length before re-allocating (for example) is also likely.

So, you need to aim for an interface that conveys both pieces of information in a single argument. An obvious choice is std::vector, another would be some instance of array_view or a random_access_range but those are not Standard so let's not bother for now.

Thus, the interface becomes:

void insertion_sort(std::vector<int>& array);
void print(std::vector<int> const& array);

Do note the consistency in the naming; both times the argument is named array instead of being once arr and once array. Consistency may be the hobgobelin of little minds, but it does help readers.

Algorithmic

An insertion sort is generally described simply as: read each element in turn, and insert it in the right position among the previous read (and sorted) elements.

The cost of the algorithm is thus:

for each of the N elements
the cost of inserting it in the i-1 (where i is the index of the element being read) first elements

The outer loop we cannot do anything about, but what of the insertion?

In your code, the insertion is done by:

 while (j > 0 && arr[j - 1] > arr[j]) {
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 j--;
 }

That is, the value is bubbling up toward the front until it finds its spot.

This is suboptimal in terms of comparisons: the optimal algorithm to find the position to insert is using a binary search rather than a linear search, for O(log N) complexity rather than O(N) one.

In terms of number of moves, well, you may have to move all elements, so it could only be improved by a constant factor anyway.

Standing on the shoulders of giants, we can improve the code by separating the search and the move, and use pre-existing algorithms to do so:

std::upper_bound yields a pointer to the first element in the range that is greater than the value we search with
std::rotate rotates the elements of the range (and middle point) passed to it, such that the range delimited by the last two elements come to precede the range delimited by the first two

We can use them keeping to indices, although it is slightly verbose:

void insertion_sort(std::vector<int>& array) {
 typedef std::vector<int>::iterator It;
 for (size_t i = 1, length = array.size(); i < length; ++i) {
 // 1. Search
 It const end_sorted = array.begin() + i;
 It const insertion_point =
 std::upper_bound(array.begin(), end_sorted, array[i]);
 // 2. Insert
 std::rotate(insertion_point, end_sorted, end_sorted + 1);
 }
}

Thus I advise jumping the gun and just switch to iterators since that it what the interfaces of the algorithms we use consume:

void insertion_sort(std::vector<int>& array) {
 typedef std::vector<int>::iterator It;
 for (It it = array.begin(), end = array.end(); it != end; ++it) {
 // 1. Search
 It const insertion_point =
 std::upper_bound(array.begin(), it, *it);
 // 2. Insert
 std::rotate(insertion_point, it, it + 1);
 }
}

Which in modern C++ would use auto rather than explicitly naming the iterator type:

void insertion_sort(std::vector<int>& array) {
 for (auto it = array.begin(), end = array.end(); it != end; ++it) {
 // 1. Search
 auto const insertion_point =
 std::upper_bound(array.begin(), it, *it);
 // 2. Insert
 std::rotate(insertion_point, it, it + 1);
 }
}

Note that abbreviating further is not a good idea; while one-liners may give the writer a fuzzy warm feeling, they just end up being head scratchers for maintainers:

// Puzzle time!
void insertion_sort(std::vector<int>& array) {
 for (auto it = array.begin(), end = array.end(); it != end; ++it) {
 std::rotate(std::upper_bound(array.begin(), it, *it), it, it + 1);
 }
}

What does endl do?

Finally, a trick question: what do think std::endl is and does?

std::endl is a so called stream manipulator, it is a function whose signature is:

template< class CharT, class Traits >
std::basic_ostream<CharT, Traits>& endl( std::basic_ostream<CharT, Traits>& os );

So, when used on a stream, it does an operation on said stream and returns it (to allow further chaining).

Specifically, ostream << std::endl is equivalent to:

ostream << '\n';
ostream.flush();

Where flush makes sure that the data is not buffered in the process but fully sent to the Operating System (it does not guarantee that the data is on disk, sent over the network, or anything else).

There is very little reason to call flush in general:

in the absence of guarantee that the data arrived on disk (or wherever), it serves not practical purpose
the buffer was introduced for performance reasons, bypassing it is asking to go slow

So, not only should you generally avoid flush, std::endl is a mouthful. Just write '\n' to get an end-of-line character and be done with it.

Thus, we rewrite print:

void print(std::vector<int> const& array) {
 for (auto it = array.begin(), end = array.end(); it != end; ++i) {
 std::cout << *it << '\n';
 }
}

Or, using the range-for notation of modern C++:

void print(std::vector<int> const& array) {
 for (int e: array) { std::cout << e << '\n'; }
}

Who said C++ could not be nifty?

Question 12

You could use auto and the extended for loop in your insertion sort implementation. Other than that, spotless.

Question 13

For your print(), I would prefer std::copy with an iterator adaptor for writing to a stream to offer flexibility. Not everything is going to be printed per line (std::ostream_iterator<>), sometimes you just want a delimited list (std::experiemental::ostream_joiner).

Question 14

Reducing number of writes

Right now, you swap neighboring elements until the newest element reaches its proper place. This takes 2n array writes to move an element n spots. Instead of swapping, you can shift array elements up a position, and then write the newest element into its correct place. This takes n array writes, which cuts in half the number of writes you need to make.

Here is what the code would look like:

void insertion_sort(int arr[], int length)
{
 int i,j;
 for (i = 1; i < length; i++) {
 int tmp = arr[i];
 for (j = i; j > 0 && arr[j - 1] > tmp; j--) {
 arr[j] = arr[j - 1];
 }
 arr[j] = tmp;
 }
}

Question 15

I would suggest making use of the standard library; as it is now, you don't use it at all. I would also suggest using some binary search to find the place a new element should be shifted to; this will reduce the number of comparisons from O(n^2) to O(n log n), but it will still need O(n^2) memory operations.

Use template! and iterators! Templates allow you to use the same function with almost any type. You certainly don't want to use something as ugly as pointers and length parameters.

Here's an implementation in C++11

#include <algorithm>
#include <iterator>
template<class I, class C = std::less<class std::iterator_traits<I>::value_type>>
void insertion_sort(I begin, I end) {
 for (auto i = begin; i != end; ++i) {
 auto index = std::upper_bound(begin, i, *i, C());
 std::rotate(index, i, i + 1);
 }
}

You can use std::upper_bound because you know that for [begin,i) the elements are sorted (that's the part you have already sorted).

You should use std::rotate to do the actual memory operations; it does exactly what we want and could even be used for further optimizations.

The comparator C default to what is essentially < (std::less). By adding it, you can change it if you wish -- you might want to sort it in reverse and all you would have to do, is replace it with > (std::greater). You could optimize this a bit further by using a better upper-bound function (one that's O(1) in the best case, instead of always being O(log n)). It's also possible to reduce the number of memory operations to something that approaches O(n log n) but this requires you to look for an appropriate block between [i, end) that fits in [index,index+1).

Edits: used std::rotate and added configurable comparator as suggested by Snowhawk04

Question 16

You should also accept a comparator in your insertion_sort to pass to std::upper_bound. std::rotate could be used instead of your own handwritten for.

Question 17

Of course! Usually I don't bother but I should make a habit of actually doing this right... always. And yes, the names are horrible.

Question 18

In general, put variables in the narrowest scope possible. For example, tmp is only used inside the innermost loop, so it should be defined there rather than at the entry to the function. This is made easier by C++'s declaration in a for loop:

void insertion_sort(int array[], int length) {
 for (int i = 1; i < length; ++i) {
 for (int j = i; j > 0 && array[j - 1] > array[j]; --j) {
 int tmp = array[j];
 array[j] = array[j - 1];
 array[j - 1] = tmp;
 }
 }
}

Even better would be using std::swap instead of those three lines in the innermost loop:

void insertion_sort(int array[], int length) {
 for (int i = 1; i < length; ++i) {
 for (int j = i; j > 0 && array[j - 1] > array[j]; --j) {
 std::swap(array[j - 1], array[j]);
 }
 }
}

In main, don't use the magic number 10; instead, let the compiler figure out the size of the array.

int main() {
 int array[] = { 4, 6, 3, 7, 5, 9, 2, 8, 1, 10 };
 const int size = sizeof(array) / sizeof(array[0]);
 insertion_sort(array, size);
 print_array(array, size);
 return 0;
}

Better yet, use std::vector<int> instead of a C-style array; vector carries its length around, so you don't risk getting the size wrong:

void insertion_sort(std::vector<int>& array) {
 for (int i = 1; i < array.size(); ++i) {
 for (int j = i; j > 0 && array[j - 1] > array[j]; --j) {
 std::swap(array[j - 1], array[j]);
 }
 }
}
int main() {
 std::vector<int> array = { 4, 6, 3, 7, 5, 9, 2, 8, 1, 10 };
 insertion_sort(array);
 print_array(array);
 return 0;
}

Question 19

Extending the answer from @Clearer, which doesn't use the templated custom comparison class. Here is how I modified it to use it:

#include <algorithm>
#include <iterator>
template<class I, class C = std::less<typename std::iterator_traits<I>::value_type>>
void insertion_sort(I begin, I end, C comp = C()) {
 for (auto i = begin; i != end; ++i) {
 auto index = std::upper_bound(begin, i, *i, comp);
 std::rotate(index, i, i + 1);
 }
}

score 8 · Accepted Answer · 2015-11-15 02:31:53Z

A couple of things:

int i,j,tmp;

can be formatted better:

int i, j, tmp;

Even that would be discouraged, in two ways:

Declare variables where you need them, not before.
Multiple declarations on one line isn't very readable.

and actually isn't even necessary, as is explained later.

int j;
for (j = 0; j < size; j++)
{
 std::cout << array[j] << std::endl;
}

C++ conventions have braces on the same line as the statement:

int j;
for (j = 0; j < size; j++) {
 std::cout << array[j] << std::endl;
}

In fact, the first line can be merged into the loop:

for (int j = 0; j < size; j++) {
 std::cout << array[j] << std::endl;
}

j = i;
while (j > 0 && arr[j - 1] > arr[j]) {
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 j--;
}

can be a for loop, as explained:

 j = i; // Initialization
 while (j > 0 && arr[j - 1] > arr[j]) { // Condition
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 j--; // Decrement
 }

which turns into:

for (j = i; j > 0 && arr[j - 1] > arr[j]; j--) {
 tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
}

Now, why is the first line not necessary, and even discouraged?

Well, because you should declare variable right where you need them, and not before:

void insertion_sort(int arr[], int length)
{
 for (int i = 1; i < length; i++) {
 for (int j = i; j > 0 && arr[j - 1] > arr[j]; j--) {
 int tmp = arr[j];
 arr[j] = arr[j - 1];
 arr[j - 1] = tmp;
 }
 }
}

With C++ conventions I guess you mean the Google style doc? Having braces on the same line of the statement or not is an opinion, no need to address that.
I guess you meant: int tmp = arr[j]; inside the loop there?
Even int i, j, tmp; is considered bad practice. One variable per line please and naming them better than i and j would have been better help.
@Deduplicator: Nope. One letter variables are archaic hang over from Fortran and C. They go against self documenting code. They also make maintenance a pain the arse. Try finding all occurrences of the variable i in a function that has any comments. Creating named variables costs you nothing and in the long run makes the code easier to read (no down side) while short variable names have plenty of downsides (maintenance).

Stack Exchange Network

Insertion Sort in C++

6 Answers 6

Reducing number of writes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Insertion Sort in C++

6 Answers 6

Reducing number of writes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions