Merge Sort in C++

Question 1

I've written this code for a merge sort, which is meant to implement the pseudo-code from Cormen's Introduction to Algorithms:

#include <iostream>
#include <cstdlib>
using namespace std;
const unsigned long long infinity = -1ULL;
void merge(int* A,int p,const int q, const int r)
{
 const int n_1=q-p+1;
 const int n_2=r-q;
 int* L = new int [n_1+1];
 int* R = new int [n_2+1];
 L[n_1]=infinity;
 R[n_2]=infinity;
 for(int i = 0; i < n_1; i++) 
 L[i] = A[p+i];
 for (int j = 0; j < n_2; j++)
 R[j] = A[q+j+1];
 int i=0;
 int j=0;
 // for(int k = p; k <= r; p++) broken code
 int k;
 for(k=p; k <= r && i < n_1 && j < n_2; ++k)
 {
 if(L[i] <= R[j])
 {
 A[k] = L[i];
 i++;
 }
 else
 {
 A[k] = R[j];
 j++; 
 }
 }
 // Added the following two loop.
 // Note only zero or one loop will actually activate.
 while (i < n_1) {A[k++] = L[i++];}
 while (j < n_2) {A[k++] = R[j++];}
} 
void merge_sort(int* A, const int p, const int r)
{
 if (p < r) 
 {
 int q = (p+r)/2;
 merge_sort(A, p,q);
 merge_sort(A,q+1,r);
 merge(A,p,q,r);
 }
}
int main()
{
 int length;
 cout << "Specify array length" << endl;
 cin >> length;
 cout << "\n";
 int A [length];
 //Populate and print the Array
 for(int i = 0; i < length; i++)
 {
 A[i] = rand()%99-1;
 cout << A[i] << " ";
 } 
 cout << "\n";
 merge_sort(A,0,length-1);
 cout << "Your array has been merge_sorted and is now this: " << endl;
 for(int i = 0; i < length; i++) cout << A[i] << " ";
 cout << "\n";
 //cout << infinity << endl;
 return 0;
}

Question 2

Did you try to run this using gdb or another debugger and check whether the indeces used are correct?

Question 3

Couple of issues. You have used C++ (the language) to write C code. Also this is not a good site to ask questions about why something is broken (check out SO for that). This site is meanly for critiquing style.

Question 4

Note: It actually makes things easier in C++ to specify ranges as not inclusive [0,n) rather than inclusive [0,n-1] in C++. You will see this a lot in the standard algorithms.

Question 5

PPS. It is usually a good idea to provide sample input that causes the issue. Tracking down the problem can be very hard when you also have to work out the data that causes the described behavior.

Question 6

Your bug is here: for(int k = p; k <= r; p++). It should be: for(int k = p; k <= r && i < n_1 && j < n_2; k++). Notice TWO Mistakes. 1) If i or j get to the end of their section then comparing them against the other section is pointless and can lead to reading beyond the end of the array. 2) You are incrementing p when you should be incrementing k.

Question 7

Don't do this:

using namespace std;

For a detailed explanation on why not to use the usign clause.

This is not used anywhere (also its not infinity so baddy named).

const unsigned long long infinity = -1ULL;

Passing pointers is a not a good idea.

void merge(int* A,int p,const int q, const int r)

Try and avoid it because you get into the realm of ownership symantics. Here I would be OK with using it but I would definitely rename the variable to explain what it is. You have a tendency to use excessively short variable names; this make the code hard to read.

If you use the standard practive of begin and end (used in the STL (so begin points at the first and end points one past the last)) then you will find that your code becomes a lot simpler to write (especially in this case).

 const int n_1=q-p+1;
 const int n_2=r-q;

This is stupendously bad for C++ code. This looks like C code.

 int* L = new int [n_1+1];
 int* R = new int [n_2+1];

You should practically never use new. When you do you should always match it with delete (I see no delete anywhere so your code leaks). The reason you never use new is the problem with matching it with delete (especially when exceptions can be flying around). What you want to do is use a container:

 std::vector<int> left(n_1+1);
 std::vector<int> right(n_2+1);

You only use new and delete when building your own container types or in very low level code. Normally you will use existing containers std::vector or smart pointers (for these use make_{smartpointertype}.

Has no affect:

 L[n_1]=infinity;
 R[n_2]=infinity;

Especially since you never check for infinity. Also because you always know the exect bounds and should not fall off the end.

Also this is a particularly bad implementation of the algorithm.
Most version I have seen use a split/merge in place algorithm. Thus you don't need to copy data around into new arrays all over the place.

 for(int i = 0; i < n_1; i++) 
 L[i] = A[p+i];
 for (int j = 0; j < n_2; j++)
 R[j] = A[q+j+1];
 for(k=p; k <= r && i < n_1 && j < n_2; ++k)
 {
 if(L[i] <= R[j])
 {
 A[k] = L[i];
 i++;
 }
 else
 {
 A[k] = R[j];
 j++; 
 }
 }

Question 8

"This is not used anywhere (also its not infinity so baddy named)." -- Well, I followed the logic of Cormen as best I could, and understood this particular point as "use a really large number". But yes, it isn't actually used. Admittedly several pages later an exercise is proposed to modify the algorithm so that it wouldn't need to be used. "This is not used anywhere (also its not infinity so baddy named)." --true, but his is generally a pseudo-code to real C++ code exercise, so all the names were clear for me. But I see how your advice would be relevant in the real world.

Question 9

"This looks like C code." -- it is. I wanted to re-write pseudo-code into a simple and fast code. Perhaps I should re-write the C code now into something more like C++. "I see no delete anywhere so your code leaks " --true. I've added them to my original code some time after posting the question, though. So, all things considered, thanks a lot for your answer, now I see many points I have to work on.

Question 10

Read about using, but my response is similar to the third one there: as long as I do it in my private code sources, it wouldn't matter much. But perhaps a habit of adding std:: where needed is worth being formed early.

Question 11

I agree with forming a habit of using std::. But, if you really must use it, then keep it local (inside a function) instead of global.

Question 12

@Chiffa: <quote>as long as I do it in my private code sources, it wouldn't matter much</quote>. You already nearly broke your own code with this tiny simple example. std::merge() is now in the same namespace as your ::merge(). You are one parameter off hitting std::merge(). If your code is any larger than this you are going to start hitting issues that then become hard to debug. Unless your code is shorter than 20 lines you should never use using namespace XX;. It will hit you and debugging it will be a nightmare.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Answer 1 · 2013-09-27 00:01:03Z

Don't do this:

using namespace std;

For a detailed explanation on why not to use the usign clause.

This is not used anywhere (also its not infinity so baddy named).

const unsigned long long infinity = -1ULL;

Passing pointers is a not a good idea.

void merge(int* A,int p,const int q, const int r)

Try and avoid it because you get into the realm of ownership symantics. Here I would be OK with using it but I would definitely rename the variable to explain what it is. You have a tendency to use excessively short variable names; this make the code hard to read.

If you use the standard practive of begin and end (used in the STL (so begin points at the first and end points one past the last)) then you will find that your code becomes a lot simpler to write (especially in this case).

 const int n_1=q-p+1;
 const int n_2=r-q;

This is stupendously bad for C++ code. This looks like C code.

 int* L = new int [n_1+1];
 int* R = new int [n_2+1];

You should practically never use new. When you do you should always match it with delete (I see no delete anywhere so your code leaks). The reason you never use new is the problem with matching it with delete (especially when exceptions can be flying around). What you want to do is use a container:

 std::vector<int> left(n_1+1);
 std::vector<int> right(n_2+1);

You only use new and delete when building your own container types or in very low level code. Normally you will use existing containers std::vector or smart pointers (for these use make_{smartpointertype}.

Has no affect:

 L[n_1]=infinity;
 R[n_2]=infinity;

Especially since you never check for infinity. Also because you always know the exect bounds and should not fall off the end.

Also this is a particularly bad implementation of the algorithm.
Most version I have seen use a split/merge in place algorithm. Thus you don't need to copy data around into new arrays all over the place.

 for(int i = 0; i < n_1; i++) 
 L[i] = A[p+i];
 for (int j = 0; j < n_2; j++)
 R[j] = A[q+j+1];
 for(k=p; k <= r && i < n_1 && j < n_2; ++k)
 {
 if(L[i] <= R[j])
 {
 A[k] = L[i];
 i++;
 }
 else
 {
 A[k] = R[j];
 j++; 
 }
 }

"This is not used anywhere (also its not infinity so baddy named)." -- Well, I followed the logic of Cormen as best I could, and understood this particular point as "use a really large number". But yes, it isn't actually used. Admittedly several pages later an exercise is proposed to modify the algorithm so that it wouldn't need to be used. "This is not used anywhere (also its not infinity so baddy named)." --true, but his is generally a pseudo-code to real C++ code exercise, so all the names were clear for me. But I see how your advice would be relevant in the real world.
"This looks like C code." -- it is. I wanted to re-write pseudo-code into a simple and fast code. Perhaps I should re-write the C code now into something more like C++. "I see no delete anywhere so your code leaks " --true. I've added them to my original code some time after posting the question, though. So, all things considered, thanks a lot for your answer, now I see many points I have to work on.
Read about using, but my response is similar to the third one there: as long as I do it in my private code sources, it wouldn't matter much. But perhaps a habit of adding std:: where needed is worth being formed early.
I agree with forming a habit of using std::. But, if you really must use it, then keep it local (inside a function) instead of global.
@Chiffa: <quote>as long as I do it in my private code sources, it wouldn't matter much</quote>. You already nearly broke your own code with this tiny simple example. std::merge() is now in the same namespace as your ::merge(). You are one parameter off hitting std::merge(). If your code is any larger than this you are going to start hitting issues that then become hard to debug. Unless your code is shorter than 20 lines you should never use using namespace XX;. It will hit you and debugging it will be a nightmare.

Stack Exchange Network

Merge Sort in C++

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Merge Sort in C++

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions