Writing Merge Sort in C++

Question 1

I am learning programming online and currently watching CS50 lectures, in third lecture, professor introduces us to merge_sort though only through pseudo code, so I wanted someone to review my implementation and tell me where I can improve and what I possibly missed.

#include <algorithm>
#include <assert.h>
#include <cmath>
#include <iostream>
#include <vector>
template <typename T> void merge_sort(T m_array[], const size_t size) {
 using namespace std;
 if (size < 2)
 return;
 else if (size == 2) {
 if (m_array[0] > m_array[1])
 std::swap(m_array[0], m_array[1]); // no need for a new array
 return;
 }
 merge_sort(m_array, size / 2);
 merge_sort(m_array + size / 2,
 std::ceil(static_cast<double>(size) /
 2) // for odd size, divison will have truncated value
 // and recursion will left the the last array value
 // for sorting, size%2==0?size/2:size/2+1
 );
 T *new_array = new T[size];
 size_t na_pos = 0;
 const T *array1 = m_array, *array2 = m_array + size / 2;
 const T *const array1end = m_array + size / 2, *const array2end =
 m_array + size;
 while (1) {
 if (array1 == array1end)
 break;
 if (array2 == array2end)
 break;
 if (*array1 < *array2)
 new_array[na_pos++] = *array1++;
 else if (*array2 < *array1)
 new_array[na_pos++] = *array2++;
 else if (*array1 == *array2) {
 new_array[na_pos++] = *array1++;
 new_array[na_pos++] = *array2++;
 }
 }
 assert(array1 == array1end || array2 == array2end);
 while (array1 != array1end)
 new_array[na_pos++] = *array1++;
 while (array2 != array2end)
 new_array[na_pos++] = *array2++;
 assert(na_pos == size);
 for (na_pos = 0; na_pos < size; na_pos++)
 m_array[na_pos] = new_array[na_pos];
 delete[] new_array;
}
int main() {
 using namespace std;
 auto limit = 10000;
 const auto size = 21;
 while (limit--) {
 vector<int> a;
 a.reserve(size);
 while (a.size() < size)
 a.push_back(rand() % 10000);
 merge_sort(a.data(), a.size());
 assert(std::is_sorted(a.begin(), a.end()));
 }
 return 0;
}

Question 2

std::ceil(static_cast<double>(size) / 2 is a very long (and unclear) way to say size - size/2.
No naked loops please. Each loop represents an important algorithm, and deserves a name. For example,
```
while (array1 != array1end)
 new_array[na_pos++] = *array1++;
```
wants to be copy(new_array + na_pos, array1, array1_end);, and you may reuse it at least 2 more times (and don't make it void: it has something interesting to return).

Similarly, the merging loop represents a merge algorithm.

The block

if (*array1 < *array2)
 new_array[na_pos++] = *array1++;
else if (*array2 < *array1)
 new_array[na_pos++] = *array2++;
else if (*array1 == *array2) {
 new_array[na_pos++] = *array1++;
 new_array[na_pos++] = *array2++;
}

introduces a subtle problem. Here your algorithm may lose stability. If array1 has two elements comparing equal, say x1,x2, and array2 has an element x3 which also compares equal to them, the final arrangement will be x1, x3, x2, whereas the stable sort should result in x1, x2, x3.

The problem is easy to fix:

if (*array1 <= *array2) {
 new_array[na_pos++] = *array1++;
} else {
 new_array[na_pos++] = *array2++;
}

I strongly recommend to put the loop termination conditions where they belong:
```
while ((array1 != array1end) && (array2 != array2end))
```

Question 3

third point and it's fix took me some time to grasp, may be it's my English, but yeah very good catch, never caught that myself

Question 4

@bluedragon I should've said don't copy from array2 until you sure. English is not my first language either.

Question 5

if i use auto lastNewAPos = std::copy(array1,array1end,new_array + na_pos); std::copy(array2,array2end,lastNewAPos); it's two seconds slower for 1000000 arrays

Question 6

Don’t write `using namespace std;`.

At least you didn’t do it in global scope; but you’ll have the same problems when a future version of the library adds names that cause ambiguities.

template <typename T> 
void merge_sort(T m_array[], const size_t size) {

⧺I.13 Do not pass an array as a single pointer.

If you really need to pass raw memory (only), use gsl::span. But this is a template: you should take begin/end iterators to any container.

T *new_array = new T[size];

⧺C.149 — no naked new or delete.

You should probably make this a unique_ptr as a drop-in replacement without otherwise changing the architecture.

 a.push_back(rand() % 10000);

rand is pretty bad. Get used to using the proper random number facilities in the standard library. You want a uniform_int_distribution.

vnp vnp 58.6k4 gold badges55 silver badges144 bronze badges · Accepted Answer · 2018-06-14 05:51:25Z

std::ceil(static_cast<double>(size) / 2 is a very long (and unclear) way to say size - size/2.
No naked loops please. Each loop represents an important algorithm, and deserves a name. For example,
```
while (array1 != array1end)
 new_array[na_pos++] = *array1++;
```
wants to be copy(new_array + na_pos, array1, array1_end);, and you may reuse it at least 2 more times (and don't make it void: it has something interesting to return).

Similarly, the merging loop represents a merge algorithm.

The block

if (*array1 < *array2)
 new_array[na_pos++] = *array1++;
else if (*array2 < *array1)
 new_array[na_pos++] = *array2++;
else if (*array1 == *array2) {
 new_array[na_pos++] = *array1++;
 new_array[na_pos++] = *array2++;
}

introduces a subtle problem. Here your algorithm may lose stability. If array1 has two elements comparing equal, say x1,x2, and array2 has an element x3 which also compares equal to them, the final arrangement will be x1, x3, x2, whereas the stable sort should result in x1, x2, x3.

The problem is easy to fix:

if (*array1 <= *array2) {
 new_array[na_pos++] = *array1++;
} else {
 new_array[na_pos++] = *array2++;
}

I strongly recommend to put the loop termination conditions where they belong:
```
while ((array1 != array1end) && (array2 != array2end))
```

third point and it's fix took me some time to grasp, may be it's my English, but yeah very good catch, never caught that myself
@bluedragon I should've said don't copy from array2 until you sure. English is not my first language either.
if i use auto lastNewAPos = std::copy(array1,array1end,new_array + na_pos); std::copy(array2,array2end,lastNewAPos); it's two seconds slower for 1000000 arrays

Stack Exchange Network

Writing Merge Sort in C++

2 Answers 2

Don’t write `using namespace std;`.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Writing Merge Sort in C++

2 Answers 2

Don’t write using namespace std;.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Don’t write `using namespace std;`.