I am learning programming online and currently watching CS50 lectures, in third lecture, professor introduces us to merge_sort though only through pseudo code, so I wanted someone to review my implementation and tell me where I can improve and what I possibly missed.
#include <algorithm>
#include <assert.h>
#include <cmath>
#include <iostream>
#include <vector>
template <typename T> void merge_sort(T m_array[], const size_t size) {
using namespace std;
if (size < 2)
return;
else if (size == 2) {
if (m_array[0] > m_array[1])
std::swap(m_array[0], m_array[1]); // no need for a new array
return;
}
merge_sort(m_array, size / 2);
merge_sort(m_array + size / 2,
std::ceil(static_cast<double>(size) /
2) // for odd size, divison will have truncated value
// and recursion will left the the last array value
// for sorting, size%2==0?size/2:size/2+1
);
T *new_array = new T[size];
size_t na_pos = 0;
const T *array1 = m_array, *array2 = m_array + size / 2;
const T *const array1end = m_array + size / 2, *const array2end =
m_array + size;
while (1) {
if (array1 == array1end)
break;
if (array2 == array2end)
break;
if (*array1 < *array2)
new_array[na_pos++] = *array1++;
else if (*array2 < *array1)
new_array[na_pos++] = *array2++;
else if (*array1 == *array2) {
new_array[na_pos++] = *array1++;
new_array[na_pos++] = *array2++;
}
}
assert(array1 == array1end || array2 == array2end);
while (array1 != array1end)
new_array[na_pos++] = *array1++;
while (array2 != array2end)
new_array[na_pos++] = *array2++;
assert(na_pos == size);
for (na_pos = 0; na_pos < size; na_pos++)
m_array[na_pos] = new_array[na_pos];
delete[] new_array;
}
int main() {
using namespace std;
auto limit = 10000;
const auto size = 21;
while (limit--) {
vector<int> a;
a.reserve(size);
while (a.size() < size)
a.push_back(rand() % 10000);
merge_sort(a.data(), a.size());
assert(std::is_sorted(a.begin(), a.end()));
}
return 0;
}
2 Answers 2
std::ceil(static_cast<double>(size) / 2
is a very long (and unclear) way to saysize - size/2
.No naked loops please. Each loop represents an important algorithm, and deserves a name. For example,
while (array1 != array1end) new_array[na_pos++] = *array1++;
wants to be
copy(new_array + na_pos, array1, array1_end);
, and you may reuse it at least 2 more times (and don't make itvoid
: it has something interesting to return).Similarly, the merging loop represents a
merge
algorithm.The block
if (*array1 < *array2) new_array[na_pos++] = *array1++; else if (*array2 < *array1) new_array[na_pos++] = *array2++; else if (*array1 == *array2) { new_array[na_pos++] = *array1++; new_array[na_pos++] = *array2++; }
introduces a subtle problem. Here your algorithm may lose stability. If
array1
has two elements comparing equal, sayx1,x2
, andarray2
has an elementx3
which also compares equal to them, the final arrangement will bex1, x3, x2
, whereas the stable sort should result inx1, x2, x3
.The problem is easy to fix:
if (*array1 <= *array2) { new_array[na_pos++] = *array1++; } else { new_array[na_pos++] = *array2++; }
I strongly recommend to put the loop termination conditions where they belong:
while ((array1 != array1end) && (array2 != array2end))
-
\$\begingroup\$ third point and it's fix took me some time to grasp, may be it's my English, but yeah very good catch, never caught that myself \$\endgroup\$bluedragon– bluedragon2018年06月14日 06:00:05 +00:00Commented Jun 14, 2018 at 6:00
-
1\$\begingroup\$ @bluedragon I should've said don't copy from
array2
until you sure. English is not my first language either. \$\endgroup\$vnp– vnp2018年06月14日 06:02:21 +00:00Commented Jun 14, 2018 at 6:02 -
\$\begingroup\$ if i use auto lastNewAPos = std::copy(array1,array1end,new_array + na_pos); std::copy(array2,array2end,lastNewAPos); it's two seconds slower for 1000000 arrays \$\endgroup\$bluedragon– bluedragon2018年06月14日 06:15:32 +00:00Commented Jun 14, 2018 at 6:15
Don’t write using namespace std;
.
At least you didn’t do it in global scope; but you’ll have the same problems when a future version of the library adds names that cause ambiguities.
template <typename T>
void merge_sort(T m_array[], const size_t size) {
⧺I.13 Do not pass an array as a single pointer.
If you really need to pass raw memory (only), use gsl::span
. But this is a template: you should take begin/end iterators to any container.
T *new_array = new T[size];
⧺C.149 — no naked new
or delete
.
You should probably make this a unique_ptr
as a drop-in replacement without otherwise changing the architecture.
a.push_back(rand() % 10000);
rand
is pretty bad. Get used to using the proper random number facilities in the standard library. You want a uniform_int_distribution
.