template<class Compare, class Iterator>
void merge_sort(Iterator start, Iterator fin, int sort_type) {
Compare comp;
typedef typename iterator_traits<Iterator>::value_type _value_type; //black type magic to infer data type
if (distance(start, fin) > 1) {
vector<_value_type> left (start, start+distance(start, fin)/2);
vector<_value_type> right (start+distance(start, fin)/2, fin);
merge_sort<Compare>(left.begin(), left.end(), 2);
merge_sort<Compare>(right.begin(), right.end(), 2);
auto i = left.begin();
auto j = right.begin();
while (i!= left.end() or j!= right.end()) {
if (i == left.end()) {
*start++ = *j++;
} else if (j == right.end()) {
*start++ = *i++;
} else if (comp(*i, *j)) {
*start++ = *i++;
} else {
*start++ = *j++;
}
}
}
So, I've written this implementation of merge_sort
, but it seems to be quite slow --- it took 1500ms to sort a vector of 1 000 000 random int
s, while standard qsort
did the same in less than a tenth of that --- 130ms. Is there (and there definitely is) something wrong with my code, and how can I fix it so that it's more effective?
UPDATE: So, umm, I've updated the code and is should use only one auxiliary vector. The speed did not improve much, though. Anything else?
template<class Compare, class Iterator>
void merge_sort(Iterator start, Iterator fin, int sort_type) {
Compare comp;
typedef typename iterator_traits<Iterator>::value_type _value_type; //black type magic to infer data type
if (distance(start, fin) > 1) {
static vector<_value_type> temp (distance(start, fin), 0);
auto i = start;
auto j = start + distance(start, fin)/2;
auto k = temp.begin();
merge_sort<Compare>(i, j, 2);
merge_sort<Compare>(j, fin, 2);
while (i != start+distance(start, fin)/2 or j != fin) {
if (i == start+distance(start, fin)/2) {
*(k++) = *(j++);
} else if (j == fin){
*(k++) = *(i++);
} else if (comp(*i, *j)) {
*(k++) = *(i++);
} else {
*(k++) = *(j++);
}
}
copy(temp.begin(), temp.begin()+distance(start, fin), start);
}
UPDATE 1.5 It's a miracle this even works; it definitely shouldn't. I'll fix this later.
-
1\$\begingroup\$ Your code makes too many allocations and deallocations. For a million-item array the recursion goes about 20 levels deep, on each level you allocate and deallocate two new vectors. On the deepest level you allocate a million (yes, million!) one-item vectors. Luckily, not all at the same time; anyway you do it. And you do about two millions allocations in total. Twice the length of an array being sorted. Like almost all, who try to implement the merge sort... \$\endgroup\$CiaPan– CiaPan2016年01月29日 11:31:08 +00:00Commented Jan 29, 2016 at 11:31
-
\$\begingroup\$ @CiaPan That's why bottom up merge sort is better IMO. You can easily reuse the temp array you need for the merge. \$\endgroup\$ratchet freak– ratchet freak2016年01月29日 11:35:57 +00:00Commented Jan 29, 2016 at 11:35
-
\$\begingroup\$ @ratchetfreak You can do that in both approaches, actually. Just fill the temp array with a copy of data and merge alternatively from appropriate part of one to the other, switching the direction on each level of recursion. Thanks to the initial copying, the source is ready to use on the deepest level, despite a direction which it works. \$\endgroup\$CiaPan– CiaPan2016年01月29日 11:40:47 +00:00Commented Jan 29, 2016 at 11:40
-
\$\begingroup\$ This example shows, how good algorithms, tailored for specific data structures (sequential file or list, in this case), may degenerate when simply implemented on a too high, abstract level (iterator over array). \$\endgroup\$CiaPan– CiaPan2016年01月29日 11:46:43 +00:00Commented Jan 29, 2016 at 11:46
-
\$\begingroup\$ IMO it also shows how recursion may look elegant it can still limit the implementation in unexpected ways. \$\endgroup\$ratchet freak– ratchet freak2016年01月29日 11:49:09 +00:00Commented Jan 29, 2016 at 11:49
2 Answers 2
Don't allocate a vector in each recursive call
You just need one copy of the whole vector as a parameter, then use both as the auxiliary interchangeably at each recursive call.
Example in Java so you can see what I mean (Taken from this source here ):
private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi)
{
if (hi <= lo) return;
int mid = lo + (hi - lo) / 2;
sort (aux, a,lo, mid);
sort (aux, a, mid+1, hi);
merge(a, aux, lo, mid, hi);
}
Use insertion sort as base case
You can use insertion sort as a base case for a given threshold (Use some benchmarks to find it. Insertion sort is faster for smaller inputs.
-
\$\begingroup\$ But shouldn't
static
take care of reallocation? Doesn't that mean thattemp
will be created only once? \$\endgroup\$Akiiino– Akiiino2016年01月30日 05:00:00 +00:00Commented Jan 30, 2016 at 5:00
In a merge sort (and quick sort) you need to stop your recursion when there is a handful of iterations left (say 32 elements) and then use an in-place sort (some even use bubblesort shrugs) to sort the last elements in that range. The reason is that the recursion and splitting overhead becomes too large when the number of elements are small.