As my parallel mergesort implementation parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:
template<class In>
In partition(In b, In e) {
// create uniform distribuiton for random engine
std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
tbb::spin_mutex::scoped_lock lock;
lock.acquire(mutex);
auto rand_pivot_index = rand_distribution(rand_engine);
lock.release();
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));
// save pivot value
auto pivot = *(e - 1);
auto pivotiterator = b;
// sort the range
for(auto it = b; it != e - 1; ++it) {
if(*it < pivot) {
std::swap(*it, *pivotiterator);
++pivotiterator;
}
}
// put pivot to its according position and return position
std::swap(*(pivotiterator), *(e - 1));
return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
if (b != e) {
In m = partition(b, e);
// switch to different sort on worst case or smaller ranges
if(std::distance(b, m) > 500) {
tbb::parallel_invoke([&] { quick_sort_parallel(b, m); },
[&] { quick_sort_parallel(m + 1, e); });
}
else
merge_sort_parallel(b, e); //defined somewhere else, pretty standard
}
}
As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:
template<class In>
In partition(In b, In e) {
// create uniform distribuiton for random engine
std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
tbb::spin_mutex::scoped_lock lock;
lock.acquire(mutex);
auto rand_pivot_index = rand_distribution(rand_engine);
lock.release();
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));
// save pivot value
auto pivot = *(e - 1);
auto pivotiterator = b;
// sort the range
for(auto it = b; it != e - 1; ++it) {
if(*it < pivot) {
std::swap(*it, *pivotiterator);
++pivotiterator;
}
}
// put pivot to its according position and return position
std::swap(*(pivotiterator), *(e - 1));
return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
if (b != e) {
In m = partition(b, e);
// switch to different sort on worst case or smaller ranges
if(std::distance(b, m) > 500) {
tbb::parallel_invoke([&] { quick_sort_parallel(b, m); },
[&] { quick_sort_parallel(m + 1, e); });
}
else
merge_sort_parallel(b, e); //defined somewhere else, pretty standard
}
}
As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:
template<class In>
In partition(In b, In e) {
// create uniform distribuiton for random engine
std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
tbb::spin_mutex::scoped_lock lock;
lock.acquire(mutex);
auto rand_pivot_index = rand_distribution(rand_engine);
lock.release();
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));
// save pivot value
auto pivot = *(e - 1);
auto pivotiterator = b;
// sort the range
for(auto it = b; it != e - 1; ++it) {
if(*it < pivot) {
std::swap(*it, *pivotiterator);
++pivotiterator;
}
}
// put pivot to its according position and return position
std::swap(*(pivotiterator), *(e - 1));
return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
if (b != e) {
In m = partition(b, e);
// switch to different sort on worst case or smaller ranges
if(std::distance(b, m) > 500) {
tbb::parallel_invoke([&] { quick_sort_parallel(b, m); },
[&] { quick_sort_parallel(m + 1, e); });
}
else
merge_sort_parallel(b, e); //defined somewhere else, pretty standard
}
}
As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one.
The algorithm itself is working fine in serial mode, however when I go parallel things are going strange.
I am creating the elements which I am gonna sort with a random distribution. The strange thing is that when I set a high mean and sigma value for the distribution - resulting in a lot of different elements - the algorithm runs fine in parallel with a good almost linear speed up. However, when I set a low mean and sigma, which gives a lot of similar values, I get a segmentation fault. One thing to note is that Here it is very consistentcomes: on a lot of same values I do always get a seg. fault and vice versa.
As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one.
The algorithm itself is working fine in serial mode, however when I go parallel things are going strange.
I am creating the elements which I am gonna sort with a random distribution. The strange thing is that when I set a high mean and sigma value for the distribution - resulting in a lot of different elements - the algorithm runs fine in parallel with a good almost linear speed up. However, when I set a low mean and sigma, which gives a lot of similar values, I get a segmentation fault. One thing to note is that it is very consistent: on a lot of same values I do always get a seg. fault and vice versa.
As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:
UPDATE: It was like I expected, the seg. fault was indeed a stack overflow. It did only occur when recursion depth was quite high which was caused by the worst-case behaviour of quick-sort, due to the small range of different elements in the array to be sorted. I changed the code to switch to a different sort algorithm which doesn't have the unbalanced recursion and also avoids the \$O(n^2)\$ worst case behaviour - merge sort in my example. It's more a hybridsort now, but whatever.
UPDATE: It was like I expected, the seg. fault was indeed a stack overflow. It did only occur when recursion depth was quite high which was caused by the worst-case behaviour of quick-sort, due to the small range of different elements in the array to be sorted. I changed the code to switch to a different sort algorithm which doesn't have the unbalanced recursion and also avoids the \$O(n^2)\$ worst case behaviour - merge sort in my example. It's more a hybridsort now, but whatever.