Return to Question

replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/

edited Apr 13, 2017 at 12:40

As my parallel mergesort implementation parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:

template<class In>
In partition(In b, In e) {
 // create uniform distribuiton for random engine
 std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
 
 // call static random engine to get index of random pivot
 tbb::spin_mutex::scoped_lock lock;
 lock.acquire(mutex);
 auto rand_pivot_index = rand_distribution(rand_engine);
 lock.release();
 
 // put pivot to last place in range
 std::swap(*(b + rand_pivot_index), *(e - 1));
 
 // save pivot value
 auto pivot = *(e - 1);
 auto pivotiterator = b;
 
 // sort the range
 for(auto it = b; it != e - 1; ++it) {
 if(*it < pivot) {
 std::swap(*it, *pivotiterator);
 ++pivotiterator;
 }
 }
 
 // put pivot to its according position and return position
 std::swap(*(pivotiterator), *(e - 1));
 return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
 if (b != e) {
 In m = partition(b, e);
 
 // switch to different sort on worst case or smaller ranges
 if(std::distance(b, m) > 500) {
 tbb::parallel_invoke([&] { quick_sort_parallel(b, m); }, 
 [&] { quick_sort_parallel(m + 1, e); });
 }
 else 
 merge_sort_parallel(b, e); //defined somewhere else, pretty standard
 }
}

As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:

template<class In>
In partition(In b, In e) {
 // create uniform distribuiton for random engine
 std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
 
 // call static random engine to get index of random pivot
 tbb::spin_mutex::scoped_lock lock;
 lock.acquire(mutex);
 auto rand_pivot_index = rand_distribution(rand_engine);
 lock.release();
 
 // put pivot to last place in range
 std::swap(*(b + rand_pivot_index), *(e - 1));
 
 // save pivot value
 auto pivot = *(e - 1);
 auto pivotiterator = b;
 
 // sort the range
 for(auto it = b; it != e - 1; ++it) {
 if(*it < pivot) {
 std::swap(*it, *pivotiterator);
 ++pivotiterator;
 }
 }
 
 // put pivot to its according position and return position
 std::swap(*(pivotiterator), *(e - 1));
 return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
 if (b != e) {
 In m = partition(b, e);
 
 // switch to different sort on worst case or smaller ranges
 if(std::distance(b, m) > 500) {
 tbb::parallel_invoke([&] { quick_sort_parallel(b, m); }, 
 [&] { quick_sort_parallel(m + 1, e); });
 }
 else 
 merge_sort_parallel(b, e); //defined somewhere else, pretty standard
 }
}

As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:

template<class In>
In partition(In b, In e) {
 // create uniform distribuiton for random engine
 std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
 
 // call static random engine to get index of random pivot
 tbb::spin_mutex::scoped_lock lock;
 lock.acquire(mutex);
 auto rand_pivot_index = rand_distribution(rand_engine);
 lock.release();
 
 // put pivot to last place in range
 std::swap(*(b + rand_pivot_index), *(e - 1));
 
 // save pivot value
 auto pivot = *(e - 1);
 auto pivotiterator = b;
 
 // sort the range
 for(auto it = b; it != e - 1; ++it) {
 if(*it < pivot) {
 std::swap(*it, *pivotiterator);
 ++pivotiterator;
 }
 }
 
 // put pivot to its according position and return position
 std::swap(*(pivotiterator), *(e - 1));
 return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
 if (b != e) {
 In m = partition(b, e);
 
 // switch to different sort on worst case or smaller ranges
 if(std::distance(b, m) > 500) {
 tbb::parallel_invoke([&] { quick_sort_parallel(b, m); }, 
 [&] { quick_sort_parallel(m + 1, e); });
 }
 else 
 merge_sort_parallel(b, e); //defined somewhere else, pretty standard
 }
}

Unified post after mod removal frenzy

Source Link

edited Dec 18, 2014 at 18:16

inf

edited Dec 18, 2014 at 18:16

inf

As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one.

The algorithm itself is working fine in serial mode, however when I go parallel things are going strange.

I am creating the elements which I am gonna sort with a random distribution. The strange thing is that when I set a high mean and sigma value for the distribution - resulting in a lot of different elements - the algorithm runs fine in parallel with a good almost linear speed up. However, when I set a low mean and sigma, which gives a lot of similar values, I get a segmentation fault. One thing to note is that Here it is very consistentcomes: on a lot of same values I do always get a seg. fault and vice versa.

As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one.

The algorithm itself is working fine in serial mode, however when I go parallel things are going strange.

I am creating the elements which I am gonna sort with a random distribution. The strange thing is that when I set a high mean and sigma value for the distribution - resulting in a lot of different elements - the algorithm runs fine in parallel with a good almost linear speed up. However, when I set a low mean and sigma, which gives a lot of similar values, I get a segmentation fault. One thing to note is that it is very consistent: on a lot of same values I do always get a seg. fault and vice versa.

As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:

No need for an update, especially before an answer

Source Link

edited Dec 18, 2014 at 16:51

Jamal

edited Dec 18, 2014 at 16:51

Jamal

35.2k
13
134
238

UPDATE: It was like I expected, the seg. fault was indeed a stack overflow. It did only occur when recursion depth was quite high which was caused by the worst-case behaviour of quick-sort, due to the small range of different elements in the array to be sorted. I changed the code to switch to a different sort algorithm which doesn't have the unbalanced recursion and also avoids the \$O(n^2)\$ worst case behaviour - merge sort in my example. It's more a hybridsort now, but whatever.