12
\$\begingroup\$

As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:

template<class In>
In partition(In b, In e) {
 // create uniform distribuiton for random engine
 std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
 // call static random engine to get index of random pivot
 tbb::spin_mutex::scoped_lock lock;
 lock.acquire(mutex);
 auto rand_pivot_index = rand_distribution(rand_engine);
 lock.release();
 // put pivot to last place in range
 std::swap(*(b + rand_pivot_index), *(e - 1));
 // save pivot value
 auto pivot = *(e - 1);
 auto pivotiterator = b;
 // sort the range
 for(auto it = b; it != e - 1; ++it) {
 if(*it < pivot) {
 std::swap(*it, *pivotiterator);
 ++pivotiterator;
 }
 }
 // put pivot to its according position and return position
 std::swap(*(pivotiterator), *(e - 1));
 return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
 if (b != e) {
 In m = partition(b, e);
 // switch to different sort on worst case or smaller ranges
 if(std::distance(b, m) > 500) {
 tbb::parallel_invoke([&] { quick_sort_parallel(b, m); }, 
 [&] { quick_sort_parallel(m + 1, e); });
 }
 else 
 merge_sort_parallel(b, e); //defined somewhere else, pretty standard
 }
}
asked Jan 16, 2012 at 17:56
\$\endgroup\$
0

2 Answers 2

6
\$\begingroup\$

Code

I would suggest using RAII for your lock.

Instead of:

std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
tbb::spin_mutex::scoped_lock lock;
lock.acquire(mutex);
auto rand_pivot_index = rand_distribution(rand_engine);
lock.release();
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));

Use:

std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
int rand_pivot_index;
{
 tbb::spin_mutex::scoped_lock lock(mutex);
 rand_pivot_index = rand_distribution(rand_engine);
}
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));

This ensures that if rand_distribution throws, the lock is released correctly.

Algorithm

Choice of pivot is important in any quick-sort algorithm, and even more so in the parallel case, when you want to garuntee a good distribution of load over cores by avoiding unbalanced recursion.

A good basic strategy is to choose the median of k random elements as your pivot.

Taking this further, using a median of medians algorithm can guarantee your quicksort a O(n log n) running time.

The correct approach and value of k will largely depend on the inputs your function will expect to take (e.g. adapt k based on size on input).

answered Apr 13, 2012 at 13:12
\$\endgroup\$
4
  • \$\begingroup\$ Of course RAII is the way to manage locks. Today I would do it all a little bit different anyway. Concerning the median of median thing, I don't understand how you can guarantee O(n log n) on an array of only dupes. \$\endgroup\$ Commented Apr 14, 2012 at 7:54
  • \$\begingroup\$ If your input can have non distinct elements then you also need your partition step to split into three. 1: (< pivot), 2: (= pivot), 3: (> pivot). Then recurse on 1 and 3. This will get you to O(n log n). \$\endgroup\$ Commented Apr 14, 2012 at 9:22
  • \$\begingroup\$ from where rand_engine come from? \$\endgroup\$ Commented Dec 18, 2014 at 13:50
  • \$\begingroup\$ @MORTAL from the original question - it's an unimportant implementation detail. \$\endgroup\$ Commented Dec 18, 2014 at 17:27
2
\$\begingroup\$

Actually, how your pseudo-random number generator is declared does matter, for depending on how it is declared, you may avoid using a lock to generate random numbers. Generating a random number is alreadu an expensive operation, so locking them makes it really expensive.

If you make your pseudo-random number generator thread_local, you don't need to lock anything anymore before generating a random number, it will be inherently thread-safe:

thread_local std::random_device rd;
thread_local std::mt19937 gen(rd());

Moreover, if you also make thread_local your std::uniform_int_distribution, you won't have to update it every time you call partition, but only once in each thread. That said, you can't initialize it with its "range" anymore, you have to pass it when you call it:

// Initialize it once
thread_local std::uniform_int_distribution<int> rand_distribution{};
// Call it
using param_type = std::uniform_int_distribution<int>::param_type;
auto rand_pivot_index = rand_distribution(gen, {0, std::distance(b, e - 1)});

If I didn't make any error while writing the code (untested), this should be thread-safe, lock-free and faster than your original solution :)

answered Apr 6, 2016 at 13:56
\$\endgroup\$
2
  • \$\begingroup\$ That's a good idea these days. Guess thread_local wasn't even supported back in 2012. \$\endgroup\$ Commented Apr 6, 2016 at 14:00
  • \$\begingroup\$ @inf Oh yeah, the question is becoming a bit old. I know that GCC and CLang have supported an almost equivalent __thread for a while, but I guess it took a bit longer to MSVC :) \$\endgroup\$ Commented Apr 6, 2016 at 14:01

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.