As my parallel mergesort implementation was very memory dependent, I wanted to write a parallel quicksort one. Here it comes:
template<class In>
In partition(In b, In e) {
// create uniform distribuiton for random engine
std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
tbb::spin_mutex::scoped_lock lock;
lock.acquire(mutex);
auto rand_pivot_index = rand_distribution(rand_engine);
lock.release();
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));
// save pivot value
auto pivot = *(e - 1);
auto pivotiterator = b;
// sort the range
for(auto it = b; it != e - 1; ++it) {
if(*it < pivot) {
std::swap(*it, *pivotiterator);
++pivotiterator;
}
}
// put pivot to its according position and return position
std::swap(*(pivotiterator), *(e - 1));
return pivotiterator;
}
template<class In>
void quick_sort_parallel(In b, In e) {
if (b != e) {
In m = partition(b, e);
// switch to different sort on worst case or smaller ranges
if(std::distance(b, m) > 500) {
tbb::parallel_invoke([&] { quick_sort_parallel(b, m); },
[&] { quick_sort_parallel(m + 1, e); });
}
else
merge_sort_parallel(b, e); //defined somewhere else, pretty standard
}
}
2 Answers 2
Code
I would suggest using RAII for your lock.
Instead of:
std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
tbb::spin_mutex::scoped_lock lock;
lock.acquire(mutex);
auto rand_pivot_index = rand_distribution(rand_engine);
lock.release();
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));
Use:
std::uniform_int_distribution<int> rand_distribution(0, std::distance(b, e - 1));
// call static random engine to get index of random pivot
int rand_pivot_index;
{
tbb::spin_mutex::scoped_lock lock(mutex);
rand_pivot_index = rand_distribution(rand_engine);
}
// put pivot to last place in range
std::swap(*(b + rand_pivot_index), *(e - 1));
This ensures that if rand_distribution throws, the lock is released correctly.
Algorithm
Choice of pivot is important in any quick-sort algorithm, and even more so in the parallel case, when you want to garuntee a good distribution of load over cores by avoiding unbalanced recursion.
A good basic strategy is to choose the median of k random elements as your pivot.
Taking this further, using a median of medians algorithm can guarantee your quicksort a O(n log n) running time.
The correct approach and value of k will largely depend on the inputs your function will expect to take (e.g. adapt k based on size on input).
-
\$\begingroup\$ Of course RAII is the way to manage locks. Today I would do it all a little bit different anyway. Concerning the median of median thing, I don't understand how you can guarantee O(n log n) on an array of only dupes. \$\endgroup\$inf– inf2012年04月14日 07:54:23 +00:00Commented Apr 14, 2012 at 7:54
-
\$\begingroup\$ If your input can have non distinct elements then you also need your partition step to split into three. 1: (< pivot), 2: (= pivot), 3: (> pivot). Then recurse on 1 and 3. This will get you to O(n log n). \$\endgroup\$cmh– cmh2012年04月14日 09:22:45 +00:00Commented Apr 14, 2012 at 9:22
-
\$\begingroup\$ from where
rand_engine
come from? \$\endgroup\$MORTAL– MORTAL2014年12月18日 13:50:13 +00:00Commented Dec 18, 2014 at 13:50 -
\$\begingroup\$ @MORTAL from the original question - it's an unimportant implementation detail. \$\endgroup\$cmh– cmh2014年12月18日 17:27:46 +00:00Commented Dec 18, 2014 at 17:27
Actually, how your pseudo-random number generator is declared does matter, for depending on how it is declared, you may avoid using a lock to generate random numbers. Generating a random number is alreadu an expensive operation, so locking them makes it really expensive.
If you make your pseudo-random number generator thread_local
, you don't need to lock anything anymore before generating a random number, it will be inherently thread-safe:
thread_local std::random_device rd;
thread_local std::mt19937 gen(rd());
Moreover, if you also make thread_local
your std::uniform_int_distribution
, you won't have to update it every time you call partition
, but only once in each thread. That said, you can't initialize it with its "range" anymore, you have to pass it when you call it:
// Initialize it once
thread_local std::uniform_int_distribution<int> rand_distribution{};
// Call it
using param_type = std::uniform_int_distribution<int>::param_type;
auto rand_pivot_index = rand_distribution(gen, {0, std::distance(b, e - 1)});
If I didn't make any error while writing the code (untested), this should be thread-safe, lock-free and faster than your original solution :)
-
\$\begingroup\$ That's a good idea these days. Guess
thread_local
wasn't even supported back in 2012. \$\endgroup\$inf– inf2016年04月06日 14:00:18 +00:00Commented Apr 6, 2016 at 14:00 -
\$\begingroup\$ @inf Oh yeah, the question is becoming a bit old. I know that GCC and CLang have supported an almost equivalent
__thread
for a while, but I guess it took a bit longer to MSVC :) \$\endgroup\$Morwenn– Morwenn2016年04月06日 14:01:41 +00:00Commented Apr 6, 2016 at 14:01
Explore related questions
See similar questions with these tags.