I wanted to write a multi-threaded sort, but unfortunately I don't know much about threads, especially in C++11. I managed to make something work, but I would be highly surprised if it was the best way to do it.
template<class T>
void ___sort(T *data, int len, int nbThreads){
if(nbThreads<2){
std::sort(data, (&(data[len])), std::less<T>());
}
else{
std::future<void> b = std::async(std::launch::async,___sort<T>, data, len/2, nbThreads-2);
std::future<void> a = std::async(std::launch::async,___sort<T>, &data[len/2], len/2, nbThreads-2);
a.wait();
b.wait();
std::inplace_merge (data,&data[len/2],&data[len],std::less<T>());
}
}
Some of the tests I did sorting integers:
size is the number of ints sorted, and the time is in seconds.
size : 2097152 time with 1 thread : 4.961 time with 2 thread : 3.191 time with 4 thread : 2.377 size : 4194304 time with 1 thread : 10.002 time with 2 thread : 6.214 time with 4 thread : 4.689 size : 8388608 time with 1 thread : 19.975 time with 2 thread : 12.332 time with 4 thread : 9.29 size : 16777216 time with 1 thread : 38.712 time with 2 thread : 25.257 time with 4 thread : 18.706
Also, I tried using std::qsort instead of std::sort, and the results were at least twice as long as that. Any reasons why?
-
3\$\begingroup\$ Don't use '__' in an identifer name.stackoverflow.com/a/228797/14065 \$\endgroup\$Loki Astari– Loki Astari2013年02月17日 20:53:17 +00:00Commented Feb 17, 2013 at 20:53
-
\$\begingroup\$ I think that one additional optimisation could be to avoid "false sharing" of cache line between threads. The way the buffer is divided this could happen with the current code.. \$\endgroup\$renoX– renoX2013年02月26日 15:09:04 +00:00Commented Feb 26, 2013 at 15:09
1 Answer 1
Something like this is probably more efficient:
template<class T>
void parallel_sort(T* data, int len, int grainsize)
{
// Use grainsize instead of thread count so that we don't e.g.
// spawn 4 threads just to sort 8 elements.
if(len < grainsize)
{
std::sort(data, data + len, std::less<T>());
}
else
{
auto future = std::async(parallel_sort<T>, data, len/2, grainsize);
// No need to spawn another thread just to block the calling
// thread which would do nothing.
parallel_sort(data + len/2, len - len/2, grainsize);
future.wait();
std::inplace_merge(data, data + len/2, data + len, std::less<T>());
}
}
You could set grain size to something like, std::max(256, len/nb_threads).
-
\$\begingroup\$ Yes I don't want 4 threads to sort 8 items. But then again I don't want to spawn more threads than I have parallelism available (not more ((1.0 -> 1.5) * <available cores>)). So I would use a combination of these two techniques. \$\endgroup\$Loki Astari– Loki Astari2013年02月17日 19:40:19 +00:00Commented Feb 17, 2013 at 19:40
-
\$\begingroup\$ @LokiAstari: See my suggestion below the code, set grainsize to e.g.
std::max(256, len/nb_threads): \$\endgroup\$ronag– ronag2013年02月17日 19:50:26 +00:00Commented Feb 17, 2013 at 19:50 -
4\$\begingroup\$ parallel_sort(data + len/2, len/2, grainsize); should be: parallel_sort(data + len/2, len - len/2, grainsize); \$\endgroup\$user106202– user1062022016年05月24日 10:47:01 +00:00Commented May 24, 2016 at 10:47