I have a problem where I have a massive list of items (287,843 of them) that must be sorted for display. Which is more efficient: keeping them sorted in a self-balancing red-black tree, or building an array and then sorting it? My keys are strings, if that helps. The algorithm should make use of multiple processor cores.
Thank you!
- Are you going to be adding items dynamically? – Sergey Kalinichenko, Jan 29, 2012 at 23:09
- Make sure the best solution is not to use a full relational database system. – zch, Jan 29, 2012 at 23:18
- Am I the only one who does not think that 287,843 strings is huge? Sorting only indexes or pointers can probably be done within a second on a single core. Why the need for multi-processor? Homework? – wildplasser, Jan 29, 2012 at 23:34
- @wildplasser The need for multiprocessor is homework. – Nathan Moos, Feb 2, 2012 at 19:07
- @wildplasser I have been experimenting with other methods of speeding this up, and it is really a contrived problem. The sort executes very quickly on one core of an Intel Core 2 Duo at 1800 MHz. – Nathan Moos, Feb 28, 2012 at 18:54
2 Answers
This really depends on the particulars of your setup. If you have a multicore machine, you can probably sort the strings extremely quickly by using a parallel version of quicksort, in which each recursive call is executed in parallel with each other call. With many cores, this can take the already fast quicksort and make it substantially faster. Other sorting algorithms like merge sort can also be parallelized, though parallel quicksort has the advantage of requiring less extra memory. Since you know that you're sorting strings, you may also want to look into parallel radix sort, which could potentially be extremely fast.
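Not part of the original answer, but here is a minimal Java sketch of that fork/join quicksort idea (the class name, the CUTOFF value, and the Lomuto partition are illustrative assumptions, not anything from the post): below the cutoff it falls back to the sequential Arrays.sort, and above it the two partitions are sorted as parallel subtasks.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelQuicksort extends RecursiveAction {
    private static final int CUTOFF = 10_000; // assumed tuning parameter
    private final String[] a;
    private final int lo, hi; // inclusive bounds

    ParallelQuicksort(String[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo < CUTOFF) {
            Arrays.sort(a, lo, hi + 1); // sequential base case for small ranges
            return;
        }
        int p = partition(a, lo, hi);
        // Sort the two partitions as parallel fork/join subtasks.
        invokeAll(new ParallelQuicksort(a, lo, p - 1),
                  new ParallelQuicksort(a, p + 1, hi));
    }

    // Lomuto partition using the last element as the pivot.
    private static int partition(String[] a, int lo, int hi) {
        String pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j].compareTo(pivot) < 0) {
                String t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        String t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void sort(String[] a) {
        ForkJoinPool.commonPool().invoke(new ParallelQuicksort(a, 0, a.length - 1));
    }
}
```

In practice you would tune the cutoff so that small subranges are sorted sequentially; forking a task per tiny partition costs more than it saves.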
Most binary search trees cannot easily be multithreaded, because rebalance operations often require changing multiple parts of the tree at once, so a balanced red/black tree may not be the best approach here. However, you may want to look into a concurrent skiplist, which is a data structure that can be made to work efficiently in parallel. There are some newer binary search trees designed for parallelism that sometimes outperform the skiplist (here is one such data structure), though I expect that there will be fewer existing implementations and discussion of these newer structures.
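As a concrete illustration of the skiplist suggestion: the JDK already ships one. java.util.concurrent.ConcurrentSkipListSet keeps String keys in sorted order and tolerates concurrent inserts without external locking. A minimal usage sketch (the demo class and sample values are made up):

```java
import java.util.concurrent.ConcurrentSkipListSet;

public class SkiplistDemo {
    public static void main(String[] args) {
        // The JDK's concurrent skiplist: keeps elements sorted and supports
        // concurrent inserts/removals from many threads without external locks.
        ConcurrentSkipListSet<String> sorted = new ConcurrentSkipListSet<>();
        sorted.add("banana");
        sorted.add("apple");
        sorted.add("cherry");
        // Iteration yields elements in natural (lexicographic) order.
        for (String s : sorted) {
            System.out.println(s); // apple, banana, cherry
        }
    }
}
```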
If the elements are not changing frequently or you only need sorted order once, then just sorting once with parallel quicksort is probably the best bet. If the elements are changing frequently, then a concurrent data structure like the parallel skiplist will probably be a better bet.
Hope this helps!
Assuming that you're reading that list from a file or some other data source, it seems quite reasonable to read it all into an array and then sort it. If you have a GUI of some sort, it makes even more sense to do both the reading and the sorting in a background thread while the GUI sits in a "waiting to complete" state. Keeping a tree of the values makes sense only if you're going to do a lot of insertions and deletions, which would make an array less suitable in this case.
When it comes to multi-core sorting, I believe merge sort is the easiest to parallelize. But I'm no expert on this, so don't take my word as definitive.
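For what it's worth, the "read into an array, then sort" approach maps directly onto Arrays.parallelSort, which the JDK documents as a parallel sort-merge. A short sketch, assuming Java and a hypothetical one-string-per-line input file keys.txt:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ReadAndSort {
    public static void main(String[] args) throws IOException {
        // Hypothetical input: one string per line in keys.txt.
        List<String> lines = Files.readAllLines(Paths.get("keys.txt"));
        String[] keys = lines.toArray(new String[0]);

        // Splits the array across the common fork/join pool and merges
        // the sorted pieces (a parallel sort-merge under the hood).
        Arrays.parallelSort(keys);

        if (keys.length > 0) {
            System.out.println("first: " + keys[0] + ", last: " + keys[keys.length - 1]);
        }
    }
}
```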