I have a problem where I have a massive list of items (287,843 of them) that must be sorted for display. Which is more efficient: keeping them sorted in a self-balancing red-black tree, or building an array and then sorting it? My keys are strings, if that helps. The algorithm should make use of multiple processor cores.
Thank you!
- Are you going to be adding items dynamically? – Sergey Kalinichenko, Jan 29, 2012 at 23:09
- Make sure the best solution is not to use a full relational database system. – zch, Jan 29, 2012 at 23:18
- Am I the only one who does not think that 287,843 strings is huge? Sorting only indexes or pointers can probably be done within a second on a single core. Why the need for multi-processor? Homework? – wildplasser, Jan 29, 2012 at 23:34
- @wildplasser The need for multiprocessor is homework. – Nathan Moos, Feb 2, 2012 at 19:07
- @wildplasser I have been experimenting with other methods of speeding this up, and it is really a contrived problem. The sort executes very quickly on one core of an Intel Core 2 Duo at 1800 MHz. – Nathan Moos, Feb 28, 2012 at 18:54
2 Answers
This really depends on the particulars of your setup. If you have a multicore machine, you can probably sort the strings extremely quickly by using a parallel version of quicksort, in which each recursive call is executed in parallel with each other call. With many cores, this can take the already fast quicksort and make it substantially faster. Other sorting algorithms like merge sort can also be parallelized, though parallel quicksort has the advantage of requiring less extra memory. Since you know that you're sorting strings, you may also want to look into parallel radix sort, which could potentially be extremely fast.
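Not part of the original answer, but here is a minimal Java sketch of that fork/join quicksort idea (the class name, the CUTOFF value, and the Lomuto partition are illustrative assumptions, not anything from the post): below the cutoff it falls back to the sequential Arrays.sort, and above it the two partitions are sorted as parallel subtasks.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelQuicksort extends RecursiveAction {
    private static final int CUTOFF = 10_000; // assumed tuning parameter
    private final String[] a;
    private final int lo, hi; // inclusive bounds

    ParallelQuicksort(String[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo < CUTOFF) {
            Arrays.sort(a, lo, hi + 1); // sequential base case for small ranges
            return;
        }
        int p = partition(a, lo, hi);
        // Sort the two partitions as parallel fork/join subtasks.
        invokeAll(new ParallelQuicksort(a, lo, p - 1),
                  new ParallelQuicksort(a, p + 1, hi));
    }

    // Lomuto partition using the last element as the pivot.
    private static int partition(String[] a, int lo, int hi) {
        String pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j].compareTo(pivot) < 0) {
                String t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        String t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void sort(String[] a) {
        ForkJoinPool.commonPool().invoke(new ParallelQuicksort(a, 0, a.length - 1));
    }
}
```

In practice you would tune the cutoff so that small subranges are sorted sequentially; forking a task per tiny partition costs more than it saves.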
Most binary search trees cannot easily be multithreaded, because rebalance operations often require changing multiple parts of the tree at once, so a balanced red/black tree may not be the best approach here. However, you may want to look into a concurrent skiplist, which is a data structure that can be made to work efficiently in parallel. There are some newer binary search trees designed for parallelism that sometimes outperform the skiplist (here is one such data structure), though I expect that there will be fewer existing implementations and discussion of these newer structures.
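As a concrete illustration of the skiplist suggestion: the JDK already ships one. java.util.concurrent.ConcurrentSkipListSet keeps String keys in sorted order and tolerates concurrent inserts without external locking. A minimal usage sketch (the demo class and sample values are made up):

```java
import java.util.concurrent.ConcurrentSkipListSet;

public class SkiplistDemo {
    public static void main(String[] args) {
        // The JDK's concurrent skiplist: keeps elements sorted and supports
        // concurrent inserts/removals from many threads without external locks.
        ConcurrentSkipListSet<String> sorted = new ConcurrentSkipListSet<>();
        sorted.add("banana");
        sorted.add("apple");
        sorted.add("cherry");
        // Iteration yields elements in natural (lexicographic) order.
        for (String s : sorted) {
            System.out.println(s); // apple, banana, cherry
        }
    }
}
```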
If the elements are not changing frequently or you only need sorted order once, then just sorting once with parallel quicksort is probably the best bet. If the elements are changing frequently, then a concurrent data structure like the parallel skiplist will probably be a better bet.
Hope this helps!
Assuming that you're reading that list from a file or some other data source, it seems quite reasonable to read it all into an array and then sort it. If you have a GUI of some sort, it makes even more sense to do both the reading and the sorting in a background thread while the GUI sits in a "waiting to complete" state. Keeping a tree of the values makes sense only if you're going to do a lot of insertions and deletions, which would make an array less suitable in this case.
When it comes to multi-core sorting, I believe merge sort is the easiest to parallelize. But I'm no expert on this, so don't take my word as definitive.
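For what it's worth, the "read into an array, then sort" approach maps directly onto Arrays.parallelSort, which the JDK documents as a parallel sort-merge. A short sketch, assuming Java and a hypothetical one-string-per-line input file keys.txt:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ReadAndSort {
    public static void main(String[] args) throws IOException {
        // Hypothetical input: one string per line in keys.txt.
        List<String> lines = Files.readAllLines(Paths.get("keys.txt"));
        String[] keys = lines.toArray(new String[0]);

        // Splits the array across the common fork/join pool and merges
        // the sorted pieces (a parallel sort-merge under the hood).
        Arrays.parallelSort(keys);

        if (keys.length > 0) {
            System.out.println("first: " + keys[0] + ", last: " + keys[keys.length - 1]);
        }
    }
}
```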