I am implementing an application in Haskell and, for sorting, I use the library function Data.List.sort
. However, I was wondering whether this is the fastest sort implementation in the Haskell standard library (perhaps lists are not the best choice for efficient sorting).
I have found different alternatives, e.g. heap sort on arrays, sort on sequences (but the documentation does not say what kind of algorithm is used).
My question is: what is the fastest sorting implementation (container type + sort function) provided by the Haskell standard library? Is there some documentation page listing all library sort functions and comparing them wrt performance?
EDIT
To provide some more context, I am running a benchmark. I have written a simple program in C, Java, Python and Haskell that
- Reads 1000000 strings (lines) from a text file.
- Sorts the strings using a built-in (library) sorting algorithm.
- Writes the sorted list of strings to a file.
For each implementation, I only measure the sorting time (leaving out the time needed for disk IO). Running the benchmark on Ubuntu 12.04, I get
- C (gcc 4.6.3,
qsort
onchar **
): 0.890 s - Java (OpenJDK 64-Bit 1.7.0_09,
Collections.sort()
onjava.util.LinkedList<String>
): 1.307 s - Python (Python 2.7.3,
list.sort()
): 1.072 s - Haskell (GHC 7.4.1,
Data.List.sort
on[Data.ByteString.UTF8.ByteString]
): 11.864 s
So I wonder if there is another data type / library function in Haskell that can give better performance.
-
1You might avoid some overhead if you use unboxed arrays, they tend to be nice and quickdaniel gratzer– daniel gratzer2012年12月23日 04:52:01 +00:00Commented Dec 23, 2012 at 4:52
-
Could you post here (or to some pastebin) your testing programs, so that we can work with them?Petr– Petr2012年12月23日 12:13:41 +00:00Commented Dec 23, 2012 at 12:13
-
@Petr Pudlák: All of them? Or would the Haskell version be sufficient? It is a project I am working on and I would prefer to post only the parts relevant to sorting in Haskell.Giorgio– Giorgio2012年12月23日 12:40:29 +00:00Commented Dec 23, 2012 at 12:40
-
1There is no one sort to rule them all. Different sorts will be best for different kinds of data.whatsisname– whatsisname2012年12月23日 18:11:49 +00:00Commented Dec 23, 2012 at 18:11
-
1@whatsisname: And what would you use for sorting strings? I do not need to rule them all, just to sort strings as fast as possible.Giorgio– Giorgio2012年12月23日 18:24:39 +00:00Commented Dec 23, 2012 at 18:24
1 Answer 1
The problem with Data.List.sort
is that it uses merge sort, which creates new lists during each pass. So a lot of time is spent on allocating and freeing memory.
Surprisingly, AFAIK there isn't a sorting library for mutable arrays, which are likely to be faster. So I tried to make one and put it on github: marray-sort. It needs rigorous testing, polishing and optimizing too, but so far it already seems to be significantly faster than Data.List.sort
.
If you make any experiments with it, I'd be happy to see the results. I put your (slightly modified) benchmark to src-test/Test.hs for convenience. Don't forget to compile everything with -O2
to trigger the necessary compiler optimizations.
Edit: I found out now that there is an implementation if introsort for mutable vectors in vector-algorithms. According to my measurements, it is slightly faster (5-10%) than my attempt above for MArray
s
See also: How does one sort with Data.Vector.Generic.Mutable?.
-
Thanks a lot (+1). I have also written a quick sort (and merge sort) for mutable arrays. I am eager to look at your solution (even though it will have to wait until the coming week end).Giorgio– Giorgio2013年01月14日 08:57:37 +00:00Commented Jan 14, 2013 at 8:57
-
Sorry that it took me so long to answer. I am working on this project in my free time and I kept postponing it. I have now tried out your implementation and it is definitely faster than the
Data.List.Sort
. I also have implemented my own merge sort on mutable arrays and it runs a bit faster than introsort, at least on my examples. I haven't triedData.Vector.Algorithms.Intro.sort
yet.Giorgio– Giorgio2014年01月12日 22:19:53 +00:00Commented Jan 12, 2014 at 22:19