This sort works on Int lists up to roughly a million elements, but beyond that the stack overflows. The last clause of the merge function also looks odd to me, since it recurses in two places. Is there a clearer way to write that definition, and how can the stack limitation be avoided?
msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = concat . merge . split $ xs

merge :: Ord a => [[a]] -> [[a]]
merge [] = []
merge [x] = [x]
merge (l1:l2:ls) = merge $ mpair l1 l2 : merge ls

mpair :: Ord a => [a] -> [a] -> [a]
mpair [] l2 = l2
mpair l1 [] = l1
mpair l1@(x:xs) l2@(y:ys) | x >= y    = x : mpair xs l2
                          | otherwise = y : mpair l1 ys

split :: [a] -> [[a]]
split xs = [[x] | x <- xs]
2 Answers
You've spotted the problem. Your merge function is flawed: for each pair it processes, it introduces an extra, unnecessary call to merge. That is, for a list like [1,2,3,4,5,6], instead of your first merge call expanding directly to:
merge [[1],[2],[3],[4],[5],[6]]
= merge $ mpair [1] [2] : mpair [3] [4] : mpair [5] [6] : []
it expands to:
merge [[1],[2],[3],[4],[5],[6]]
= merge (mpair [1] [2] : merge (mpair [3] [4] : merge (mpair [5] [6] : merge [])))
As a result, your count of merge calls is O(n) and your count of mpair calls is O(n^2) (or similar -- I didn't check exactly), when they should be O(log n) and O(n log n) respectively.
Instead, you want to split merge up into two functions:
merge :: Ord a => [[a]] -> [[a]]
merge [] = []
merge [x] = [x]
merge ls = merge (mergePairs ls)

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (l1:l2:ls) = mpair l1 l2 : mergePairs ls
mergePairs ls = ls
This will speed up the algorithm enormously, and it will now run on tens of millions of integers.
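For what it's worth, here is the whole thing assembled into a runnable program. It is simply the question's definitions with the original merge replaced by the merge/mergePairs above; the main and the two-million-element test list are only an illustration, not a benchmark.

-- The sorting code is the question's, with merge replaced by the
-- merge/mergePairs pair from this answer.
msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = concat . merge . split $ xs

merge :: Ord a => [[a]] -> [[a]]
merge [] = []
merge [x] = [x]
merge ls = merge (mergePairs ls)

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (l1:l2:ls) = mpair l1 l2 : mergePairs ls
mergePairs ls = ls

mpair :: Ord a => [a] -> [a] -> [a]
mpair [] l2 = l2
mpair l1 [] = l1
mpair l1@(x:xs) l2@(y:ys) | x >= y    = x : mpair xs l2   -- x >= y puts larger heads first, so the result is descending
                          | otherwise = y : mpair l1 ys

split :: [a] -> [[a]]
split xs = [[x] | x <- xs]

main :: IO ()
main = print . take 3 . msort $ [1 .. 2000000 :: Int]   -- prints [2000000,1999999,1999998]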
See mergeAll.
They insert a bang to adjust the evaluation order, and push the second recursion down into the next definition.
You could experiment with the mutant offspring of the two implementations to try to understand the reasons for the difference.
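As a rough sketch of those two ideas — a bang to force each merged pair as the result is consumed, and the second recursion pushed down into the pairing function — here is what that shape might look like when reusing mpair from the question. The names mergeAll' and mergePairs' are mine, and this is only an approximation of the library code, not a copy of it.

{-# LANGUAGE BangPatterns #-}

-- Sketch only. The bang forces each merged pair to weak head normal form
-- as the result list is consumed, and mergeAll' itself recurses only once;
-- the per-pair recursion lives in mergePairs'.
mergeAll' :: Ord a => [[a]] -> [a]
mergeAll' []  = []
mergeAll' [x] = x
mergeAll' xs  = mergeAll' (mergePairs' xs)

mergePairs' :: Ord a => [[a]] -> [[a]]
mergePairs' (a:b:xs) = let !m = mpair a b in m : mergePairs' xs
mergePairs' xs       = xs

Note that mergeAll' returns the fully merged list rather than a list of lists, so in the question's msort it would stand in for concat . merge.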