This sort works on Int lists up to roughly a million elements, but beyond that the stack overflows. The last clause of the merge function also looks odd to me, since it recurses in two places. Is there a clearer way to write that definition, and how can the stack limitation be avoided?
msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = concat . merge . split $ xs

merge :: Ord a => [[a]] -> [[a]]
merge [] = []
merge [x] = [x]
merge (l1:l2:ls) = merge $ mpair l1 l2 : merge ls

mpair :: Ord a => [a] -> [a] -> [a]
mpair [] l2 = l2
mpair l1 [] = l1
mpair l1@(x:xs) l2@(y:ys) | x >= y    = x : mpair xs l2
                          | otherwise = y : mpair l1 ys

split :: [a] -> [[a]]
split xs = [[x] | x <- xs]
2 Answers
You've spotted the problem. Your merge function is flawed: for each pair it processes, it introduces an extra, unnecessary call to merge. That is, for a list like [1,2,3,4,5,6], instead of your first merge call expanding directly to:
merge [[1],[2],[3],[4],[5],[6]]
= merge $ mpair [1] [2] : mpair [3] [4] : mpair [5] [6] : []
it expands to:
merge [[1],[2],[3],[4],[5],[6]]
= merge (mpair [1] [2] : merge (mpair [3] [4] : merge (mpair [5] [6] : merge [])))
As a result, your count of merge calls is O(n) and your count of mpair calls is O(n^2) (or similar -- I didn't check exactly), when they should be O(log n) and O(n log n) respectively.
Instead, you want to split merge up into two functions:
merge :: Ord a => [[a]] -> [[a]]
merge [] = []
merge [x] = [x]
merge ls = merge (mergePairs ls)

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (l1:l2:ls) = mpair l1 l2 : mergePairs ls
mergePairs ls = ls
This will speed up the algorithm enormously, and it will now run on tens of millions of integers.
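For what it's worth, here is the whole thing assembled into a runnable program. It is simply the question's definitions with the original merge replaced by the merge/mergePairs above; the main and the two-million-element test list are only an illustration, not a benchmark.

-- The sorting code is the question's, with merge replaced by the
-- merge/mergePairs pair from this answer.
msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = concat . merge . split $ xs

merge :: Ord a => [[a]] -> [[a]]
merge [] = []
merge [x] = [x]
merge ls = merge (mergePairs ls)

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (l1:l2:ls) = mpair l1 l2 : mergePairs ls
mergePairs ls = ls

mpair :: Ord a => [a] -> [a] -> [a]
mpair [] l2 = l2
mpair l1 [] = l1
mpair l1@(x:xs) l2@(y:ys) | x >= y    = x : mpair xs l2   -- x >= y puts larger heads first, so the result is descending
                          | otherwise = y : mpair l1 ys

split :: [a] -> [[a]]
split xs = [[x] | x <- xs]

main :: IO ()
main = print . take 3 . msort $ [1 .. 2000000 :: Int]   -- prints [2000000,1999999,1999998]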
See mergeAll.
They insert a bang to adjust the evaluation order, and push the second recursion down into the next definition.
You could experiment with the mutant offspring of the two implementations to try to understand the reasons for the difference.
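As a rough sketch of those two ideas — a bang to force each merged pair as the result is consumed, and the second recursion pushed down into the pairing function — here is what that shape might look like when reusing mpair from the question. The names mergeAll' and mergePairs' are mine, and this is only an approximation of the library code, not a copy of it.

{-# LANGUAGE BangPatterns #-}

-- Sketch only. The bang forces each merged pair to weak head normal form
-- as the result list is consumed, and mergeAll' itself recurses only once;
-- the per-pair recursion lives in mergePairs'.
mergeAll' :: Ord a => [[a]] -> [a]
mergeAll' []  = []
mergeAll' [x] = x
mergeAll' xs  = mergeAll' (mergePairs' xs)

mergePairs' :: Ord a => [[a]] -> [[a]]
mergePairs' (a:b:xs) = let !m = mpair a b in m : mergePairs' xs
mergePairs' xs       = xs

Note that mergeAll' returns the fully merged list rather than a list of lists, so in the question's msort it would stand in for concat . merge.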