Bucket sort in Haskell

Question 1

As a first step in learning Haskell I am solving this problem, which involves finding the R_i^th-ranked numbers, given some input list and many R_i. In an imperative language I would make a zeroed array c of length 200, increment c[h] for each height h, compute cumulative sums of c and then binary search c to determine the height corresponding to each given index. Because max_height is fixed, this has runtime linear in the size of input and bounded memory(*) excluding the input.

Here's my Haskell code:

max_height = 200
count_eq e = foldl (\c f -> if e == f then c + 1 else c) 0
counts heights = map (flip count_eq heights) [0..max_height]
first_gt e l = f l 0 where f (x:xs) i = if x > e then i else f xs (i+1)
solve heights indices = let accum = scanl1 (+) (counts heights) in
 map (flip first_gt accum) (map (subtract 1) indices)

It is correct but slow. I would like to know how to (A) reason about and (B) improve the performance. Also (C) can I achieve the same asymptotic performance as the imperative code?

(*) assuming each c[i] fits in a machine int. I believe the runtime statement holds regardless.

Question 2

Welcome to Code Review! Good job on your first question.

Question 3

Maps as arrays

In counts heights you are taversing heights 200 times. It is possible to emulate array with Data.IntMap and do this in one pass:

import Data.List (foldl')
import qualified Data.IntMap.Strict as Map
count = foldl' (\m h -> Map.insertWith (+) h 1 m)
 (Map.fromList [(i,0) | i <- [1..200]])

Note that I'm using strict version of left fold (see here about foldl vs foldl') and strict Map. This means that map of height counts constructed in single pass without thunks.

solve heights = map (\ix -> 1 + length (takeWhile (<ix) counts))
 where
 counts = scanl1 (+) $ Map.elems $ count heights

I'm using 1 + length (takeWhile (<ix) counts) instead of bare recursion in first_gt.

A bit more code is required for binary search:

solve heights = map (maybe 0 snd . (`Map.lookupLT` countsMap))
 where
 counts = scanl1 (+) $ Map.elems $ count heights
 countsMap = Map.fromList $ zip counts [2..]

Vectors as arrays

If log(n) overhead of trees is too much for you, it is possible to use Data.Vector which is for real arrays in Haskell.

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Algorithms.Search as S
max_height = 200
solve :: [Int] -> [Int] -> IO [Int]
solve heights indices = do
 v <- M.replicate (max_height+1) 0
 mapM_ (M.modify v (+1)) heights
 counts <- V.unsafeFreeze v >>= V.unsafeThaw . V.scanl1' (+)
 mapM (S.binarySearchL counts) indices

binarySearchL is from vector-algorithms package.

Some utility code to parse input:

main = getContents >>= mapM_ print . map (uncurry solve) . parse
parse :: String -> [([Int], [Int])]
parse
 = pairs
 . map
 ( takeWhile (/= 0) -- drop trailing zeroes
 . map read . words -- convert string to ints
 )
 . tail . lines -- skip first line
pairs :: [a] -> [(a, a)]
pairs (x:y:xs) = (x, y) : pairs xs
pairs _ = []

Question 4

Lists are not for random access. As far as I know, Vectors are the modern way to have fixed-length random-access listlikes. They will allow you to do the traversing updates you want in one pass.

import qualified Data.Vector as V
counts = V.toList . V.accum (+) (V.replicate 200 0) . map (,1)

first_gt doesn't need a manual implementation:

import Safe
first_gt e = findIndexJust (> e)

And just for fun, this should only traverse the counts result once. (Requires that indices is sorted, though.)

import Control.Monad.Trans.State
solve heights = scanl1 (+) . map length . splitsOnFoo . map (subtract 1) where
 splitsOnFoo = evalState $ traverse (state . span . (<)) $ scanl1 (+) $ counts heights

max taldykin max taldykin 1,48110 silver badges13 bronze badges · Accepted Answer · 2016-04-09 08:37:40Z

Maps as arrays

In counts heights you are taversing heights 200 times. It is possible to emulate array with Data.IntMap and do this in one pass:

import Data.List (foldl')
import qualified Data.IntMap.Strict as Map
count = foldl' (\m h -> Map.insertWith (+) h 1 m)
 (Map.fromList [(i,0) | i <- [1..200]])

Note that I'm using strict version of left fold (see here about foldl vs foldl') and strict Map. This means that map of height counts constructed in single pass without thunks.

solve heights = map (\ix -> 1 + length (takeWhile (<ix) counts))
 where
 counts = scanl1 (+) $ Map.elems $ count heights

I'm using 1 + length (takeWhile (<ix) counts) instead of bare recursion in first_gt.

A bit more code is required for binary search:

solve heights = map (maybe 0 snd . (`Map.lookupLT` countsMap))
 where
 counts = scanl1 (+) $ Map.elems $ count heights
 countsMap = Map.fromList $ zip counts [2..]

Vectors as arrays

If log(n) overhead of trees is too much for you, it is possible to use Data.Vector which is for real arrays in Haskell.

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Algorithms.Search as S
max_height = 200
solve :: [Int] -> [Int] -> IO [Int]
solve heights indices = do
 v <- M.replicate (max_height+1) 0
 mapM_ (M.modify v (+1)) heights
 counts <- V.unsafeFreeze v >>= V.unsafeThaw . V.scanl1' (+)
 mapM (S.binarySearchL counts) indices

binarySearchL is from vector-algorithms package.

Some utility code to parse input:

main = getContents >>= mapM_ print . map (uncurry solve) . parse
parse :: String -> [([Int], [Int])]
parse
 = pairs
 . map
 ( takeWhile (/= 0) -- drop trailing zeroes
 . map read . words -- convert string to ints
 )
 . tail . lines -- skip first line
pairs :: [a] -> [(a, a)]
pairs (x:y:xs) = (x, y) : pairs xs
pairs _ = []

Stack Exchange Network

Bucket sort in Haskell

2 Answers 2

Maps as arrays

Vectors as arrays

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Bucket sort in Haskell

2 Answers 2

Maps as arrays

Vectors as arrays

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions