As a first step in learning Haskell I am solving this problem, which involves finding the Rith-ranked numbers, given some input list and many Ri. In an imperative language I would make a zeroed array c of length 200, increment c[h] for each height h, compute cumulative sums of c and then binary search c to determine the height corresponding to each given index. Because max_height is fixed, this has runtime linear in the size of input and bounded memory(*) excluding the input.
Here's my Haskell code:
max_height = 200
count_eq e = foldl (\c f -> if e == f then c + 1 else c) 0
counts heights = map (flip count_eq heights) [0..max_height]
first_gt e l = f l 0 where f (x:xs) i = if x > e then i else f xs (i+1)
solve heights indices = let accum = scanl1 (+) (counts heights) in
map (flip first_gt accum) (map (subtract 1) indices)
It is correct but slow. I would like to know how to (A) reason about and (B) improve the performance. Also (C) can I achieve the same asymptotic performance as the imperative code?
(*) assuming each c[i] fits in a machine int. I believe the runtime statement holds regardless.
-
\$\begingroup\$ Welcome to Code Review! Good job on your first question. \$\endgroup\$SirPython– SirPython2016年04月07日 23:52:32 +00:00Commented Apr 7, 2016 at 23:52
2 Answers 2
Maps as arrays
In counts heights
you are taversing heights 200 times.
It is possible to emulate array with Data.IntMap
and do this in one pass:
import Data.List (foldl')
import qualified Data.IntMap.Strict as Map
count = foldl' (\m h -> Map.insertWith (+) h 1 m)
(Map.fromList [(i,0) | i <- [1..200]])
Note that I'm using strict version of left fold (see here about foldl
vs foldl'
) and strict Map
. This means that map of height counts constructed in single pass without thunks.
solve heights = map (\ix -> 1 + length (takeWhile (<ix) counts))
where
counts = scanl1 (+) $ Map.elems $ count heights
I'm using 1 + length (takeWhile (<ix) counts)
instead of bare recursion in first_gt
.
A bit more code is required for binary search:
solve heights = map (maybe 0 snd . (`Map.lookupLT` countsMap))
where
counts = scanl1 (+) $ Map.elems $ count heights
countsMap = Map.fromList $ zip counts [2..]
Vectors as arrays
If log(n)
overhead of trees is too much for you, it is possible to use Data.Vector
which is for real arrays in Haskell.
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Algorithms.Search as S
max_height = 200
solve :: [Int] -> [Int] -> IO [Int]
solve heights indices = do
v <- M.replicate (max_height+1) 0
mapM_ (M.modify v (+1)) heights
counts <- V.unsafeFreeze v >>= V.unsafeThaw . V.scanl1' (+)
mapM (S.binarySearchL counts) indices
binarySearchL
is from vector-algorithms package.
Some utility code to parse input:
main = getContents >>= mapM_ print . map (uncurry solve) . parse
parse :: String -> [([Int], [Int])]
parse
= pairs
. map
( takeWhile (/= 0) -- drop trailing zeroes
. map read . words -- convert string to ints
)
. tail . lines -- skip first line
pairs :: [a] -> [(a, a)]
pairs (x:y:xs) = (x, y) : pairs xs
pairs _ = []
Lists are not for random access. As far as I know, Vectors are the modern way to have fixed-length random-access listlikes. They will allow you to do the traversing updates you want in one pass.
import qualified Data.Vector as V
counts = V.toList . V.accum (+) (V.replicate 200 0) . map (,1)
first_gt doesn't need a manual implementation:
import Safe
first_gt e = findIndexJust (> e)
And just for fun, this should only traverse the counts result once. (Requires that indices is sorted, though.)
import Control.Monad.Trans.State
solve heights = scanl1 (+) . map length . splitsOnFoo . map (subtract 1) where
splitsOnFoo = evalState $ traverse (state . span . (<)) $ scanl1 (+) $ counts heights