I am implementing a mutable trie. I tested and it produces correct results. However, it is very slow. I benchmark it against plain old Data.Map
, which is more than twice as fast.
So what am I doing wrong? I am expecting a fundamental flaw in the way I write mutable code here and not some micro performance tricks. I assume STVector
is fast enough.
module Trie(Trie, empty, insert, member) where
import Data.Char (ord, toUpper)
import Data.Maybe (isJust)
import Control.Monad.ST
import qualified Data.Vector.Mutable as V
-- In a real world scenario, we would probably want to our mutable trie to base
-- on PrimMonad and PrimState so that it can work within both IO and ST. Also,
-- our trie only accepts String keys composed of capitalized English alphabets
-- ([A-Z]+). Values, though, can be of any type. Finally, there should have
-- been a function to retrieve the value given a key, but we omitted it because
-- of laziness (pun intended).
data Trie s a = Trie {
trieValue :: Maybe a ,
trieChildren :: V.STVector s (Maybe (Trie s a))
}
toIndex :: Char -> Int
toIndex c = (ord (toUpper c) - ord 'A') `mod` 26
empty :: ST s (Trie s a)
empty = emptyWith Nothing
newChildren :: ST s (V.STVector s (Maybe (Trie s a)))
newChildren = V.replicate 26 Nothing
emptyWith :: Maybe a -> ST s (Trie s a)
emptyWith x = newChildren >>= return . Trie x
insert :: Trie s a -> String -> a -> ST s (Trie s a)
insert = insert' . Just
insert' :: Maybe (Trie s a) -> String -> a -> ST s (Trie s a)
insert' Nothing cs z = do
node <- empty
insert node cs z
insert' (Just root@(Trie _ ys)) [c] z = do
node <- emptyWith (Just z)
V.write ys (toIndex c) (Just node)
return root
insert' (Just root@(Trie _ ys)) (c:cs) z = do
let i = toIndex c
y <- V.read ys i
insert' y cs z >>= V.write ys i . Just
return root
member :: Trie s a -> String -> ST s Bool
member = member' . Just
member' :: Maybe (Trie s a) -> String -> ST s Bool
member' Nothing _ = return False
member' (Just (Trie x _)) [] = return (isJust x)
member' (Just (Trie _ ys)) (c:cs) =
V.read ys (toIndex c) >>= flip member' cs
1 Answer 1
The big thing I can see is that you don't seem to know why to use a mutable structure in Haskell. They're not automatically faster. In fact, they have some GC-related overhead that exceeds that of immutable structures in GHC. Mutable structures in GHC are only a win if they can actually reduce the amount of allocation, and this implementation looks like it allocates a lot.
When you consider using a mutable structure, you need to examine how mutability will reduce total allocation. If it doesn't reduce allocation (and usually by an asymptotic factor) it's unlikely that mutability alone is going to help performance.
-
\$\begingroup\$ Well I wrote it to learn how to implement mutable data structures. That is the whole intention. And I was very surprised with the performance. I wonder if it can be significantly improved. \$\endgroup\$Seri– Seri2014年08月28日 04:25:16 +00:00Commented Aug 28, 2014 at 4:25
-
2\$\begingroup\$ In that case, you did succeed. This is correct, if limited, and definitely uses mutation. Just remember, when considering performance, that this was your first attempt at doing something like this - and
Data.Map
has been around for many years with several significant performance improvements. \$\endgroup\$Carl– Carl2014年08月28日 16:29:06 +00:00Commented Aug 28, 2014 at 16:29 -
\$\begingroup\$ Thanks for everything. This really got me interested in purely functional data structures. I used to think that they are beautiful but just not very fast. I added
Data.Trie
frombytestring-trie
and it convincingly beatData.Map
. Say, if I want to implement a core subset ofData.Map
now, where do I start, other than Okasaki? (I can write a red black tree in an imperative language) \$\endgroup\$Seri– Seri2014年08月29日 06:36:14 +00:00Commented Aug 29, 2014 at 6:36 -
1\$\begingroup\$ There probably isn't a faster approach than reading the code for structures you're interested in. Okasaki's the only overarching document on the design of persistent data structures. Everything else is single-topic papers. \$\endgroup\$Carl– Carl2014年08月29日 14:26:10 +00:00Commented Aug 29, 2014 at 14:26
toIndex c = ord c - 65
. Shaves off a second or so. \$\endgroup\$