I am quite new to Haskell, and this problem is from dailycodingproblem.com:
Implement an autocomplete system. That is, given a query string
s
and a set of all possible query strings, return all strings in the set that haves
as a prefix. For example, given the query stringde
and the set of strings[dog, deer, deal]
, return[deer, deal]
.Hint: Try preprocessing the dictionary into a more efficient data structure to speed up queries.
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.HashSet (HashSet)
import qualified Data.HashSet as HashSet
data Trie = Trie {children :: Map Char Trie, wordsWithPrefix :: HashSet String} deriving Show
emptyTrie :: Trie
emptyTrie = Trie Map.empty HashSet.empty
insertWord :: Trie -> String -> Trie
insertWord trie [] = trie
insertWord (Trie childs wwp) (x:xs) = Trie newChildren newWordsWithPrefix
where
childTrie = Map.findWithDefault emptyTrie x childs
newChildTrie = insertWord childTrie xs
newChildren = Map.insert x newChildTrie childs
newWordsWithPrefix = HashSet.insert (x:xs) wwp
searchPrefix :: Trie -> String -> HashSet String
searchPrefix trie [] = wordsWithPrefix trie
searchPrefix trie (x:xs) =
case Map.lookup x (children trie) of
Nothing -> HashSet.empty
Just child -> HashSet.map (x:) (searchPrefix child xs)
makeTrie :: [String] -> Trie
makeTrie wordSet = foldl insertWord emptyTrie wordSet
autocomplete :: [String] -> String -> [String]
autocomplete wordSet s = HashSet.toList $ searchPrefix (makeTrie wordSet) s
It works, but I'm not sure about the performance and less sure that it aligns with best practices.
1 Answer 1
My first instinct here was to not respond, on the basis that there's not much to add. I'm not about to set up benchmarking to measure the performance, and the code is fine idiomatic haskell. Specifically, there's just small set of functions that do obvious things and combine in obvious ways. What's not to like?
Except...
The nature of a Trie is that the data is encoded in the data-structure's structure. So why are you storing it in duplicate in the wordsWithPrefix
field? Our instinct is to store things in just one way; whether it's more efficient or not, it avoids conflicts between the different storage systems.
Of course it's not obvious else we should be doing it, and there may be performance advantages to your strategy. Do we want a separate Leaf String
constructor for when there's only one suffix going forward from a node? How do we distinguish the Tries of ["abc"]
and ["ab", "abc"]
? Of course we could (and really, we probably should) just use something from the tries
package, but if we're going to roll our own we should first try to have as few moving parts as possible. I came up with
data Trie = Occupied (Map Char Trie) | UnOccupied (Map Char Trie)
Take a moment to notice a semantic detail implicit in this declaration: emptyTrie
is obviously UnOccupied Map.empty
, which is different from Occupied Map.empty
; therefore insertWord trie []
is not trie
!
Of course having written that, I immediately abstracted out the Char
in favor of data Trie a = ...
, and after working with it for a minute I switched to the less-pretty but more-user-friendly
data Trie a = Trie { children :: Map Char (Trie a), occupied :: Bool }
insertWord
is doing most of your work, so let's focus on that for a moment. Using {-# LANGUAGE NamedFieldPuns #-}
and record syntax, I get
insertWord :: Trie a -> [a] -> Trie a
insertWord trie [] = trie{occupied=True}
insertWord trie@Trie{children} (x:xs) = trie{children=newChildren}
where childTrie = Map.findWithDefault emptyTrie x children
newChildTrie = insertWord childTrie xs
newChildren = Map.insert x newChildTrie children
What else do I find as I go?
searchPrefix
should probably return a List; the conversion will need to happen someplace and doing it here makes some stuff simpler.- Using
mempty
instead of the various things it can stand for saves the need to changes stuff as your implementation changes, and is often more succinct. - Many data structures have a
singleton
option to go withempty
; might as well giveTrie
one too. - Many data structures have a
toList
andfromList
; you can use all of these interchangeably withimport GHC.Exts (fromList, toList)
. Might as well implementIsList (Trie a)
instead ofmakeTrie
. - The presence of
emptyTrie
is suggestive of theMonoid
class. ImplementSemigroup
andMonoid
forTrie
. - The payoff of implementing all this stuff for
Trie
is, in addition to making your data type more generally useful, that you can now implementinsertWord
andsearchPrefix
succinctly in terms of class operations. - Rename
emptyTrie
andinsertWord
toempty
andinsert
for consistency withMap
and other comparable data structures.
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE TypeFamilies #-}
module Main where
import Data.List (sort)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import GHC.Exts (IsList(Item), fromList, toList)
data Trie a = Trie { children :: Map a (Trie a), occupied :: Bool } deriving Show
empty :: Trie a -- We could skip this in favor of mempty, but `Map` has it's own empty which doesn't require `Ord`, so we'll follow their lead.
empty = Trie Map.empty False
singleton :: [a] -> Trie a
singleton [] = empty{occupied=True}
singleton (a:as) = empty{children = Map.singleton a $ singleton as}
instance (Ord a) => Semigroup (Trie a) where
t1 <> t2 = Trie {children = Map.unionWith (<>) (children t1) (children t2), occupied = occupied t1 || occupied t2}
instance (Ord a) => Monoid (Trie a) where mempty = empty
instance (Ord a) => IsList (Trie a) where
type Item (Trie a) = [a]
fromList = foldMap singleton
toList Trie{children, occupied} = [ [] | occupied] -- this syntax isn't well documented, but works fine and is kinda common.
++ (concatMap accumulate . toList . fmap toList $ children)
where accumulate (a, as) = (a:) <$> as -- Pretty sure we could replace this with some trainwreak based on `curry (:)`, but let's not.
insert :: (Ord a) => Trie a -> [a] -> Trie a
insert trie as = trie <> singleton as
search :: (Ord a) => Trie a -> [a] -> [[a]]
search trie [] = toList trie
search Trie{children} (x:xs) =
case Map.lookup x children of
Nothing -> mempty
Just child -> (x:) <$> search child xs
autocomplete :: [String] -> String -> [String]
autocomplete wordSet = search (fromList wordSet)
main :: IO ()
main = do
let ws = ["dog", "deer", "deal"]
let res = autocomplete ws "de"
print res
print $ sort res == sort ["deer", "deal"]