I'm not sure if a Markov Chain generator is the proper term for this, really all it does is create the chain, it doesn't generate any text from it. The code is below, and I'd appreciate any feedback: bad practices, code smells, poor naming, anything at all really.
module Main where
import Data.List(find, delete)
type Chain = [Node]
data Node = Node {nPrefix :: String, nSuffixes :: [String]} deriving (Show, Eq, Read)
generateChain :: Int -> [String] -> Chain
generateChain n = foldl (learnSample n) []
learnSample :: Int -> Chain -> String -> Chain
learnSample n chain sample = foldl learnGroup chain groups
where groups = (groupings (n+1) . words) sample
learnGroup :: Chain -> [String] -> Chain
learnGroup chain group =
case find (\node -> nPrefix node == prefix) chain of
Just node -> delete node chain ++ [Node prefix (nSuffixes node ++ [suffix])]
Nothing -> chain ++ [Node prefix [suffix]]
where
prefix = unwords $ init group
suffix = last group
groupings :: Int -> [a] -> [[a]]
groupings n list
| length list >= n = take n list : groupings n (drop 1 list)
| otherwise = []
main :: IO ()
main = print $ generateChain 1 ["a b a b a c a d"]
2 Answers 2
Looks good, the only thing I'd suggest is that since you don't really
rely on having strings as nodes, you can also generalise your
implementation to arbitrary chains, i.e. Chain Char
for this particular
main
function. You'll need to change to a GADT (I think) but
otherwise the code would just need a few minor additional changes.
This is very nice haskell code. I was slightly irritated by the use of foldl
, which should usuablly be replaced by foldl'
. Using foldl
seems perfectly valid here, but it might warrant a comment as to why it was used.
Something I'm unsure about is your choice of data structures to represent the chain. My understanding of Markov chains is that what's called suffixes here is a set of Nodes reachable from this node. A Data.Set or Data.Map.Strict.Map might be a better choice to capture this. The latter choice would also give us access to functions such as insertWith
, which could be used to simplify learnGroup
. But still, nice code.