In the MVE code below I have tried to create a function collect which is supposed to take a RegModule monad, such as scanChar, as its argument. When scanChar (or another RegModule) succeeds in scanning a char, as seen in its case branch, collect shall behave the same way: it shall 'scan' the char as well and increment the associated i, as scanChar does. On top of the behavior of scanChar, however, it shall also return a string, hence the return type RegModule d (String, a), where the string consists of all the 'scanned' Chars within the input string. This should apply not only to scanChar but more generally to other monadic functions using RegModule; as a start, though, an implementation that only takes scanChar into account would be fine.
The problem is that when I try to return a string I get a type error, since the data d' that I try to use for this is not explicitly of that type. I have tried using show, but this requires that I change the type signature of collect, which I would like to avoid. Any ideas about how to work around this without changing the type signature of any of the functions?
import qualified Data.Set as S
import Control.Monad
type CharSet = S.Set Char
data RE =
  RClass Bool CharSet
newtype RegModule d a =
  RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}
instance Monad (RegModule d) where
  return a = RegModule (\_s _i d -> return (a, 0, d))
  m >>= f =
    RegModule (\s i d -> do
      (a, j, d') <- runRegModule m s i d
      (b, j', d'') <- runRegModule (f a) s (i + j) d'
      return (b, j + j', d''))
instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap
scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, i+1, d)
    [] -> []
  )
regfail :: RegModule d a
regfail = RegModule (\_s _i d -> [])
regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
  next <- scanChar
  if (S.member next cs)
    then return ()
    else regfail
fetchData :: RegModule d d
fetchData = RegModule (\_s _i d -> [(d, 0, d)])
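collect :: RegModule d a -> RegModule d (String, a)
collect m = do
  a <- m
  consumed <- fetchData
  -- This is the line that fails to type-check: show needs a Show d
  -- constraint, which the signature above does not provide.
  let consumedStr = (show consumed)
  return (consumedStr, a)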
runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
  let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
      (result2, pos2, newState2) = head $ runRegModule matcher input pos1 newState1
      (result3, pos3, newState3) = head $ runRegModule matcher input pos2 newState2
  in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3)]
1 Answer
Your monad seems a little buggy. In the signature:
String -> Int -> d -> [(a, Int, d)]
the implementation of >>= suggests that the String is the full, constant input string, the first Int is an offset into that String, and the second Int is the number of characters read by the operation and not the new offset. In particular, when the computation on the LHS of >>= starts at offset i and returns a count of characters scanned j, the computation on the right hand side is run starting at offset i+j, not j.
However, your scanChar implementation doesn't appear to match this implementation, since it starts scanning at offset i and then returns the new offset i+1, instead of the number of characters read, which should just be 1.
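To make the mismatch concrete, here is a minimal sketch that runs three scans in a row under both conventions. The names scanCharOffset, scanCharCount and three are made up for this illustration and are not part of the question's code; the monad is the same as in the question, except that pure is defined directly in the Applicative instance.

```haskell
import Control.Monad (ap, liftM)

-- Same shape as the RegModule in the question.
newtype RegModule d a =
  RegModule { runRegModule :: String -> Int -> d -> [(a, Int, d)] }

instance Functor (RegModule d) where
  fmap = liftM

instance Applicative (RegModule d) where
  pure a = RegModule (\_s _i d -> [(a, 0, d)])
  (<*>) = ap

instance Monad (RegModule d) where
  -- The second Int is treated as a count: the right-hand side starts at i + j.
  m >>= f = RegModule $ \s i d -> do
    (a, j, d')   <- runRegModule m s i d
    (b, j', d'') <- runRegModule (f a) s (i + j) d'
    return (b, j + j', d'')

-- Hypothetical name: scanChar as in the question, returning the new offset.
scanCharOffset :: RegModule d Char
scanCharOffset = RegModule $ \s i d ->
  case drop i s of
    (c:_) -> [(c, i + 1, d)]
    []    -> []

-- Hypothetical name: scanChar returning the number of characters consumed.
scanCharCount :: RegModule d Char
scanCharCount = RegModule $ \s i d ->
  case drop i s of
    (c:_) -> [(c, 1, d)]
    []    -> []

-- Hypothetical helper: run a single-character scanner three times in sequence.
three :: RegModule d Char -> RegModule d String
three p = do
  a <- p
  b <- p
  c <- p
  return [a, b, c]

-- runRegModule (three scanCharOffset) "abcd" 0 ()  gives [("abd",7,())]
-- runRegModule (three scanCharCount)  "abcd" 0 ()  gives [("abc",3,())]
```

With the offset-returning version, the third scan starts at offset 3 instead of 2, so 'c' is skipped and the reported total of 7 is longer than the input. A single scan starting at offset 0 does not show the problem, which may be why it is easy to miss.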
The reason I bring all this up is that you probably want collect m to run m and then use the offset and number of characters scanned by m to directly extract the scanned substring and add it to the return value, something like:
collect :: RegModule d a -> RegModule d (String, a)
collect m = RegModule $ \s i d -> do
  (a, j, d') <- runRegModule m s i d
  pure ((take j (drop i s), a), j, d')
In order for this to work with your scanChar, the definition will need to be fixed:
scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1, d) -- return "1" char scanned
    [] -> []
  )
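As a quick check, the following self-contained sketch puts collect and the count-returning scanChar from above together with the monad from the question (with pure defined directly in the Applicative instance). The helper twoChars is a hypothetical name introduced only for this example.

```haskell
import Control.Monad (ap, liftM)

newtype RegModule d a =
  RegModule { runRegModule :: String -> Int -> d -> [(a, Int, d)] }

instance Functor (RegModule d) where
  fmap = liftM

instance Applicative (RegModule d) where
  pure a = RegModule (\_s _i d -> [(a, 0, d)])
  (<*>) = ap

instance Monad (RegModule d) where
  m >>= f = RegModule $ \s i d -> do
    (a, j, d')   <- runRegModule m s i d
    (b, j', d'') <- runRegModule (f a) s (i + j) d'
    return (b, j + j', d'')

-- Scans one character and reports that 1 character was consumed.
scanChar :: RegModule d Char
scanChar = RegModule $ \s i d ->
  case drop i s of
    (c:_) -> [(c, 1, d)]
    []    -> []

-- Runs m and pairs its result with the substring it consumed.
collect :: RegModule d a -> RegModule d (String, a)
collect m = RegModule $ \s i d -> do
  (a, j, d') <- runRegModule m s i d
  pure ((take j (drop i s), a), j, d')

-- Hypothetical helper: scan two characters, keep the second.
twoChars :: RegModule d Char
twoChars = scanChar >> scanChar

-- runRegModule (collect scanChar) "abcd" 0 ()  gives [(("a",'a'),1,())]
-- runRegModule (collect twoChars) "abcd" 1 ()  gives [(("bc",'c'),2,())]
```

Because collect only looks at the start offset and the count returned by m, it needs no Show d constraint and works for any RegModule, not just scanChar.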
6 Comments
i+1 is actually correct, as someone else pointed out to me here: stackoverflow.com/questions/76594962/… . I have just tested your code with just 1:
ghci> runRegModuleTwice (collect scanChar) "aaa" 0 []
[(("a",'a'),1,[]),(("a",'a'),1,[]),(("a",'a'),1,[])]

... i+1 I get:
ghci> runRegModuleTwice (collect scanChar) "aaa" 0 []
[(("a",'a'),1,[]),(("aa",'a'),2,[]),(("a",'a'),3,[])]
which is slightly better, since there seems to be an accumulation in the captured data in the first and second iteration, although the captured elements seem to be reset back to just "a" at the last element of the list.

... j should be 1 considering these 'prints' of runRegModuleTwice, but I suspect that you might actually still be right, and that the flaw is perhaps in runRegModuleTwice, which I implemented myself but which may reflect an underlying and perhaps incorrect conception of the whole code or approach to making a RegEx parser using monads. I need to think more about why your pure approach works without type errors, but I suspect that what is collected in your code is the input string and not only the 'captured' Chars.

... collect a bit into:
collect m = RegModule ( \s i d -> do (a, i', d') <- runRegModule m s i d; return ((take (i'-i) (drop i s), a), i', d'))
this seems to produce
ghci> runRegModuleTwice (collect scanChar) "abcd" 0 []
[(("a",'a'),1,[]),(("b",'b'),2,[]),(("c",'c'),3,[]),(("d",'d'),4,[])]
which is what is expected for scanChar I think, since it shows every char that was consumed. I am however still not certain that you are not correct in your assessment above. In the post I linked, the comments mention something about history and monads.

... i+1 rather than 1, since the latter should indeed be the increment-ratio and not the new offset considering the way the bind is defined.
... gather, look at how gather is implemented and do likewise. ReadP a hides a function that returns R a. gather takes this function and produces another one that returns R (String, a). Your type hides a function that returns something something [(a, Int, d)]. Can you build a function that returns something something [((String, a), Int, d)] out of it?

... RegModule type is all about, because I surely can neither gather nor collect what it is supposed to do. Why does it have the type it has?