Running Monad multiple times in haskell and updating state

Question 1

Hi for the following code I have tried to run the RegModule monad such that one state will effect the next, as seen in the output at the end of the code (MVE) below the state of the monad doesn't seem to be updated as the increment only works for the first call to the monad rather than increment on each of the runRegModule calls. How can I ensure that its state get updated. And is there a way to do it that would be less manual such that the Monad would be called until it fails using forM or the like.

import qualified Data.Set as S
import Control.Monad
type CharSet = S.Set Char
data RE =
 RClass Bool CharSet
newtype RegModule d a =
 RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}
instance Monad (RegModule d) where
 return a = RegModule (\_s _i d -> return (a, 0, d))
 m >>= f =
 RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
 (b, j', d'') <- runRegModule (f a) s (i + j) d'
 return (b, j + j', d''))
instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap
scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
 case drop i s of
 (c:cs) -> return (c, 1, d)
 [] -> []
 )
regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
 )
regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
 next <- scanChar 
 if (S.member next cs)
 then return ()
 else regfail
 
runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
 let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
 (result2, pos2, newState2) = head $ runRegModule matcher input pos1 newState1
 (result3, pos3, newState3) = head $ runRegModule matcher input pos2 newState2
 in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3)]

OUTPUT:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

Expected result:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),2,[]),((),3,[])]

EDIT in order to answer comments

There are several things about the suggestiosn in the comments that confuses me. Therefore I find it necessary to edit the post rather than just using the comment section to respond.

Furthermore I shall here try to directionalize the problem-solving in order to respond to especially the comments made with regards to the similarity of the current problem to the post How to use `runAccumT` , this also to open up for the possibility of the correct answer/solution being some answer/solution that might not be how I initially thought it to be, and hence why I shall also be open to that this answer/solution might not produce the output that I orginially expected.

As I understand the comments (and this might be a falacious understanding since I am still quite new at Haskell), they highlight and claim that my line newtype RegModule d a = RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]} is the cause of the error, since it doesn't take into account an accumulated value, but just a single Int, when running the Monad RegModule. Therefore if the code above is to succeed I am to change it into something like newtype RegModule d a = RegModule {runRegModule :: String -> Int -> d -> [(a, Sum Int, d)]} instead since the input value isn't the Sum Int but rather just an Int which cannot accumulate values because it isn't a Monoidically value.

Some concerns about the comments at stake:

The problem I posted above including the Monad and the runner monad are code that I have from an assignment text. It should be correct. However it is supposed to be code that should make a basis for several code parts. That is, a regex parser and a Matcher (Interpreter). This code is not supposed to be changed. I am however not that skilled at Haskell so I thought it would be relevant to have some 'printable' or visible code that would show some output while I was building the remaining code. That is why I suggested the runRegModuleThrice to manually get a feeling for how values accumulated (e. g. while the parser ran over an input string).

It can be the case that when trying to make it possible to print the output, that I am somehow forcing the coding in a way that is deviating from what was inteded with the original assignment code for how a parser and matcher is supposed to be combined.

I am however not sure what to expect for a parser and whether or not the regEX module provided along with the other parts are supposed to give some list akin to the above mentioned expected outputs or perhaps something more similar to [((),1,['a']),((),2,['a']),((),3,['a'])], or not. Although I suspect they are.

Some follow up questions:

If it is the case that I interpreted it correct with regards to the above about the lack of a Monoidically value, isn't much of the idea behind Monads lost including the ones that were depicted in the LYAH: http://learnyouahaskell.com/for-a-few-monads-more about the Writer Monad.
Despite creating changes to the runner monad are there ways to show an accumulated value or 'appended text' as the output?
Isn't it likely that there might be some way to archieve the outcome [((),1,['a']),((),2,['a']),((),3,['a'])] (or an even more 'correct' variant), by solely changing the regEx, scanChar, ``regfailand/orrunRegModuleThrice? Th is could perhaps be archieved by usingdo-notation for the runRegModuleThrice`, as I understand it the do notation is used for unpacking the value of a Monad and possibly using it in the next. However this doesn't seem to work.
In general from examples I have encountered in LYAH it hasn't been stressed that the runner should be changed. I am a bit confused to why that is, and if it is something in a later release of Haskell.

Question 2

Are you sure this is a monad? I don't think your instance obeys the monad laws. Also you don't use anything monadic in runRegModuleThrice (though it certainly looks like this could be done better with a state monad). Show the result you would have liked to get.

Question 3

I am not sure it is a monad considering your points about the monad laws. I am still not skilled enough to know those by heart, so I can try to look them up an give a qualified guess. The reason why I made the output this manually as a first step is to just come up with some code I could understand. I will edit the post to show you what I expected of the runRegModuleThrice but didn't get.

Question 4

@leftaroundabout I think the monad instance is basically the one you get for ReaderT String (AccumT (Sum Int) (StateT d [])), so it should obey the laws.

Question 5

@Piskator You might like this question that doesn't seem related at first glance: How to use runAccumT. The issue in that question is essentially the same as your issue in the end, I believe: you need a runRegModule wrapper that adds the incoming Int to the output Int.

Question 6

@DanielWagner but I don't understand ... is all of that extra parts really necessary in my cases. Isn't mine closer to the writer type monad depicted here: learnyouahaskell.com/for-a-few-monads-more : instance (Monoid w) => Monad (Writer w) where ... it seems they also change the state of one part of the resulting tupple.

Question 7

I'll respectfully disagree with the other answer here. The underlying problem is that you've misinterpreted the output Int in the tuple (a, Int, d). For this monad, the input Int is the offset into the string, but the output Int is the number of characters scanned, not the new offset. This means that -- using the original code posted in the question -- your actual output is correct:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

because those output Ints are the number of characters scanned, not the new offsets. Since the matcher always matches a single character, if it matches anything at all, each of the three runs of the matcher should return 1, as in 1 character scanned.

However, your code is still buggy, because if you try a test case with the input string "aa", it still matches:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

The problem is that, in runRegModuleThrice, you are reimplementing the bind (>>=) operation for the monad but your implementation is incompatible with the actually "meaning" of the input and output Ints in the monad.

As I noted above, the definition for >>= in the monad instance is set up to treat the input Int as an offset into the string but the output Int as the number of characters scanned, and when the bind chains the left-hand-side to the right-hand-side, it adds this output Int (number of characters scanned) to the original input Int (starting offset) to get the new input Int (next offset). It also returns the sum of the two output Ints -- the number of characters scanned by the LHS plus the number of characters scanned by the RHS. If the output Ints were offsets, it would just return the final offset j' directly, not try to add it to another offset:

m >>= f =
 RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
 -- ^ ^ ^
 -- | | |
 -- | output count of characters scanned |
 -- | |
 -- `-- starting offset as input ------------'
 (b, j', d'') <- runRegModule (f a) s (i + j) d'
 -- new offset is *sum* of starting offset and number scanned -^^^^^
 return (b, j + j', d''))
 -- ^^^^^^
 -- returned value is total number of characters scanned, j from
 -- the first, and j' from the second, **NOT** the new offset

Assuming the monad instance is correct, your original scanChar implementation was also correct. When it succeeds, it scans a single character, so it should always return 1 on success.

The bug is in your implementation of runRegModuleThrice which is misinterpreting the output Int as an offset. If you modify it to treat the input Ints as offsets and the output Ints as counts of characters scanned:

runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
 let (result1, cnt1, newState1) = head $ runRegModule matcher input startPos state
 (result2, cnt2, newState2) = head $ runRegModule matcher input (startPos + cnt1) newState1
 (result3, cnt3, newState3) = head $ runRegModule matcher input (startPos + cnt1 + cnt2) newState2
 in [(result1, cnt1, newState1), (result2, cnt2, newState2), (result3, cnt3, newState3)]

then it should work correctly. Note that it STILL will return the output you claimed was wrong:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

but this output is actually correct, as explained above.

It'll crash on an input of "aa":

λ> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
[((),1,[]),((),1,[]),(*** Exception: Prelude.head: empty list
CallStack (from HasCallStack):
 error, called at libraries/base/GHC/List.hs:1646:3 in base:GHC.List
 errorEmptyList, called at libraries/base/GHC/List.hs:85:11 in base:GHC.List
 badHead, called at libraries/base/GHC/List.hs:81:28 in base:GHC.List
 head, called at Collect.hs:54:36 in main:Collect

but this is correct, too. Your runRegModuleThrice is written to assume that each run of the matcher succeeds and returns at least one match. When a run fails (like the third match failing here), it tries to take the head of an empty list and crashes, as designed.

I think what you were SUPPOSED to do here was implement runRegModuleThrice using the monad's implementation of >>=, not reimplement it from scratch, so maybe something more like:

runRegModuleThrice' :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice' matcher = runRegModule (matcher >> matcher >> matcher)

which uses the monad bind (i.e., the definition of >>=) via the >> operator.

This does not return a "history" of the three matches. Instead, it returns the result of matching on the composite monadic match. This should return the a value of the final match, which is just () for your test case. It should also return the total number of character matched by the three matchers that are part of the composite match, which is the number 3, i.e., three matches of 1 character each.

And, in fact, it does just that:

λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),3,[])]

It also works correctly on strings that don't match without crashing:

λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
[]
λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "a" 0 []
[]
λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "bbb" 0 []
[]

Question 8

Thank you for your very thourough response. It is much appreciated. I think you are correct in your assessments and interpretations.

Question 9

One question. How come your runRegModuleThrice' takes 4 arguments like mine in the type signature, but in the actual function is only supplied with 1 argument? Is this currying? And if so, I thought currying would imply that you wouldn't need all the 4 arguments in the type signature, but just use whatever arguments the runRegModule would take within the function, based on currying.

Question 10

Yes, it boils down to currying. One way of looking at it is that the type a -> b -> c -> d -> e is the same as the type a -> (b -> c -> d -> e). In words, a four argument function is the same as a one argument function that returns a three argument function. Without any change to the type signature, my definition for this function can either take four arguments and return a final e, or it can take one argument and return a three argument function (that returns e). In my definition, I used runRegModule in the body to create the three argument function I wanted.

Question 11

I have actually posted a new thread with some pacularities about your runRegModuleThrice' that eludes me with regards to the discard function: stackoverflow.com/questions/76694011/…

Question 12

The issue seems to be that the definition of scanChar is explicitly ignoring the input argument Int:

scanChar = RegModule (\s i d ->
 case drop i s of
 (c:cs) -> return (c, 1, d)
 // ^ here we return 1, not i or any other expression which uses i
 [] -> []
 )

You are passing each newState as input to the next usage of runRegModule, however, this makes no difference, since the first thing you do in each call to regEX is ignore that newState.

Let's break down the execution of a function created by regEX, in particular focusing on the "index" value (I'm really not sure what this is supposed to represent, so maybe "index" is a bad name).

I'll rewrite it a bit for clarity. Start with this:

regEXMatchRClass b cs next = 
 if (S.member next cs)
 then return ()
 else regfail
regEX (RClass b cs) = scanChar >>= regEXMatchRClass b cs

and note that we can write these components in an equivalent way (with irrelevant parts omitted):

scanChar = RegModule (\s i d -> map (\x -> (x, 1, d)) ...)
regEXMatchRClass b cs next = RegModule (\s i d -> map (\x -> (x, i, d)) ...)

This form may help to convince you that this function regEX simply ignores the original input i which is fed to it.

Try instead something like this:

scanChar = RegModule (\s i d ->
 case drop i s of
 (c:cs) -> return (c, 1 + i, d)
 [] -> []
 )

Question 13

I am not sure what you mean with these two lines: scanChar = RegModule (\s i d -> map (\x -> (x, 1, d)) ...) and regEXMatchRClass b cs next = RegModule (\s i d -> map (\x -> (x, i, d)) ...). Are you in a sense trying to work around a monad by including a map-function instead of the Monad instantiation in the beginning?

Question 14

As I understand your overall point you have rewritten the monadic functions by removing the syntatic sugar (do-notation). And by doing so you want to highlight how the i is ignored in the chaining between scanChar and regEXMatchClass since the former doesn't include i in the applied function/operation. So the latter only takes the 'fixed' output of the former as an input.

Question 15

When I use the i+1 in scanChar instead it does produce the expected result however.

K. A. Buhr 52.7k3 gold badges55 silver badges82 bronze badges · Accepted Answer · 2023-07-10 17:09:03Z

I'll respectfully disagree with the other answer here. The underlying problem is that you've misinterpreted the output Int in the tuple (a, Int, d). For this monad, the input Int is the offset into the string, but the output Int is the number of characters scanned, not the new offset. This means that -- using the original code posted in the question -- your actual output is correct:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

because those output Ints are the number of characters scanned, not the new offsets. Since the matcher always matches a single character, if it matches anything at all, each of the three runs of the matcher should return 1, as in 1 character scanned.

However, your code is still buggy, because if you try a test case with the input string "aa", it still matches:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

The problem is that, in runRegModuleThrice, you are reimplementing the bind (>>=) operation for the monad but your implementation is incompatible with the actually "meaning" of the input and output Ints in the monad.

As I noted above, the definition for >>= in the monad instance is set up to treat the input Int as an offset into the string but the output Int as the number of characters scanned, and when the bind chains the left-hand-side to the right-hand-side, it adds this output Int (number of characters scanned) to the original input Int (starting offset) to get the new input Int (next offset). It also returns the sum of the two output Ints -- the number of characters scanned by the LHS plus the number of characters scanned by the RHS. If the output Ints were offsets, it would just return the final offset j' directly, not try to add it to another offset:

m >>= f =
 RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
 -- ^ ^ ^
 -- | | |
 -- | output count of characters scanned |
 -- | |
 -- `-- starting offset as input ------------'
 (b, j', d'') <- runRegModule (f a) s (i + j) d'
 -- new offset is *sum* of starting offset and number scanned -^^^^^
 return (b, j + j', d''))
 -- ^^^^^^
 -- returned value is total number of characters scanned, j from
 -- the first, and j' from the second, **NOT** the new offset

Assuming the monad instance is correct, your original scanChar implementation was also correct. When it succeeds, it scans a single character, so it should always return 1 on success.

The bug is in your implementation of runRegModuleThrice which is misinterpreting the output Int as an offset. If you modify it to treat the input Ints as offsets and the output Ints as counts of characters scanned:

runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
 let (result1, cnt1, newState1) = head $ runRegModule matcher input startPos state
 (result2, cnt2, newState2) = head $ runRegModule matcher input (startPos + cnt1) newState1
 (result3, cnt3, newState3) = head $ runRegModule matcher input (startPos + cnt1 + cnt2) newState2
 in [(result1, cnt1, newState1), (result2, cnt2, newState2), (result3, cnt3, newState3)]

then it should work correctly. Note that it STILL will return the output you claimed was wrong:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

but this output is actually correct, as explained above.

It'll crash on an input of "aa":

λ> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
[((),1,[]),((),1,[]),(*** Exception: Prelude.head: empty list
CallStack (from HasCallStack):
 error, called at libraries/base/GHC/List.hs:1646:3 in base:GHC.List
 errorEmptyList, called at libraries/base/GHC/List.hs:85:11 in base:GHC.List
 badHead, called at libraries/base/GHC/List.hs:81:28 in base:GHC.List
 head, called at Collect.hs:54:36 in main:Collect

but this is correct, too. Your runRegModuleThrice is written to assume that each run of the matcher succeeds and returns at least one match. When a run fails (like the third match failing here), it tries to take the head of an empty list and crashes, as designed.

I think what you were SUPPOSED to do here was implement runRegModuleThrice using the monad's implementation of >>=, not reimplement it from scratch, so maybe something more like:

runRegModuleThrice' :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice' matcher = runRegModule (matcher >> matcher >> matcher)

which uses the monad bind (i.e., the definition of >>=) via the >> operator.

This does not return a "history" of the three matches. Instead, it returns the result of matching on the composite monadic match. This should return the a value of the final match, which is just () for your test case. It should also return the total number of character matched by the three matchers that are part of the composite match, which is the number 3, i.e., three matches of 1 character each.

And, in fact, it does just that:

λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),3,[])]

It also works correctly on strings that don't match without crashing:

λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
[]
λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "a" 0 []
[]
λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "bbb" 0 []
[]

Thank you for your very thourough response. It is much appreciated. I think you are correct in your assessments and interpretations.
One question. How come your runRegModuleThrice' takes 4 arguments like mine in the type signature, but in the actual function is only supplied with 1 argument? Is this currying? And if so, I thought currying would imply that you wouldn't need all the 4 arguments in the type signature, but just use whatever arguments the runRegModule would take within the function, based on currying.
Yes, it boils down to currying. One way of looking at it is that the type a -> b -> c -> d -> e is the same as the type a -> (b -> c -> d -> e). In words, a four argument function is the same as a one argument function that returns a three argument function. Without any change to the type signature, my definition for this function can either take four arguments and return a final e, or it can take one argument and return a three argument function (that returns e). In my definition, I used runRegModule in the body to create the three argument function I wanted.
I have actually posted a new thread with some pacularities about your runRegModuleThrice' that eludes me with regards to the discard function: stackoverflow.com/questions/76694011/…

CollectivesTM on Stack Overflow

Running Monad multiple times in haskell and updating state

2 Answers 2

4 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

2 Answers 2

4 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related