I am rather new to haskell, and could use some feedback on my code and the decisions I had to make. In my previous project, I made a JSON parser, but relied heavily on guidance from a university course.
This project is a brainfuck interpreter, for which I used nothing but the wiki article on brainfuck (i.e., no foreign design or code, apart from whileM). It was eye-opening in some regards, because I finally used and designed monadic operations by myself. But I am sure there are many more eyes on which I am blind, so please point out where I go wrong or miss more elegant solutions.
This implementation is lacking any IO, because I wanted to create a simple, baseline implementation of stateful computation. The goal is to afterwards use StateT instead of State and combine that with IO, so that I properly learn to use monad transformers.
In particular, I am unsure about the following points:
- Data Model
I made a type Tape with a pointer which can be shifted and read from / written to. I use this type for both the data tape and the brainfuck program itself, the instruction tape. The functionality of the two tapes overlap, but not completely. For example, the operation "increment byte at pointer" is only needed for the data tape, while "seek next matching bracket" is only needed for the instruction tape (Lines 1-100).
The question is: Is it okay to make one type for which I implement ALL the functionality, or am I better off splitting it into two types and duplicating the code for the shared instructions (most are shared)? Is there a idiomatic way to have types "inherit" from other types?
- Turning it into a State Monad
In lines 149-188, I implement the eight brainfuck operations as state-transforming operations. Even though they do quite different things, I think I need them to have the same type because of the chooseAction function (Line 232): Between different steps of the brainfuck program, the next instruction has to be read from the instruction tape and turned into an actual executable statetransformer. Because this chooseAction function needs to return values of one and the same type, all my monadic operations must have the same type... Right? How is that best achieved?
In my case, I decided to create a type Effect, which currently has two constructors, but which I can easily extend if I create statetransformers with novel effects. Most statetransformers (>, <, +, - etc) have no return value to speak of, so they return None :: Effect. But the sOutp statetransformer, which corresponds to brainfuck's '.', outputs a byte. This is where IO is going to come in eventually, but for the moment, I just want this byte to be captured by the Effect type and return it to an effect-collecting monadic function whileM :: Monad => m Bool -> m a -> m [a].
Apart from using the IO monad here and turning State into StateT, is there a better way to do what I want here? And do all my 8 monadic brainfuck operations therefore need to have the same type, even though they do such different things?
3)
Lines 150-188 seem a bit verbose, because I seem to be doing the same thing over and over. Am I missing some elegant way here or is this simply the tedious part of the implementation?
Thanks for any inputs!
For line numbers, see this file: https://github.com/el-micha/bf/blob/master/bf.hs
The same code is reproduced below. To run, try:
runTM testtm
and runTM testtm2
import Control.Monad
import Data.Char (ord, chr)
import Data.Word (Word8)
-- Implementation without IO. Basecase for future IO extension with MonadTransformers.
-- ============================================================================
-- Data types for data and instruction tapes and their operations
-- ============================================================================
-- keep track of tape length with tuple (pointer position, tape length so far)
type Bound = (Int, Int)
leftB :: Bound -> Bound
leftB (x, y) = (x-1, y)
rightB :: Bound -> Bound
rightB (x, y) = (x+1, max y (x+1))
-- a type for both the data tape and the instruction tape.
-- pointer is head of first list. left of pointer is field1[1], right of pointer is field2[0]
-- [1 2 3 4 5 6]
-- ^
-- corresponds to Tape [3, 2, 1] [4, 5, 6] (2, 5)
type Byte = Word8
data Tape = Tape [Byte] [Byte] Bound
zeroes = [0 | _ <- [1..]]
emptyTape :: Tape
emptyTape = Tape [0] zeroes ((0, 0) :: Bound)
initTape :: [Byte] -> Tape
initTape (x:xs) = Tape [x] (xs++zeroes) ((0, length xs) :: Bound)
-- shift pointer by one
left :: Tape -> Tape
left (Tape [] _ _) = error "Cannot shift tape to left: Is at origin. Bad initialization."
left (Tape [x] _ _) = error "Cannot shift tape to left: Is at leftmost cell."
left (Tape (x:xs) ys b) = Tape xs (x:ys) (leftB b)
right :: Tape -> Tape
right (Tape xs (y:ys) b) = Tape (y:xs) ys (rightB b)
ptrPos (Tape xs ys b) = fst b
tapeLength (Tape xs ys b) = snd b
-- read from pointer position
readTape :: Tape -> Byte
readTape (Tape (ptr:xs) ys b) = ptr
-- is byte at ptr 0?
isZero :: Tape -> Bool
isZero = (==0) . readTape
isChar :: Char -> Tape -> Bool
isChar c = (==c) . chr . fromEnum . readTape
-- write to pointer position
writeTape :: Byte -> Tape -> Tape
writeTape n (Tape (ptr:xs) ys b) = Tape (n:xs) ys b
--increment, decrement the byte at ptr
increment :: Tape -> Tape
increment (Tape (ptr:xs) y b) = Tape ((ptr+1):xs) y b
decrement :: Tape -> Tape
decrement (Tape (ptr:xs) y b) = Tape ((ptr-1):xs) y b
inc = increment
dec = decrement
instance Show Tape where
show (Tape (ptr:xs) ys b) = show (reverse xs) ++ " " ++ show ptr ++ " " ++ show (take (snd b - fst b) ys)
-- for the instruction tape
show' (Tape (x:xx) yy b) = reverse xs ++ " " ++ [ptr] ++ " " ++ take (snd b - fst b) ys
where ptr = f x
xs = map f xx
ys = map f yy
f = chr . fromEnum
-- ============================================================================
-- Instruction Tape: reuse tape but add some functions
-- find matching ], assuming program is syntactically correct, i.e., there IS a matching ] AND the tape points to a [
-- same with [ and left, so parametrized. result points to the bracket, so this needs a right/left shift afterwards, like every other instr.
seekAny dir charMatch charOther t = go 0 (dir t)
where go n t
| n == 0 && isChar charMatch t = t
| isChar charMatch t = go (n - 1) (dir t)
| isChar charOther t = go (n + 1) (dir t)
| otherwise = go n (dir t)
seekRight :: Tape -> Tape
seekRight = seekAny right ']' '['
seekLeft :: Tape -> Tape
seekLeft = seekAny left '[' ']'
-- init for instruction tape
initStringTape :: [Char] -> Tape
initStringTape = initTape . map (toEnum . ord)
-- the whole brainfuck program / machine is a turingmachine, where both tapes are in a state which can change with every instr.
data TM = TM {dataTape :: Tape, insTape :: Tape}
instance Show TM where
show (TM d i) = "\n" ++ show d ++ "\n" ++ show' i
testtm = TM (initTape [2,7]) (initStringTape "[->+<]")
testtm2 = TM emptyTape (initStringTape "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")
-- ============================================================================
-- Put the data into a stateful context using State (-transformers)
-- ============================================================================
-- TM state transformers
newtype State s a = State {runState :: s -> (a, s)}
-- State data constructor proxy
state :: (s -> (a, s)) -> State s a
state = State
-- canonical stuff
instance Functor (State s) where
fmap = liftM
instance Applicative (State s) where
pure = return
(<*>) = ap
instance Monad (State s) where
return x = state (\s -> (x, s))
p >>= k = state $ \ s0 ->
let (x, s1) = runState p s0
in runState (k x) s1
-- ============================================================================
-- The brainfuck operations are stateful
-- ============================================================================
-- let the monadic functions return effects. can be extended as necessary
data Effect = None | Result Byte
isNone None = True
isNone _ = False
-- these seem a bit verbose.. is this necessary? is there a better way?
-- shift data tape
fRight :: TM -> (Effect, TM)
fRight (TM d i) = (None , TM (right d) i)
sRight = state fRight
fLeft :: TM -> (Effect, TM)
fLeft (TM d i) = (None, TM (left d) i)
sLeft = state fLeft
-- increment byte at ptr on data tape
fInc :: TM -> (Effect, TM)
fInc (TM d i) = (None, TM (inc d) i)
sInc = state fInc
fDec :: TM -> (Effect, TM)
fDec (TM d i) = (None, TM (dec d) i)
sDec = state fDec
-- read or write byte from/to data tape
fOutp :: TM -> (Effect, TM)
fOutp tm = (Result (readTape $ dataTape tm), tm)
sOutp = state fOutp
fInp :: Byte -> TM -> (Effect, TM)
fInp byte (TM d i) = (None, TM (writeTape byte d) i)
sInp byte = state (fInp byte)
-- seek the next [ or the previous ]
fFwd :: TM -> (Effect, TM)
fFwd (TM d i)
| isZero d = (None, TM d (seekRight i))
| otherwise = (None, (TM d i))
sFwd = state fFwd
fBwd :: TM -> (Effect, TM)
fBwd (TM d i)
| not $ isZero d = (None, TM d (seekLeft i))
| otherwise = (None, (TM d i))
sBwd = state fBwd
-- ============================================================================
--read instruction from instr tape
fReadInstr :: TM -> (Byte, TM)
fReadInstr tm = (readTape $ insTape tm, tm)
sReadInstr = state fReadInstr
--next instruction: shift instr tape
fNext :: TM -> ((), TM)
fNext (TM d i) = ((), TM d (right i))
sNext = state fNext
-- check if instruction tape is at the end
fCheck :: TM -> (Bool, TM)
fCheck (TM d i) = ((check i), TM d i)
where check tape = (readTape tape) == 0
sCheck = state fCheck
sCheckNot = do
res <- sCheck
return (not res)
-- ============================================================================
-- Putting it together: Run brainfuck turing machine step by step,
-- collecting potential results (Effect)
-- ============================================================================
-- from Control.Monad.Loops
whileM :: Monad m => m Bool -> m a -> m [a]
whileM = whileM'
whileM' :: (Monad m, MonadPlus f) => m Bool -> m a -> m (f a)
whileM' p f = go
where go = do
x <- p
if x
then do
x <- f
xs <- go
return (return x `mplus` xs)
else return mzero
-- from instruction character, resolve which action to take next.
chooseAction :: Enum a => a -> State TM Effect
chooseAction instr =
case chr . fromEnum $ instr of
'>' -> sRight
'<' -> sLeft
'+' -> sInc
'-' -> sDec
'.' -> sOutp --sOutp
',' -> return None --sInp -- to be combined with IO...
'[' -> sFwd
']' -> sBwd
_ -> return None
-- run the current step of the brainfuck program
stepTM :: State TM Effect
stepTM = do
instr <- sReadInstr
let next = chooseAction instr -- get statetransformer encoded by this char
res <- next -- run that statetransformer
sNext -- shift instruction tape pointer to right
return res
-- combine a list of possibly empty effects into a string
combineEffects :: [Effect] -> [Char]
combineEffects ms = go "" ms
where
go str [] = str
go str ((None):xs) = go str xs
go str ((Result byte):xs) = go (str ++ [(chr . fromEnum $ byte)]) xs
-- run a tm and print its result and states
runTM tm = (combineEffects (fst tup), (snd tup))
where
tup = runState (whileM sCheckNot stepTM) tm
1 Answer 1
About your concerns
This project is a brainfuck interpreter, for which I used nothing but the wiki article on brainfuck (i.e., no foreign design or code, apart from
whileM
). It was eye-opening in some regards, because I finally used and designed monadic operations by myself. But I am sure there are many more eyes on which I am blind, so please point out where I go wrong or miss more elegant solutions.
Well done. At first, I wondered why you didn't reuse State
or StateT
in your program, but you used the canonical definition. For a real-world problem, I'd strongly suggest you to use Control.Monad.State
though (maybe after your own StateT
implementation).
The question is: Is it okay to make one type for which I implement ALL the functionality, or am I better off splitting it into two types and duplicating the code for the shared instructions (most are shared)? Is there a idiomatic way to have types "inherit" from other types?
You could use a typeclass for the common operations and a newtype
. However, there's another issue with your Tape
, which we'll tackle at the end.
Because this chooseAction function needs to return values of one and the same type, all my monadic operations must have the same type... Right? How is that best achieved?
Yes, in this case. However, you could also discard the "effects" of non-effectful operations, e.g.
-- shift data tape
fRight :: TM -> ((), TM)
fRight (TM d i) = (() , TM (right d) i)
sRight = state fRight
noEffect :: State TM Effect
noEffect = state $ \x -> (None, x)
...
chooseAction :: Enum a => a -> State TM Effect
chooseAction instr =
case chr . fromEnum $ instr of
'>' -> sRight >> noEffect
'<' -> sLeft >> noEffect
...
'.' -> fOutp
...
That way, your original functions still have a proper type that signals whether they'll have a result, and chooseAction
can adjust the functions.
Lines 150-188 seem a bit verbose, because I seem to be doing the same thing over and over. Am I missing some elegant way here or is this simply the tedious part of the implementation?
Add a helper that fixes this issue for the first four functions:
onTape :: (TM -> TM) -> TM -> (Effect, TM)
onTape f (TM d i) = TM (f d) i
sRight = state $ onTape right
sLeft state $ onTape left
sInc = state $ onTape inc
sInc = state $ onTape dec
About the implementation
Let's tackle the rist issue: consistency.
Consistency is key
When I first started reading the code, I was delighted, as there were type signatures. However, that delight came to a fast stop on zeroes
.
While zeroes
itself doesn't need a type, ptrPos
and tapeLength
could use one. Similar, all stateful functions (sLeft
, sRight
, ...) don't have a type and almost dissappear below their non state
variants. That's hard to maintain.
Speaking about consistency: where left
explicitly ignores its unused arguments via _
, ptrPos
and tapeLength
don't. None of this is bad code per se, but it challenges a reader to change their expectations while reading the code: can I expect all named bindings to be used? Can I grep
for ::
to see all top-level functions?
Also, without type signatures, the functions might have a more general type than you expect, which can become a problem when types mismatch, e.g.
seekAny :: (a -> a) -> Char -> Char -> a -> a
The tapes and the Turing machine
Ah, the tapes. First of all, I like your Tape
type. I would prefer a Tape
with a head for easier modification, e.g.
data Tape = [Byte] Byte [Byte]
but that's personal preference. However, I don't believe that TM
has the correct type. While a Tape
might be the appropriate type for an abstract Turing machine's instruction set, we're working with an (somehwat) already concrete one: our actual machine that runs our Haskell program. As such, there is a better data structure for our Brainfuck program.
If you really want to keep both Tape
s, disregard the rest of this section.
Ah, great. You want to continue. Well, here's my issue with two Tape
s: we never lift the instructions from a Byte
into the typed Haskell world. If we think about Brainfucks instructions in terms of a data type, we would get something along the following if we tried the naive approach:
data Instruction
= Increase
| Decrease
| MoveRight
| MoveLeft
| Out
| In
| BeginLoop
| EndLoop
type Program = [Instruction]
However, the handling of loops gets a lot easier if we instead use
data Instruction
= Increase
| Decrease
| MoveRight
| MoveLeft
| Out
| In
| Loop Program
type Program = [Instruction]
parse :: String -> Either ParserError Program
You can probably guess how this eases several parts of the code, e.g.
run :: Program -> State Tape [Effect]
run = fmap concat . mapM chooseAction
chooseAction :: Instruction -> State Tape [Effect]
chooseAction instr = case instr of
MoveLeft -> sLeft
MoveRight -> sLeft
...
Loop prog -> do
d <- get
if isZero d
then return []
else
(++) <$> run prog <*> chooseAction instr
Since our Program
is now a simple list, we can get rid of whileM
and whileM'
.
Furthermore, if we add set :: s -> State s ()
, get :: State s s
and modify :: (s -> s) -> State s ()
to your code we get rid of mosts s*
functions:
chooseAction :: Instruction -> State Tape [Effect]
chooseAction instr = case instr of
MoveLeft -> modify left >> return []
MoveRight -> modify right >> return []
Increment -> modify increment >> return []
Decrement -> modify decrement >> return []
Out -> get >>= return (:[])
...
For more information about this kind of program, see my review on another brainfuck implementation.
About input
I strongly advice against using IO
for input at first. You want to be able to test your machine without user input. Instead, use a String
or some abstraction to get around the IO
limitation at first, e.g.
run :: String -> Program -> ...
Other remarks
Here are some other observations I had when I read the code:
zeroes
isrepeat 0
writeTape
,increment
anddecrement
can be defined viaonHead :: (Byte -> Byte) -> Tape -> Tape onHead f (Tape (ptr:xs) y b) = Tape ((f ptr):xs) y b
- a type synonym for
TM -> (Effect, TM)
can reduce noise - the usual state modifiers
set :: s -> State s ()
,get :: State s s
andmodify :: (s -> s) -> State s ()
can provide a way to write yourState
ful variants immediately, e.g.
See also the example above.sOutp = (Result . readTape) <$> get sInp b = set b sRight = modify right sLeft = modify left sInc = modify increment
- some function names might be misleading.
isZero
sounds like a function that takes a number, but it takes a tape.isAtZero
might be more appropriate
Now, keep in mind that there is a lot of well-written code in your program, so don't be disheartened by this wall of text.
Explore related questions
See similar questions with these tags.