A beginner's brainfuck interpreter in Haskell

Question 1

I am rather new to haskell, and could use some feedback on my code and the decisions I had to make. In my previous project, I made a JSON parser, but relied heavily on guidance from a university course.

This project is a brainfuck interpreter, for which I used nothing but the wiki article on brainfuck (i.e., no foreign design or code, apart from whileM). It was eye-opening in some regards, because I finally used and designed monadic operations by myself. But I am sure there are many more eyes on which I am blind, so please point out where I go wrong or miss more elegant solutions.

This implementation is lacking any IO, because I wanted to create a simple, baseline implementation of stateful computation. The goal is to afterwards use StateT instead of State and combine that with IO, so that I properly learn to use monad transformers.

In particular, I am unsure about the following points:

Data Model

I made a type Tape with a pointer which can be shifted and read from / written to. I use this type for both the data tape and the brainfuck program itself, the instruction tape. The functionality of the two tapes overlap, but not completely. For example, the operation "increment byte at pointer" is only needed for the data tape, while "seek next matching bracket" is only needed for the instruction tape (Lines 1-100).

The question is: Is it okay to make one type for which I implement ALL the functionality, or am I better off splitting it into two types and duplicating the code for the shared instructions (most are shared)? Is there a idiomatic way to have types "inherit" from other types?

Turning it into a State Monad

In lines 149-188, I implement the eight brainfuck operations as state-transforming operations. Even though they do quite different things, I think I need them to have the same type because of the chooseAction function (Line 232): Between different steps of the brainfuck program, the next instruction has to be read from the instruction tape and turned into an actual executable statetransformer. Because this chooseAction function needs to return values of one and the same type, all my monadic operations must have the same type... Right? How is that best achieved?

In my case, I decided to create a type Effect, which currently has two constructors, but which I can easily extend if I create statetransformers with novel effects. Most statetransformers (>, <, +, - etc) have no return value to speak of, so they return None :: Effect. But the sOutp statetransformer, which corresponds to brainfuck's '.', outputs a byte. This is where IO is going to come in eventually, but for the moment, I just want this byte to be captured by the Effect type and return it to an effect-collecting monadic function whileM :: Monad => m Bool -> m a -> m [a].

Apart from using the IO monad here and turning State into StateT, is there a better way to do what I want here? And do all my 8 monadic brainfuck operations therefore need to have the same type, even though they do such different things?

3)

Lines 150-188 seem a bit verbose, because I seem to be doing the same thing over and over. Am I missing some elegant way here or is this simply the tedious part of the implementation?

Thanks for any inputs!

For line numbers, see this file: https://github.com/el-micha/bf/blob/master/bf.hs

The same code is reproduced below. To run, try:

runTM testtm and runTM testtm2

import Control.Monad
import Data.Char (ord, chr)
import Data.Word (Word8)
-- Implementation without IO. Basecase for future IO extension with MonadTransformers. 
-- ============================================================================
-- Data types for data and instruction tapes and their operations
-- ============================================================================
-- keep track of tape length with tuple (pointer position, tape length so far)
type Bound = (Int, Int)
leftB :: Bound -> Bound
leftB (x, y) = (x-1, y)
rightB :: Bound -> Bound
rightB (x, y) = (x+1, max y (x+1))
-- a type for both the data tape and the instruction tape. 
-- pointer is head of first list. left of pointer is field1[1], right of pointer is field2[0]
-- [1 2 3 4 5 6]
-- ^
-- corresponds to Tape [3, 2, 1] [4, 5, 6] (2, 5)
type Byte = Word8
data Tape = Tape [Byte] [Byte] Bound
zeroes = [0 | _ <- [1..]]
emptyTape :: Tape
emptyTape = Tape [0] zeroes ((0, 0) :: Bound)
initTape :: [Byte] -> Tape
initTape (x:xs) = Tape [x] (xs++zeroes) ((0, length xs) :: Bound)
-- shift pointer by one
left :: Tape -> Tape
left (Tape [] _ _) = error "Cannot shift tape to left: Is at origin. Bad initialization."
left (Tape [x] _ _) = error "Cannot shift tape to left: Is at leftmost cell."
left (Tape (x:xs) ys b) = Tape xs (x:ys) (leftB b)
right :: Tape -> Tape
right (Tape xs (y:ys) b) = Tape (y:xs) ys (rightB b)
ptrPos (Tape xs ys b) = fst b
tapeLength (Tape xs ys b) = snd b
-- read from pointer position
readTape :: Tape -> Byte
readTape (Tape (ptr:xs) ys b) = ptr
-- is byte at ptr 0?
isZero :: Tape -> Bool
isZero = (==0) . readTape
isChar :: Char -> Tape -> Bool
isChar c = (==c) . chr . fromEnum . readTape
-- write to pointer position
writeTape :: Byte -> Tape -> Tape
writeTape n (Tape (ptr:xs) ys b) = Tape (n:xs) ys b
--increment, decrement the byte at ptr
increment :: Tape -> Tape
increment (Tape (ptr:xs) y b) = Tape ((ptr+1):xs) y b
decrement :: Tape -> Tape
decrement (Tape (ptr:xs) y b) = Tape ((ptr-1):xs) y b
inc = increment
dec = decrement
instance Show Tape where
 show (Tape (ptr:xs) ys b) = show (reverse xs) ++ " " ++ show ptr ++ " " ++ show (take (snd b - fst b) ys)
-- for the instruction tape
show' (Tape (x:xx) yy b) = reverse xs ++ " " ++ [ptr] ++ " " ++ take (snd b - fst b) ys
 where ptr = f x
 xs = map f xx
 ys = map f yy
 f = chr . fromEnum
-- ============================================================================
-- Instruction Tape: reuse tape but add some functions
-- find matching ], assuming program is syntactically correct, i.e., there IS a matching ] AND the tape points to a [ 
-- same with [ and left, so parametrized. result points to the bracket, so this needs a right/left shift afterwards, like every other instr.
seekAny dir charMatch charOther t = go 0 (dir t)
 where go n t
 | n == 0 && isChar charMatch t = t
 | isChar charMatch t = go (n - 1) (dir t)
 | isChar charOther t = go (n + 1) (dir t)
 | otherwise = go n (dir t)
seekRight :: Tape -> Tape
seekRight = seekAny right ']' '['
seekLeft :: Tape -> Tape
seekLeft = seekAny left '[' ']'
-- init for instruction tape
initStringTape :: [Char] -> Tape
initStringTape = initTape . map (toEnum . ord)
-- the whole brainfuck program / machine is a turingmachine, where both tapes are in a state which can change with every instr.
data TM = TM {dataTape :: Tape, insTape :: Tape}
instance Show TM where
 show (TM d i) = "\n" ++ show d ++ "\n" ++ show' i
testtm = TM (initTape [2,7]) (initStringTape "[->+<]")
testtm2 = TM emptyTape (initStringTape "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")
-- ============================================================================
-- Put the data into a stateful context using State (-transformers) 
-- ============================================================================
-- TM state transformers
newtype State s a = State {runState :: s -> (a, s)}
-- State data constructor proxy
state :: (s -> (a, s)) -> State s a
state = State
-- canonical stuff
instance Functor (State s) where
 fmap = liftM
instance Applicative (State s) where
 pure = return
 (<*>) = ap
instance Monad (State s) where
 return x = state (\s -> (x, s)) 
 p >>= k = state $ \ s0 ->
 let (x, s1) = runState p s0
 in runState (k x) s1 
-- ============================================================================
-- The brainfuck operations are stateful
-- ============================================================================
-- let the monadic functions return effects. can be extended as necessary
data Effect = None | Result Byte
isNone None = True
isNone _ = False
-- these seem a bit verbose.. is this necessary? is there a better way?
-- shift data tape
fRight :: TM -> (Effect, TM)
fRight (TM d i) = (None , TM (right d) i)
sRight = state fRight
fLeft :: TM -> (Effect, TM)
fLeft (TM d i) = (None, TM (left d) i)
sLeft = state fLeft
-- increment byte at ptr on data tape
fInc :: TM -> (Effect, TM)
fInc (TM d i) = (None, TM (inc d) i)
sInc = state fInc
fDec :: TM -> (Effect, TM)
fDec (TM d i) = (None, TM (dec d) i)
sDec = state fDec
-- read or write byte from/to data tape
fOutp :: TM -> (Effect, TM)
fOutp tm = (Result (readTape $ dataTape tm), tm)
sOutp = state fOutp
fInp :: Byte -> TM -> (Effect, TM)
fInp byte (TM d i) = (None, TM (writeTape byte d) i)
sInp byte = state (fInp byte)
-- seek the next [ or the previous ]
fFwd :: TM -> (Effect, TM)
fFwd (TM d i)
 | isZero d = (None, TM d (seekRight i))
 | otherwise = (None, (TM d i))
sFwd = state fFwd
fBwd :: TM -> (Effect, TM)
fBwd (TM d i)
 | not $ isZero d = (None, TM d (seekLeft i))
 | otherwise = (None, (TM d i))
sBwd = state fBwd
-- ============================================================================
--read instruction from instr tape
fReadInstr :: TM -> (Byte, TM)
fReadInstr tm = (readTape $ insTape tm, tm)
sReadInstr = state fReadInstr
--next instruction: shift instr tape 
fNext :: TM -> ((), TM)
fNext (TM d i) = ((), TM d (right i))
sNext = state fNext
-- check if instruction tape is at the end
fCheck :: TM -> (Bool, TM)
fCheck (TM d i) = ((check i), TM d i)
 where check tape = (readTape tape) == 0
sCheck = state fCheck
sCheckNot = do 
 res <- sCheck
 return (not res)
-- ============================================================================
-- Putting it together: Run brainfuck turing machine step by step, 
-- collecting potential results (Effect)
-- ============================================================================
-- from Control.Monad.Loops
whileM :: Monad m => m Bool -> m a -> m [a]
whileM = whileM'
whileM' :: (Monad m, MonadPlus f) => m Bool -> m a -> m (f a)
whileM' p f = go
 where go = do
 x <- p
 if x
 then do
 x <- f
 xs <- go
 return (return x `mplus` xs)
 else return mzero
-- from instruction character, resolve which action to take next.
chooseAction :: Enum a => a -> State TM Effect
chooseAction instr = 
 case chr . fromEnum $ instr of 
 '>' -> sRight
 '<' -> sLeft
 '+' -> sInc
 '-' -> sDec
 '.' -> sOutp --sOutp
 ',' -> return None --sInp -- to be combined with IO...
 '[' -> sFwd
 ']' -> sBwd
 _ -> return None
-- run the current step of the brainfuck program
stepTM :: State TM Effect
stepTM = do
 instr <- sReadInstr
 let next = chooseAction instr -- get statetransformer encoded by this char
 res <- next -- run that statetransformer
 sNext -- shift instruction tape pointer to right
 return res
-- combine a list of possibly empty effects into a string
combineEffects :: [Effect] -> [Char]
combineEffects ms = go "" ms 
 where
 go str [] = str
 go str ((None):xs) = go str xs
 go str ((Result byte):xs) = go (str ++ [(chr . fromEnum $ byte)]) xs
-- run a tm and print its result and states
runTM tm = (combineEffects (fst tup), (snd tup)) 
 where
 tup = runState (whileM sCheckNot stepTM) tm

Question 2

About your concerns

This project is a brainfuck interpreter, for which I used nothing but the wiki article on brainfuck (i.e., no foreign design or code, apart from whileM). It was eye-opening in some regards, because I finally used and designed monadic operations by myself. But I am sure there are many more eyes on which I am blind, so please point out where I go wrong or miss more elegant solutions.

Well done. At first, I wondered why you didn't reuse State or StateT in your program, but you used the canonical definition. For a real-world problem, I'd strongly suggest you to use Control.Monad.State though (maybe after your own StateT implementation).

The question is: Is it okay to make one type for which I implement ALL the functionality, or am I better off splitting it into two types and duplicating the code for the shared instructions (most are shared)? Is there a idiomatic way to have types "inherit" from other types?

You could use a typeclass for the common operations and a newtype. However, there's another issue with your Tape, which we'll tackle at the end.

Because this chooseAction function needs to return values of one and the same type, all my monadic operations must have the same type... Right? How is that best achieved?

Yes, in this case. However, you could also discard the "effects" of non-effectful operations, e.g.

-- shift data tape
fRight :: TM -> ((), TM)
fRight (TM d i) = (() , TM (right d) i)
sRight = state fRight
noEffect :: State TM Effect
noEffect = state $ \x -> (None, x)
...
chooseAction :: Enum a => a -> State TM Effect
chooseAction instr = 
 case chr . fromEnum $ instr of 
 '>' -> sRight >> noEffect
 '<' -> sLeft >> noEffect
 ...
 '.' -> fOutp
 ...

That way, your original functions still have a proper type that signals whether they'll have a result, and chooseAction can adjust the functions.

Lines 150-188 seem a bit verbose, because I seem to be doing the same thing over and over. Am I missing some elegant way here or is this simply the tedious part of the implementation?

Add a helper that fixes this issue for the first four functions:

onTape :: (TM -> TM) -> TM -> (Effect, TM)
onTape f (TM d i) = TM (f d) i
sRight = state $ onTape right
sLeft state $ onTape left
sInc = state $ onTape inc
sInc = state $ onTape dec

About the implementation

Let's tackle the rist issue: consistency.

Consistency is key

When I first started reading the code, I was delighted, as there were type signatures. However, that delight came to a fast stop on zeroes.

While zeroes itself doesn't need a type, ptrPos and tapeLength could use one. Similar, all stateful functions (sLeft, sRight, ...) don't have a type and almost dissappear below their non state variants. That's hard to maintain.

Speaking about consistency: where left explicitly ignores its unused arguments via _, ptrPos and tapeLength don't. None of this is bad code per se, but it challenges a reader to change their expectations while reading the code: can I expect all named bindings to be used? Can I grep for :: to see all top-level functions?

Also, without type signatures, the functions might have a more general type than you expect, which can become a problem when types mismatch, e.g.

seekAny :: (a -> a) -> Char -> Char -> a -> a

The tapes and the Turing machine

Ah, the tapes. First of all, I like your Tape type. I would prefer a Tape with a head for easier modification, e.g.

data Tape = [Byte] Byte [Byte]

but that's personal preference. However, I don't believe that TM has the correct type. While a Tape might be the appropriate type for an abstract Turing machine's instruction set, we're working with an (somehwat) already concrete one: our actual machine that runs our Haskell program. As such, there is a better data structure for our Brainfuck program.

If you really want to keep both Tapes, disregard the rest of this section.

Ah, great. You want to continue. Well, here's my issue with two Tapes: we never lift the instructions from a Byte into the typed Haskell world. If we think about Brainfucks instructions in terms of a data type, we would get something along the following if we tried the naive approach:

data Instruction
 = Increase
 | Decrease
 | MoveRight
 | MoveLeft
 | Out
 | In
 | BeginLoop
 | EndLoop
type Program = [Instruction]

However, the handling of loops gets a lot easier if we instead use

data Instruction
 = Increase
 | Decrease
 | MoveRight
 | MoveLeft
 | Out
 | In
 | Loop Program
type Program = [Instruction]
parse :: String -> Either ParserError Program

You can probably guess how this eases several parts of the code, e.g.

run :: Program -> State Tape [Effect]
run = fmap concat . mapM chooseAction
chooseAction :: Instruction -> State Tape [Effect]
chooseAction instr = case instr of
 MoveLeft -> sLeft
 MoveRight -> sLeft
 ...
 Loop prog -> do
 d <- get
 if isZero d 
 then return [] 
 else 
 (++) <$> run prog <*> chooseAction instr

Since our Program is now a simple list, we can get rid of whileM and whileM'.

Furthermore, if we add set :: s -> State s (), get :: State s s and modify :: (s -> s) -> State s () to your code we get rid of mosts s* functions:

chooseAction :: Instruction -> State Tape [Effect]
chooseAction instr = case instr of
 MoveLeft -> modify left >> return []
 MoveRight -> modify right >> return []
 Increment -> modify increment >> return []
 Decrement -> modify decrement >> return []
 Out -> get >>= return (:[])
 ...

For more information about this kind of program, see my review on another brainfuck implementation.

About input

I strongly advice against using IO for input at first. You want to be able to test your machine without user input. Instead, use a String or some abstraction to get around the IO limitation at first, e.g.

run :: String -> Program -> ...

Other remarks

Here are some other observations I had when I read the code:

zeroes is repeat 0

writeTape, increment and decrement can be defined via

onHead :: (Byte -> Byte) -> Tape -> Tape
onHead f (Tape (ptr:xs) y b) = Tape ((f ptr):xs) y b

a type synonym for TM -> (Effect, TM) can reduce noise
the usual state modifiers set :: s -> State s (), get :: State s s and modify :: (s -> s) -> State s () can provide a way to write your Stateful variants immediately, e.g.
```
sOutp = (Result . readTape) <$> get
sInp b = set b
sRight = modify right
sLeft = modify left
sInc = modify increment
```
See also the example above.
some function names might be misleading. isZero sounds like a function that takes a number, but it takes a tape. isAtZero might be more appropriate

Now, keep in mind that there is a lot of well-written code in your program, so don't be disheartened by this wall of text.

Zeta Zeta 19.6k2 gold badges57 silver badges90 bronze badges · Accepted Answer · 2021-04-24 17:19:51Z

About your concerns

This project is a brainfuck interpreter, for which I used nothing but the wiki article on brainfuck (i.e., no foreign design or code, apart from whileM). It was eye-opening in some regards, because I finally used and designed monadic operations by myself. But I am sure there are many more eyes on which I am blind, so please point out where I go wrong or miss more elegant solutions.

Well done. At first, I wondered why you didn't reuse State or StateT in your program, but you used the canonical definition. For a real-world problem, I'd strongly suggest you to use Control.Monad.State though (maybe after your own StateT implementation).

The question is: Is it okay to make one type for which I implement ALL the functionality, or am I better off splitting it into two types and duplicating the code for the shared instructions (most are shared)? Is there a idiomatic way to have types "inherit" from other types?

You could use a typeclass for the common operations and a newtype. However, there's another issue with your Tape, which we'll tackle at the end.

Because this chooseAction function needs to return values of one and the same type, all my monadic operations must have the same type... Right? How is that best achieved?

Yes, in this case. However, you could also discard the "effects" of non-effectful operations, e.g.

-- shift data tape
fRight :: TM -> ((), TM)
fRight (TM d i) = (() , TM (right d) i)
sRight = state fRight
noEffect :: State TM Effect
noEffect = state $ \x -> (None, x)
...
chooseAction :: Enum a => a -> State TM Effect
chooseAction instr = 
 case chr . fromEnum $ instr of 
 '>' -> sRight >> noEffect
 '<' -> sLeft >> noEffect
 ...
 '.' -> fOutp
 ...

That way, your original functions still have a proper type that signals whether they'll have a result, and chooseAction can adjust the functions.

Lines 150-188 seem a bit verbose, because I seem to be doing the same thing over and over. Am I missing some elegant way here or is this simply the tedious part of the implementation?

Add a helper that fixes this issue for the first four functions:

onTape :: (TM -> TM) -> TM -> (Effect, TM)
onTape f (TM d i) = TM (f d) i
sRight = state $ onTape right
sLeft state $ onTape left
sInc = state $ onTape inc
sInc = state $ onTape dec

About the implementation

Let's tackle the rist issue: consistency.

Consistency is key

When I first started reading the code, I was delighted, as there were type signatures. However, that delight came to a fast stop on zeroes.

While zeroes itself doesn't need a type, ptrPos and tapeLength could use one. Similar, all stateful functions (sLeft, sRight, ...) don't have a type and almost dissappear below their non state variants. That's hard to maintain.

Speaking about consistency: where left explicitly ignores its unused arguments via _, ptrPos and tapeLength don't. None of this is bad code per se, but it challenges a reader to change their expectations while reading the code: can I expect all named bindings to be used? Can I grep for :: to see all top-level functions?

Also, without type signatures, the functions might have a more general type than you expect, which can become a problem when types mismatch, e.g.

seekAny :: (a -> a) -> Char -> Char -> a -> a

The tapes and the Turing machine

Ah, the tapes. First of all, I like your Tape type. I would prefer a Tape with a head for easier modification, e.g.

data Tape = [Byte] Byte [Byte]

but that's personal preference. However, I don't believe that TM has the correct type. While a Tape might be the appropriate type for an abstract Turing machine's instruction set, we're working with an (somehwat) already concrete one: our actual machine that runs our Haskell program. As such, there is a better data structure for our Brainfuck program.

If you really want to keep both Tapes, disregard the rest of this section.

Ah, great. You want to continue. Well, here's my issue with two Tapes: we never lift the instructions from a Byte into the typed Haskell world. If we think about Brainfucks instructions in terms of a data type, we would get something along the following if we tried the naive approach:

data Instruction
 = Increase
 | Decrease
 | MoveRight
 | MoveLeft
 | Out
 | In
 | BeginLoop
 | EndLoop
type Program = [Instruction]

However, the handling of loops gets a lot easier if we instead use

data Instruction
 = Increase
 | Decrease
 | MoveRight
 | MoveLeft
 | Out
 | In
 | Loop Program
type Program = [Instruction]
parse :: String -> Either ParserError Program

You can probably guess how this eases several parts of the code, e.g.

run :: Program -> State Tape [Effect]
run = fmap concat . mapM chooseAction
chooseAction :: Instruction -> State Tape [Effect]
chooseAction instr = case instr of
 MoveLeft -> sLeft
 MoveRight -> sLeft
 ...
 Loop prog -> do
 d <- get
 if isZero d 
 then return [] 
 else 
 (++) <$> run prog <*> chooseAction instr

Since our Program is now a simple list, we can get rid of whileM and whileM'.

Furthermore, if we add set :: s -> State s (), get :: State s s and modify :: (s -> s) -> State s () to your code we get rid of mosts s* functions:

chooseAction :: Instruction -> State Tape [Effect]
chooseAction instr = case instr of
 MoveLeft -> modify left >> return []
 MoveRight -> modify right >> return []
 Increment -> modify increment >> return []
 Decrement -> modify decrement >> return []
 Out -> get >>= return (:[])
 ...

For more information about this kind of program, see my review on another brainfuck implementation.

About input

I strongly advice against using IO for input at first. You want to be able to test your machine without user input. Instead, use a String or some abstraction to get around the IO limitation at first, e.g.

run :: String -> Program -> ...

Other remarks

Here are some other observations I had when I read the code:

zeroes is repeat 0

writeTape, increment and decrement can be defined via

onHead :: (Byte -> Byte) -> Tape -> Tape
onHead f (Tape (ptr:xs) y b) = Tape ((f ptr):xs) y b

a type synonym for TM -> (Effect, TM) can reduce noise
the usual state modifiers set :: s -> State s (), get :: State s s and modify :: (s -> s) -> State s () can provide a way to write your Stateful variants immediately, e.g.
```
sOutp = (Result . readTape) <$> get
sInp b = set b
sRight = modify right
sLeft = modify left
sInc = modify increment
```
See also the example above.
some function names might be misleading. isZero sounds like a function that takes a number, but it takes a tape. isAtZero might be more appropriate

Now, keep in mind that there is a lot of well-written code in your program, so don't be disheartened by this wall of text.

Stack Exchange Network

A beginner's brainfuck interpreter in Haskell

1 Answer 1

About your concerns

About the implementation

Consistency is key

The tapes and the Turing machine

About input

Other remarks

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

A beginner's brainfuck interpreter in Haskell

1 Answer 1

About your concerns

About the implementation

Consistency is key

The tapes and the Turing machine

About input

Other remarks

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions