I just started learning Haskell and this is my first big project (ie not factorials, fibonacci or graphers). This is kind of a gift for somebody so the language is a bit different. The program works, but I felt that some parts of the code could be improved, specifically:
- Is Vector the best choice for representing current memory state in the program state record,
Program
? - I executed the program by splitting the source code into a list of words (the code state), then recursively going through the list and
parse
ing it on each loop. Is there a better way to do it? Tokenization maybe? - I created loops by putting the current code state into a stack named
loopback
, and using this to "go back in time" when needed. Is there a better way to do it? - I just felt the way I handled loops are inelegant; such as
toChaCharmanderChar'
. - Did I use guards and pattern matching correctly?
- Is it confusing in general?
I consider myself a beginner-intermediate programmer, having just learnt monads.
Documentation
Char: Previous 1
Char-char: Next 1
Cha: Input
Charmander: Output
Cha-cha: Minus 1
Charmander-charmander: Plus 1
Charmander-char: Loop start; if data at DP is 0, jump to corresponding Cha-charmander-char.
Cha-charmander-char: Loop end; if data at DP is not 0, jump to corresponding Charmander-char.
DP: Data Pointer, starts at 0. Points to an integer in memory.
Memory: A tape of 128 integers.
Main.hs
module Main
(
main
) where
import Data.Vector (Vector, replicate, (//), (!), accum)
import Data.Char (ord, chr, toLower)
import System.Environment (getArgs)
import System.IO (IOMode(ReadMode), BufferMode(NoBuffering), openFile, hGetContents, hSetBuffering, stdin)
-- Program state
type DP = Integer
type Memory = Vector Integer
type Code = [String]
data Program = Program {
dp :: DP,
memory :: Memory,
code :: Code,
loopback :: [Code]
} deriving Show
initial :: Code -> Program
initial code = Program {
dp = 0,
memory = Data.Vector.replicate 128 0,
code = code,
loopback = []
}
-- Microcommands
next :: Program -> Program
next program = program {
code = tail $ code program
}
move :: Integer -> Program -> Program
move num program = program {
dp = dp program + num
}
change :: Integer -> Program -> Program
change num program = program {
memory = accum (+) (memory program) [(fromInteger $ dp program, num)]
}
toChaCharmanderChar' :: Integer -> Program -> Program
toChaCharmanderChar' depth program
| depth == 0 && current == "cha-charmander-char" = program
| current == "cha-charmander-char" = toChaCharmanderChar' (depth - 1) $ next program
| current == "charmander-char" = toChaCharmanderChar' (depth + 1) $ next program
| otherwise = toChaCharmanderChar' depth $ next program
where current = head $ code program
toChaCharmanderChar :: Program -> Program
toChaCharmanderChar = toChaCharmanderChar' 0
-- Commands
char :: Program -> IO Program
char program = return $ move (-1) program
charChar :: Program -> IO Program
charChar program = return $ move 1 program
chaCha :: Program -> IO Program
chaCha program = return $ change (-1) program
charmanderCharmander :: Program -> IO Program
charmanderCharmander program = return $ change 1 program
cha :: Program -> IO Program
cha program = do
input <- getChar
return program {
memory = (memory program) // [(fromInteger $ dp program, toInteger $ ord input)]
}
charmander :: Program -> IO Program
charmander program = do
putStr [chr $ fromInteger $ (memory program) ! (fromInteger $ dp program)]
return program
charmanderChar :: Program -> IO Program
charmanderChar program
| num == 0 = return $ toChaCharmanderChar $ next program
| num /= 0 = return program {
loopback = code program : loopback program
}
where num = memory program ! (fromInteger $ dp program)
chaCharmanderChar :: Program -> IO Program
chaCharmanderChar program
| num == 0 = return program {
loopback = tail $ loopback program
}
| num /= 0 = return program {
code = head $ loopback program
}
where num = memory program ! (fromInteger $ dp program)
unknown :: Program -> IO Program
unknown program = return program
-- Parser
parse :: String -> Program -> IO Program
parse "char" = char
parse "char-char" = charChar
parse "cha" = cha
parse "charmander" = charmander
parse "cha-cha" = chaCha
parse "charmander-charmander" = charmanderCharmander
parse "charmander-char" = charmanderChar
parse "cha-charmander-char" = chaCharmanderChar
parse _ = unknown
-- Interpreter
interpret' :: Program -> IO ()
interpret' program
| code' == [] = do
putStrLn "\nCore dump: "
print program
putStrLn "\nCha char charmander (0)."
| otherwise = do
newProgram <- parse (head code') program
interpret' $ next newProgram
where code' = code program
interpret :: String -> IO ()
interpret program = interpret' $ initial $ words $ fmap toLower program
-- Main
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
args <- getArgs
if null args
then
putStrLn "Cha charmander (-1)."
else do
handle <- openFile (head args) ReadMode
contents <- hGetContents handle
interpret contents
charmander.cabal
-- Initial charmander.cabal generated by cabal init. For further
-- documentation, see http://haskell.org/cabal/users-guide/
name: charmander
version: 0.1.0.0
synopsis: Charmander-char!
-- description:
-- license:
license-file: LICENSE
author: Ignis Incendio
maintainer: [email protected]
-- copyright:
category: Language
build-type: Simple
cabal-version: >=1.8
executable charmander
main-is: Main.hs
-- other-modules:
build-depends: base ==4.6.*, split, vector
Sample programs
A Loop.char
Charmander-charmander charmander-charmander charmander-charmander charmander-charmander charmander-charmander
Charmander-charmander charmander-charmander charmander-charmander charmander-charmander charmander-charmander
Charmander-char
Char-char
Charmander-charmander charmander-charmander charmander-charmander
Charmander-charmander charmander-charmander charmander-charmander
Char
Cha-cha
Cha-charmander-char
Char-char
Charmander-charmander charmander-charmander charmander-charmander charmander-charmander charmander-charmander
Charmander
Input.char
Cha <- This means to input into 0
charmander <- This outputs the 0
OOB.char
OOB since DP will be at -1 and then try to print it
Char cha charmander
Output
ignis99:~/workspace/charmander $ dist/build/charmander/charmander A\ Loop.char
A
Core dump:
Program {dp = 1, memory = fromList [0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], code = [], loopback = []}
Cha char charmander (0).
ignis99:~/workspace/charmander $ dist/build/charmander/charmander Input.char
qq
Core dump:
Program {dp = 0, memory = fromList [113,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], code = [], loopback = []}
Cha char charmander (0).
ignis99:~/workspace/charmander $ dist/build/charmander/charmander OOB.char
qcharmander: ./Data/Vector/Generic/Mutable.hs:730 (update): index out of bounds (-1,128)
-
5\$\begingroup\$ Whoever this is a gift for will be very happy. \$\endgroup\$anon– anon2016年05月21日 14:20:35 +00:00Commented May 21, 2016 at 14:20
-
\$\begingroup\$ This question, and the answer by @Zeta, have been selected as the winner of Best of Code Review 2016 — Night and Day. \$\endgroup\$200_success– 200_success2017年01月18日 19:16:56 +00:00Commented Jan 18, 2017 at 19:16
1 Answer 1
Congratulations on your first large project. I'm not sure whether this review has grown a little bit overboard, as it is now both a review as well as a mini tutorial. Either way:
What the char
?
Charmander-char Char cha Charmander Char. Char? Charmander!
- Is it confusing in general?
Char! I mean, yes. Mostly due to the names of your functions. As you already acknowledged, Charmander is essentially Brainfuck. Therefore, it has some simple operations, +
, -
, >
, <
, .
, ,
, [
and ]
, which have clear semantics: increase_current_cell
, decrease_current_cell
, move_to_cell_right
, move_to_cell_left
and so on.
However, your commands don't abstract themselves away from the original language:
parse :: String -> Program -> IO Program
parse "char" = char
parse "char-char" = charChar
parse "cha" = cha
parse "charmander" = charmander
parse "cha-cha" = chaCha
parse "charmander-charmander" = charmanderCharmander
parse "charmander-char" = charmanderChar
parse "cha-charmander-char" = chaCharmanderChar
parse _ = unknown
At this point, you gain almost nothing compared to the original code in your *.char
file. Instead, you have to keep all meanings in your (brain's) memory.
To go back to Brainfuck, this would be the same as writing
parse :: Char -> Program -> IO Program
parse '<' = lessThan
parse '>' = greaterThan
parse '+' = plus
parse '-' = minus
parse '.' = dot
parse ',' = comma
parse '[' = leftBracket
parse ']' = rightBracket
parse _ = unknown
Sure, it works. But your programs don't have the right names. Which brings us to naming.
Getting bulbasane (*) again
First of all, you should rename your programs. While it's "just" a gift to someone, you still want to know what your program actually meant, later:
-- instead of char
previousCell :: Program -> IO Program
previousCell = return $ move (-1) Program
-- instead of charChar
nextCell :: Program -> IO Program
nextCell = return $ move 1 Program
Remember, you are the person most likely to read the code again, so you want to grasp what's going on quickly. Now parse
only changes slightly:
parse :: String -> Program -> IO Program
parse "char" = previousCell
parse "char-char" = nextCell
parse "cha" = getCell
parse "charmander" = putCell
parse "cha-cha" = decreaseCell
parse "charmander-charmander" = increaseCell
parse "charmander-char" = startLoop
parse "cha-charmander-char" = endLoop
parse _ = unknown
A nice side effect: if you lose your original documentation of Charmander, you can still look it up here.
However, this is clunky. We have to interpret and parse the program over and over again. Which brings us to your other questions:
- I executed the program by splitting the source code into a list of words (the code state), then recursively going through the list and parseing it on each loop. Is there a better way to do it? Tokenization maybe?
- I just felt the way I handled loops are inelegant; such as toChaCharmanderChar'.
Here is were we start the real journey.
Proper types (aka use an AST)
Let us review your types:
type Code = [String]
data Program = Program {
dp :: DP,
memory :: Memory,
code :: Code,
loopback :: [Code]
} deriving Show
The name Code
is true: you're "just" saving a list of words. However, when you work with a programming language, you don't want to work with the original code too long (unless it contains parser errors). Instead, you want to work with something that describes your program in terms of instructions, or syntax (an abstract syntax tree, AST, if you want to look for more information).
Let's think of instructions for your program:
data CharmanderInstruction
= IncreaseCell
| DecreaseCell
| MoveRight
| MoveLeft
| PutChar
| GetChar
| Loop [CharmanderInstruction]
deriving Show
That contains all actions we can do in Charmander: we can increase/decrease the cell, move left or right, put a character or get one, and we can loop. Note that there isn't a StartLoop
and EndLoop
. Either we've got a correct loop, or we don't.
Now, your Code
can be thought of [CharmanderInstruction]
:
type Code = [CharmanderInstruction]
Separation of concerns
This method gives us an easier separation of concerns. If we split the responsibility of our functions, it will also be easier to reason about their behaviour.
Pretty printing
The nice part about this is that you can now print the program as you like. Maybe you want to read a rather verbose version:
pretty :: (CharmanderInstruction -> String) -> [CharmanderInstruction] -> String
pretty f xs = concatMap f xs
prettyVerbose :: [CharmanderInstruction] -> String
prettyVerbose = pretty verbose
where
verbose IncreaseCell = "Increase the current cell\n"
verbose DecreaseCell = "Decrease the current cell\n"
...
Or you want to print Charmander code:
prettyCharmander :: [CharmanderInstruction] -> String
prettyCharmander = pretty charmander
where
charmander IncreaseCell = "charmander-charmander\n"
charmander DecreaseCell = "cha-cha\n"
...
Or you want to print... Brainfuck:
prettyBrainfuck :: [CharmanderInstruction] -> String
prettyBrainfuck = pretty brainfuck
where
brainfuck IncreaseCell = "+"
brainfuck DecreaseCell = "-"
...
As you can see, using CharmanderInstruction
will enable you to print the code in many different ways.
Parsing
But printing instructions doesn't help. We need to somehow parse them. You would rewrite parse
to
parse :: String -> Code
and parse the code as instructions instead of changing the program. Parsing the loop can be tricky, but is manageable. However, this will actually show you the instructions, whereas your previous program only showed you the code you already had in your file.
Also, you can write parseBrainfuck
and parseBulbasaur
the same way, and the rest of your pipeline will still work.
Execute
- I created loops by putting the current code state into a stack named loopback, and using this to "go back in time" when needed. Is there a better way to do it?
- I just felt the way I handled loops are inelegant; such as toChaCharmanderChar'.
If you use the approach above, you can use something like
-- Pseudo code
execute :: CharmanderInstruction -> Program -> ...
execute (Loop code) p
| currentValueIsZero p = interpret (next p) p
| otherwise = interpret code p
execute ...
You can simply handle the loops recursively. Also, since you're essentially acting over a state, you could use StateT Program IO a
and lens, but that's a preference.
Everything in a single image
To put things in perspective, here's an image of what will be, and what's possible:
If you follow this model, it will be easy to add other Brainfuck-like languages.
Further features of CharmanderInstruction
There's an additional feat, though, using CharmanderInstruction
instead of IO Program
during your parse. Let's say you want to write a simple echo
program, that just takes input from the user and writes it back:
Charmander Code | Brainfuck
--------------------------------+----------------------------
Cha | ,
Charmander-char | [
Charmander | .
Cha | ,
Cha-charmander-char | ]
The code would look like this:
echoCode :: [CharmanderInstruction]
echoCode = [ GetChar , Loop [ PutChar, GetChar ] ]
Now, if you want to test it, you have to run your interpreter, type some things, and verify that everything works as intended. However, now that you're only working with abstract instructions, you can write a function that uses a String
to simulate input:
simulate :: Code -> String -> String
simulate program input = ...
We can now use QuickCheck to test your function:
prop_echo = property $
forAll (listOf (choose (' ','~'))) $ \input -> -- only "nice" ASCII values
simulate echoCode (input ++ ['0円']) == input -- input should be output
If possible, there should be less IO
As you can see, simulate
doesn't need IO
. If done correctly, you can use it to write your original interpret
:
interpret :: Code -> IO ()
interpret code = getContents >>= putStrLn . simulate code
Again, this makes your code easier to reason. We know by simulate
's type that it has some kind of internal representation of the current program state, and we know it cannot grab actual input from the user on its own.
Other answered questions
- Is Vector the best choice for representing current memory state in the program state record, Program?
If you only want limited memory, yes. However, since you're mutating the vector in IO
, you probably want to use a mutable variant instead.
If you want to simulate "infinite" memory, you can use two lists and a single entry, which models the left, current and right part of the memory:
data Band = Band [Int] Int [Int]
moveLeft :: Band -> Band
moveLeft (Band ls v []) = Band (v:ls) 0 []
moveLeft (Band ls v (r:rs)) = Band (v:ls) r rs
moveRight :: Band -> Band
moveRight (Band [] v rs) = Band [] 0 (v : rs)
moveRight (Band (l:ls) v rs) = Band ls l (v : rs)
- Did I use guards and pattern matching correctly?
Overall, yes. The guards in toChaCharmanderChar'
will go away if you use the AST, though.
Other quirks
You use Integer
for your cells. Since you're most likely on an 64bit system, maxBound :: Int
is 9223372036854775807
. If you want to get one of your values larger than that, you need to run IncreaseCell
9223372036854775807 times. Even if every IncreaseCell
took only one femtosecond (1fs), it would take you a whole day to evaluate it.
So use Int
instead of Integer
instead. This will also get rid of the fromIntegral
calls.
Further references
- quchen's article on writing a Brainfuck interpreter. Uses the same
Tape
as above, and a similar instruction level. - For a more general way, have a look at the
Free
monad.
(*) sorry for the bad pun.
-
1\$\begingroup\$ Very informative. Thank you! Charmander-charmander! \$\endgroup\$Ignis Incendio– Ignis Incendio2016年05月20日 12:20:54 +00:00Commented May 20, 2016 at 12:20
-
1\$\begingroup\$ @IgnisIncendio: You're welcome. I completely forgot my remarks on
IO
yesterday, though. Sorry ^^". \$\endgroup\$Zeta– Zeta2016年05月21日 13:03:22 +00:00Commented May 21, 2016 at 13:03 -
5\$\begingroup\$ "* sorry for the bad pun." No you aren't. \$\endgroup\$anon– anon2016年05月21日 15:49:47 +00:00Commented May 21, 2016 at 15:49
Explore related questions
See similar questions with these tags.