I decided to learn more about how interpreters work and made a simple Scheme interpreter in Python. I'd like to hear some advice on how I could have written this better, or on the mistakes I've made.
Parser.py (I thought about putting the Tokenizer and Atom classes in different files, but they are too small for that. Maybe I should somehow rename the file to get rid of the confusion?):
import Types
class Parser:
def __init__(self) -> None:
pass
def parse(self, tokens : list) -> Types.Exp:
if len(tokens) == 0:
raise SyntaxError('Unexpected EOF.')
token = tokens.pop(0)
if token == '(':
L = []
while tokens[0] != ')':
L.append(self.parse(tokens))
tokens.pop(0) # pop off ')'
return L
elif token == ')':
raise SyntaxError('Unexpected syntax.')
else:
atom = Atom(token)
return atom.value
class Tokenizer:
def tokenize(self, program : str) -> list:
return program.replace('(', ' ( ').replace(')', ' ) ').split()
class Atom:
def __init__(self, value) -> None:
self.value = self.__set__value(value)
def __set__value(self, value):
try: return int(value)
except ValueError:
try: return float(value)
except ValueError:
return Types.Symbol(value)
Types.py:
Symbol = str
List = list
Number = (int, float)
Atom = (Symbol, Number)
Exp = (Atom, List)
Eval.py (I thought it was not good to implement the Procedure class here and it would be more natural to put it in Types.py, but I could not solve the problem with the two-way import. I need to import Types.py into Eval.py and Eval.py into Types.py while I use eval() in Procedure.__call__):
import Types
from Environment import Environment
global_env = Environment()
def eval(exp : Types.Exp, e = global_env):
if isinstance(exp, Types.Symbol):
return e.env[exp]
elif isinstance(exp, Types.Number):
return exp
op = exp[0]
args = exp[1:]
if op == 'quote':
return args[0]
elif op == "if":
(syntax_if, test, conseq, alt) = exp
exp = (conseq if eval(test, e) else alt)
return eval(exp, e)
elif op == "define":
(_, symbol, exp) = exp
e.env[symbol] = eval(exp, e)
elif op == "set!":
(symbol, value) = args
e.env[symbol] = eval(value, e)
elif op == "lambda":
(parms, body) = args
return Procedure(parms, body, e)
else:
proc = eval(op, e)
args = [eval(arg, e) for arg in args]
if proc is None and args is not None:
for arg in args:
if arg is not None:
return arg
return proc(*args)
class Procedure:
def __init__(self, parms, body, env):
self.parms, self.body, self.env = parms, body, env
def __call__(self, *args):
function_env = Environment(self.parms, args, self.env)
return eval(self.body, function_env)
Environment.py (I implemented memory using dictionary):
import math
import operator as op
import Types
class Environment:
def __init__(self, parms=(), args=(), outer = None):
self.env = {}
if outer is not None:
self.env = outer.env
else:
self.__set_default_env()
self.env.update(zip(parms, args))
self.outer = outer
def find(self, key):
if key in self.env:
return self.env[key]
elif self.outer is not None:
return self.outer.find(key)
return None
def update(self, parms=(), args=()):
self.env.update(zip(parms, args))
def __set_default_env(self):
self.env.update(vars(math))
self.env.update({
'+':op.add, '-':op.sub, '*':op.mul, '/':op.truediv,
'>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
'abs': abs,
'append': op.add,
'apply': lambda proc, args: proc(*args),
'begin': lambda *x: x[-1],
'car': lambda x: x[0],
'cdr': lambda x: x[1:],
'cons': lambda x,y: [x] + y,
'eq?': op.is_,
'expt': pow,
'equal?': op.eq,
'length': len,
'list': lambda *x: Types.List(x),
'list?': lambda x: isinstance(x, Types.List),
'map': map,
'max': max,
'min': min,
'not': op.not_,
'null?': lambda x: x == [],
'number?': lambda x: isinstance(x, Types.Number),
'print': print,
'procedure?': callable,
'round': round,
'symbol?': lambda x: isinstance(x, Types.Symbol),
})
I also made some tests (link on github repository)
1 Answer 1
Keeping small classes like Parser and Atom in same module seems fine.
If you're ever in doubt, here's an easy approach to making such a decision.
Write down a single sentence in a """docstring""" at top-of-module, describing what it does, what it's responsible for. Do not write down that the module holds everything and the kitchen sink.
Then examine each contained class, and ask yourself whether it fits within that sentence or whether it more closely resembles a kitchen sink. Keep it or evict it based on that.
You don't have to write a one-sentence docstring for each class. But you may find that it helps the process.
class Parser:
def __init__(self) -> None:
pass
Delete the empty constructor, please.
Which brings us to a related item.
It looks like you really wanted a def parse():
function.
There can be valid reasons for retaining such a method within a class, such as grouping related items. In which case you should decorate with @staticmethod.
But here, it seems a function is indicated.
Plus, it should be def tokenize(...):
nit: Your annotated signatures look odd.
A quick
$ black -S *.py
will tidy up extra whitespace.
The current annotation is nice enough, but it
would be more helpful if you spelled out tokens: list[str]
def __set__value(self, value):
No, please don't use __
double underscore name mangling;
it's not what you want.
Just stick with a single _
underscore prefix to indicate
this method is private.
And a single _
underscore after "set", as well.
A black -S *.py
pass would improve the readability of this method.
You use the verb "set". But "get" seems to be what this function is doing.
Recommend that you prefer lowercase names for module / file names.
Rather than import Types
, prefer from Types import Exp, Number, ...
op = exp[0]
args = exp[1:]
I don't understand those expressions.
The signature annotation told me that exp
is an Exp
.
But now we apparently have List[Exp]
??
Recommend you use mypy
to lint this code.
Here is a perfectly nice tuple unpack:
(syntax_if, test, conseq, alt) = exp
The convention is to use _
to indicate an unused placeholder:
(_, test, conseq, alt) = exp
Oh, look! That's exactly what we see for "define"
, good.
Though I guess that for both of these a tuple unpack of args
would have been more convenient.
return Procedure(parms, body, e)
I don't understand why we're creating an object there.
Would it suffice to return procedure(...)
?
Using a def procedure(...):
?
if proc is None and args is not None:
for arg in args:
if arg is not None:
return arg
Sorry, you kind of lost me there.
Not sure what input would exercise this code.
Please add a # comment
to help out the Gentle Reader.
Or extract helper, so you can name the function
and maybe adorn it with a docstring.
class Environment:
def __init__(self, parms=(), args=(), outer=None):
I guess that's nice enough.
I view the signature as an invitation for mypy
to assume a pair of tuple
types.
But we sometimes pass in list
arguments.
Maybe declare parms: Iterable
?
Similarly for .update()
Maybe you wanted to turn lists into tuples higher up the call stack? That is, we could reconsider the Public API we choose to offer.
We definitely don't want a mutable default.
def __set_default_env(self):
self.env.update(vars(math))
As mentioned above, prefer a single _
underscore prefix.
And that second line is fascinating, well done! I like that. A lot of power in a single utterance.
Feature request: maybe you'd like to expose
a general (import "module_name")
verb to users?
I imagine you're pretty far from compliance with R4RS or similar, and that's fine. It would be entertaining to run this against some automated compliance tests and see what the distance is.
This codebase achieves its design goals. It would benefit from minor class / function reorganization.
I would be willing to delegate or accept maintenance tasks on this codebase.
eval
shadows a Python built-in, you might want to rename the function. \$\endgroup\$