I wrote a small LC-3 emulator for fun. The full code is at the bottom of this post and is also available on GitHub.
Design choices:
- modeling 16 bit little endian memory: opted for ctypes and array-like access via __getitem__
- Enum library
  - Opcodes - convenient to access: the order of the opcodes in the enum matches the opcode's numeric value when interpreted as an integer
  - Condition flags - convenient to access: named, so I can write
    self.registers.cond = condition_flags.z
    where the right hand side is the enum.
Some classes:
- CPU (class lc3)
  - Registers
  - Memory
Questions:
- How could I get started adding unit tests?
- Is there a better choice than using an IntEnum for the opcodes?
- How might I organize the code better? In particular, I dislike having dump_state (a diagnostic printing function), and all of my instruction implementations (e.g. op_and_impl) right next to each other in the lc3 class.
- How else might I organize this mapping of opcodes to implementation functions?
# first attempt
if opcode == opcodes.op_add:
    self.op_add_impl(instruction)
elif opcode == opcodes.op_and:
    self.op_and_impl(instruction)
elif opcode == opcodes.op_not:
    self.op_not_impl(instruction)
# ... truncated https://github.com/ianklatzco/lc3/blob/7bace0a30353d4b1d4c720eddca07c1828f7c3e0/lc3.py#L303

# second attempt
opcode_dict = {
    opcodes.op_add: self.op_add_impl,
    opcodes.op_and: self.op_and_impl,
    opcodes.op_not: self.op_not_impl,
    # ... truncated https://github.com/ianklatzco/lc3/blob/67353ebb50367430a7d2921d701ea92aa2f0968e/lc3.py#L304
}
try:
    opcode_dict[opcode](instruction)
except KeyError:
    raise UnimpError("invalid opcode")
- How could I address this inconsistency between accessing GPRs (general purpose registers) and the PC and condition registers?
class registers():
    def __init__(self):
        self.gprs = (c_int16 * 8)()
        self.pc = (c_uint16)()
        self.cond = (c_uint16)()
# I instantiated the gprs as a ctypes "array" instead of a single c_uint16.
# To access:
#   registers.gprs[0]
# This is convenient when I need to access a particular register, and I have the index handy from a decoded instruction.
#   registers.pc.value
# The .value is annoying.
Full code
# usage: python3 lc3.py ./second.obj
# This project inspired by https://justinmeiners.github.io/lc3-vm/
# There was a lot of copy-pasting lines of code for things like
# pulling pcoffset9 out of an instruction.
# https://justinmeiners.github.io/lc3-vm/#1:14
# ^ talks about a nice compact way to encode instructions using bitfields and
# c++'s templates.
# i am curious if you could do it with python decorators.
# update: i tried this and it was mostly just an excuse to learn decorators, but it
# isn't the right tool. i am curious how else you might do it.
from ctypes import c_uint16, c_int16
from enum import IntEnum
from struct import unpack
from sys import exit, stdin, stdout, argv
from signal import signal, SIGINT
import lc3disas # in same dir
DEBUG = False
class UnimpError(Exception):
    pass

def signal_handler(signal, frame):
    print("\nbye!")
    exit()

signal(SIGINT, signal_handler)

# https://stackoverflow.com/a/32031543/1234621
# you're modeling sign-extend behavior in python, since python has infinite
# bit width.
def sext(value, bits):
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)
'''
iirc the arch is 16bit little endian.
options: ctypes or just emulate it in pure python.
chose: ctypes
'''
class memory():
    def __init__(self):
        # ctypes has an array type. this is one way to create instances of it.
        self.memory = (c_uint16 * 65536)()

    def __getitem__(self, arg):
        if (arg > 65535) or (arg < 0):
            raise MemoryError("Accessed out valid memory range.")
        return self.memory[arg]

    def __setitem__(self, location, thing_to_write):
        if (location > 65536) or (location < 0):
            raise MemoryError("Accessed out valid memory range.")
        self.memory[int(location)] = thing_to_write

class registers():
    def __init__(self):
        self.gprs = (c_int16 * 8)()
        self.pc = (c_uint16)()
        self.cond = (c_uint16)()

# not actually a class but an enum.
class opcodes(IntEnum):
    op_br = 0
    op_add = 1
    op_ld = 2
    op_st = 3
    op_jsr = 4
    op_and = 5
    op_ldr = 6
    op_str = 7
    op_rti = 8
    op_not = 9
    op_ldi = 10
    op_sti = 11
    op_jmp = 12
    op_res = 13
    op_lea = 14
    op_trap = 15

class condition_flags(IntEnum):
    p = 0
    z = 1
    n = 2
class lc3():
    def __init__(self, filename):
        self.memory = memory()
        self.registers = registers()
        self.registers.pc.value = 0x3000 # default program starting location
        self.read_program_from_file(filename)

    def read_program_from_file(self,filename):
        with open(filename, 'rb') as f:
            _ = f.read(2) # skip the first two byte which specify where code should be mapped
            c = f.read() # todo support arbitrary load locations
        for count in range(0,len(c), 2):
            self.memory[0x3000+count/2] = unpack( '>H', c[count:count+2] )[0]

    def update_flags(self, reg):
        if self.registers.gprs[reg] == 0:
            self.registers.cond = condition_flags.z
        if self.registers.gprs[reg] < 0:
            self.registers.cond = condition_flags.n
        if self.registers.gprs[reg] > 0:
            self.registers.cond = condition_flags.p

    def dump_state(self):
        print("\npc: {:04x}".format(self.registers.pc.value))
        print("r0: {:05} ".format(self.registers.gprs[0]), end='')
        print("r1: {:05} ".format(self.registers.gprs[1]), end='')
        print("r2: {:05} ".format(self.registers.gprs[2]), end='')
        print("r3: {:05} ".format(self.registers.gprs[3]), end='')
        print("r4: {:05} ".format(self.registers.gprs[4]), end='')
        print("r5: {:05} ".format(self.registers.gprs[5]), end='')
        print("r6: {:05} ".format(self.registers.gprs[6]), end='')
        print("r7: {:05} ".format(self.registers.gprs[7]))
        print("r0: {:04x} ".format(c_uint16(self.registers.gprs[0]).value), end='')
        print("r1: {:04x} ".format(c_uint16(self.registers.gprs[1]).value), end='')
        print("r2: {:04x} ".format(c_uint16(self.registers.gprs[2]).value), end='')
        print("r3: {:04x} ".format(c_uint16(self.registers.gprs[3]).value), end='')
        print("r4: {:04x} ".format(c_uint16(self.registers.gprs[4]).value), end='')
        print("r5: {:04x} ".format(c_uint16(self.registers.gprs[5]).value), end='')
        print("r6: {:04x} ".format(c_uint16(self.registers.gprs[6]).value), end='')
        print("r7: {:04x} ".format(c_uint16(self.registers.gprs[7]).value))
        print("cond: {}".format(condition_flags(self.registers.cond.value).name))
    def op_add_impl(self, instruction):
        sr1 = (instruction >> 6) & 0b111
        dr = (instruction >> 9) & 0b111
        if ((instruction >> 5) & 0b1) == 0: # reg-reg
            sr2 = instruction & 0b111
            self.registers.gprs[dr] = self.registers.gprs[sr1] + self.registers.gprs[sr2]
        else: # immediate
            imm5 = instruction & 0b11111
            self.registers.gprs[dr] = self.registers.gprs[sr1] + sext(imm5, 5)
        self.update_flags(dr)

    def op_and_impl(self, instruction):
        sr1 = (instruction >> 6) & 0b111
        dr = (instruction >> 9) & 0b111
        if ((instruction >> 5) & 0b1) == 0: # reg-reg
            sr2 = instruction & 0b111
            self.registers.gprs[dr] = self.registers.gprs[sr1] & self.registers.gprs[sr2]
        else: # immediate
            imm5 = instruction & 0b11111
            self.registers.gprs[dr] = self.registers.gprs[sr1] & sext(imm5, 5)
        self.update_flags(dr)

    def op_not_impl(self, instruction):
        sr = (instruction >> 6) & 0b111
        dr = (instruction >> 9) & 0b111
        self.registers.gprs[dr] = ~ (self.registers.gprs[sr])
        self.update_flags(dr)

    def op_br_impl(self, instruction):
        n = (instruction >> 11) & 1
        z = (instruction >> 10) & 1
        p = (instruction >> 9) & 1
        pc_offset_9 = instruction & 0x1ff
        if (n == 1 and self.registers.cond == condition_flags.n) or \
        (z == 1 and self.registers.cond == condition_flags.z) or \
        (p == 1 and self.registers.cond == condition_flags.p):
            self.registers.pc.value = self.registers.pc.value + sext(pc_offset_9, 9)

    # also ret
    def op_jmp_impl(self, instruction):
        baser = (instruction >> 6) & 0b111
        self.registers.pc.value = self.registers.gprs[baser]

    def op_jsr_impl(self, instruction):
        # no jsrr?
        if 0x0400 & instruction == 1: raise UnimpError("JSRR is not implemented.")
        pc_offset_11 = instruction & 0x7ff
        self.registers.gprs[7] = self.registers.pc.value
        self.registers.pc.value = self.registers.pc.value + sext(pc_offset_11, 11)

    def op_ld_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        pc_offset_9 = instruction & 0x1ff
        addr = self.registers.pc.value + sext(pc_offset_9, 9)
        self.registers.gprs[dr] = self.memory[addr]
        self.update_flags(dr)

    def op_ldi_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        pc_offset_9 = instruction & 0x1ff
        addr = self.registers.pc.value + sext(pc_offset_9, 9)
        self.registers.gprs[dr] = self.memory[ self.memory[addr] ]
        self.update_flags(dr)

    def op_ldr_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        baser = (instruction >> 6) & 0b111
        pc_offset_6 = instruction & 0x3f
        addr = self.registers.gprs[baser] + sext(pc_offset_6, 6)
        self.registers.gprs[dr] = self.memory[addr]
        self.update_flags(dr)

    def op_lea_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        pc_offset_9 = instruction & 0x1ff
        self.registers.gprs[dr] = self.registers.pc.value + sext(pc_offset_9, 9)
        self.update_flags(dr)

    def op_st_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        pc_offset_9 = instruction & 0x1ff
        addr = self.registers.pc.value + sext(pc_offset_9, 9)
        self.memory[addr] = self.registers.gprs[dr]

    def op_sti_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        pc_offset_9 = instruction & 0x1ff
        addr = self.registers.pc.value + sext(pc_offset_9, 9)
        self.memory[ self.memory[addr] ] = self.registers.gprs[dr]

    def op_str_impl(self, instruction):
        dr = (instruction >> 9) & 0b111
        baser = (instruction >> 6) & 0b111
        pc_offset_6 = instruction & 0x3f
        addr = self.registers.gprs[baser] + sext(pc_offset_6, 6)
        self.memory[addr] = self.registers.gprs[dr]
    def op_trap_impl(self, instruction):
        trap_vector = instruction & 0xff
        if trap_vector == 0x20: # getc
            c = stdin.buffer.read(1)[0]
            self.registers.gprs[0] = c
            return
        if trap_vector == 0x21: # out
            stdout.buffer.write( bytes( [(self.registers.gprs[0] & 0xff)] ) )
            stdout.buffer.flush()
            return
        if trap_vector == 0x22: # puts
            base_addr = self.registers.gprs[0]
            index = 0
            while (self.memory[base_addr + index]) != 0x00:
                nextchar = self.memory[base_addr + index]
                stdout.buffer.write( bytes( [nextchar] ) )
                index = index + 1
            return
        if trap_vector == 0x25:
            self.dump_state()
            exit()
        raise ValueError("undefined trap vector {}".format(hex(trap_vector)))

    def op_res_impl(self, instruction):
        raise UnimpError("unimplemented opcode")

    def op_rti_impl(self, instruction):
        raise UnimpError("unimplemented opcode")
    def start(self):
        while True:
            # fetch instruction
            instruction = self.memory[self.registers.pc.value]
            # update PC
            self.registers.pc.value = self.registers.pc.value + 1
            # decode opcode
            opcode = instruction >> 12
            if DEBUG:
                print("instruction: {}".format(hex(instruction)))
                print("disassembly: {}".format(lc3disas.single_ins(self.registers.pc.value, instruction)))
                self.dump_state()
                input()
            opcode_dict = \
            {
                opcodes.op_add: self.op_add_impl,
                opcodes.op_and: self.op_and_impl,
                opcodes.op_not: self.op_not_impl,
                opcodes.op_br: self.op_br_impl,
                opcodes.op_jmp: self.op_jmp_impl,
                opcodes.op_jsr: self.op_jsr_impl,
                opcodes.op_ld: self.op_ld_impl,
                opcodes.op_ldi: self.op_ldi_impl,
                opcodes.op_ldr: self.op_ldr_impl,
                opcodes.op_lea: self.op_lea_impl,
                opcodes.op_st: self.op_st_impl,
                opcodes.op_sti: self.op_sti_impl,
                opcodes.op_str: self.op_str_impl,
                opcodes.op_trap:self.op_trap_impl,
                opcodes.op_res: self.op_res_impl,
                opcodes.op_rti: self.op_rti_impl
            }
            try:
                opcode_dict[opcode](instruction)
            except KeyError:
                raise UnimpError("invalid opcode")

##############################################################################
if len(argv) < 2:
    print ("usage: python3 lc3.py code.obj")
    exit(255)
l = lc3(argv[1])
l.start()
2 Answers
I'm not really good at Python here, but I'm sharing my ideas.
Code style
Do you know there's a coding style guide called PEP 8? It provides a set of guidelines for code styling. I'd like to note some of them here:
- Use CapWords naming convention for your classes. Always begin your class name with a capital letter, so class opcodes(IntEnum) becomes class OpCodes(IntEnum) etc.
- Put two blank lines between class definitions and module-level function definitions
- Indentation: This is a bad indentation:
    if (n == 1 and self.registers.cond == condition_flags.n) or \
    (z == 1 and self.registers.cond == condition_flags.z) or \
    (p == 1 and self.registers.cond == condition_flags.p):
        self.registers.pc.value = self.registers.pc.value + sext(pc_offset_9, 9)
  This is the correct way to indent it:
    if (n == 1 and self.registers.cond == condition_flags.n) or \
            (z == 1 and self.registers.cond == condition_flags.z) or \
            (p == 1 and self.registers.cond == condition_flags.p):
        self.registers.pc.value = self.registers.pc.value + sext(pc_offset_9, 9)
You can use a tool called flake8 to find PEP 8 violations in your code. You may not want to follow every rule (for example, I almost always ignore the line length limit), but it's recommended that you follow all the guidelines unless you have a good reason not to.
Repeated and similar code
I'm talking about lines like this:
print("r0: {:04x} ".format(c_uint16(self.registers.gprs[0]).value), end='')
print("r1: {:04x} ".format(c_uint16(self.registers.gprs[1]).value), end='')
print("r2: {:04x} ".format(c_uint16(self.registers.gprs[2]).value), end='')
print("r3: {:04x} ".format(c_uint16(self.registers.gprs[3]).value), end='')
print("r4: {:04x} ".format(c_uint16(self.registers.gprs[4]).value), end='')
print("r5: {:04x} ".format(c_uint16(self.registers.gprs[5]).value), end='')
print("r6: {:04x} ".format(c_uint16(self.registers.gprs[6]).value), end='')
print("r7: {:04x} ".format(c_uint16(self.registers.gprs[7]).value))
This repetition is just unnecessary work. You can replace it with a nice loop:
for i in range(8):
    print("r{}: {:04x} ".format(i, c_uint16(self.registers.gprs[i]).value), end='')
print()
The same applies to the other places in your code where this pattern occurs.
Conditional style
Use elif if your conditions are intended not to overlap:
def update_flags(self, reg):
    if self.registers.gprs[reg] == 0:
        self.registers.cond = condition_flags.z
    elif self.registers.gprs[reg] < 0:
        self.registers.cond = condition_flags.n
    elif self.registers.gprs[reg] > 0:
        self.registers.cond = condition_flags.p
Using exceptions
I see you use MemoryError in your memory class for access violation. This is better replaced by ValueError or better, IndexError, because the one you're currently using is reserved for (host) memory issues, particularly memory allocation failures.
There's also another built-in exception for unimplemented stuff, NotImplementedError. You should consider replacing your own UnimpError with the built-in one.
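As a minimal sketch of that suggestion, bounds checking with IndexError could look like the following. The class name Memory, the attribute _cells and the message text are illustrative choices, not taken from the original code.

```python
from ctypes import c_uint16


class Memory:
    """Hypothetical rework of the question's memory class."""

    SIZE = 65536  # number of 16-bit words in the LC-3 address space

    def __init__(self):
        self._cells = (c_uint16 * self.SIZE)()

    def _check(self, address):
        # IndexError signals a bad index; MemoryError is meant for the
        # host interpreter running out of memory.
        if not 0 <= address < self.SIZE:
            raise IndexError("address {:#x} outside 0x0000-0xffff".format(address))

    def __getitem__(self, address):
        self._check(address)
        return self._cells[address]

    def __setitem__(self, address, value):
        self._check(address)
        self._cells[address] = value
```

Using one shared check for reads and writes also keeps the two bounds from drifting apart.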
OpCodes
You are initializing your opcode dictionary inside the while loop for fetching instructions. It only needs to be initialized once; move it before the while loop.
Your opcodes are a set of numbers between 0 and 15. You index a dictionary based on these numbers to get the method to call. Why not use an array (a Python list) instead of a dictionary? Indexing by a small integer avoids hashing, so it should be somewhat faster and take less memory.
Consider building the opcode array (or dictionary) programmatically:
OPCODES = [ 'br', 'add', 'ld', 'st', 'jsr', 'and', 'ldr', 'str',
            'rti', 'not', 'ldi', 'sti', 'jmp', 'res', 'lea', 'trap' ]
opcodes = [ getattr(self, f"op_{op}_impl") for op in OPCODES ]
Note: requires Python 3.6 for the f"strings". Use .format() or % with earlier versions.
Note: this eliminates the need for your class opcodes(IntEnum).
Since the op_XXX_impl functions are not meant to be called externally, they should be named starting with an underscore.
Even better: move initialization of the opcodes array into your lc3 constructor, and store it in the object. It will help when it comes time to add in tests.
self._opcodes = [ getattr(self, f"op_{op}_impl") for op in OPCODES ]
Memory
You could use the array class for your 16-bit memory; you don't need to create your own memory class:
self.memory = array.array('H', [0]*65536)
The 'H' is the type code for 16 bit unsigned values.
Similarly, you could create your registers without a registers class. 'h' is the type code for 16 bit signed values:
self.gprs = array.array('h', [0]*10)
This creates 10 register locations, 8 for the "general purpose" registers, and two additional registers: pc and cond, which you could access as self.gprs[8] and self.gprs[9]. We can improve on this, making them more accessible using @property:
@property
def pc(self):
    return self.gprs[8]

@pc.setter
def pc(self, value):
    self.gprs[8] = value

@property
def cond(self):
    return self.gprs[9]

@cond.setter
def cond(self, value):
    self.gprs[9] = value
Then you can use and set self.pc and self.cond directly.
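One caveat worth checking before switching, offered as a sketch rather than a statement about the original code: ctypes integers wrap silently on overflow, whereas array.array raises OverflowError when a value does not fit the type code. The helper names to_unsigned16 and to_signed16 below are hypothetical.

```python
import array


def to_unsigned16(value):
    """Wrap any Python int into the range 0..0xFFFF."""
    return value & 0xFFFF


def to_signed16(value):
    """Wrap any Python int into the range -0x8000..0x7FFF."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


gprs = array.array('h', [0] * 8)

try:
    gprs[0] = 0x7FFF + 1           # does not fit a signed 16-bit slot
    overflowed = False
except OverflowError:
    overflowed = True

gprs[0] = to_signed16(0x7FFF + 1)  # wraps around, like 16-bit hardware
```

The same concern applies to pc if it lives in an 'h' array: an address of 0x8000 or above would need converting before it can be stored.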
Instruction Decoding
You repeat a lot of code for decoding instructions. You should write helper functions to extract the required values. Then you could write something like:
def op_add_impl(self, instruction):
    dr, src1, src2 = self.decode_dr_sr2imm(instruction)
    self.gprs[dr] = src1 + src2
    self.update_flags(dr)

def op_not_impl(self, instruction):
    dr, src1, _ = self.decode_dr_sr2imm(instruction)
    self.gprs[dr] = ~ src1
    self.update_flags(dr)
Since not doesn't use sr2 or an immediate value, the value returned for src2 can be ignored by saving it to the _ variable.
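The helper itself is not shown above. One possible shape for it, written here as a free function that takes the register file explicitly so the sketch runs on its own (in the emulator it would presumably be a method using self.gprs), is:

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of value (same formula as the question's sext)."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_dr_sr2imm(gprs, instruction):
    """Return (dr, src1, src2) for ADD/AND/NOT-style encodings.

    dr is a register index. src1 is the value held in SR1. src2 is the
    value held in SR2, or the sign-extended imm5 when bit 5 is set.
    """
    dr = (instruction >> 9) & 0b111
    sr1 = (instruction >> 6) & 0b111
    if (instruction >> 5) & 0b1:
        src2 = sext(instruction & 0b11111, 5)   # immediate mode
    else:
        src2 = gprs[instruction & 0b111]        # register mode
    return dr, gprs[sr1], src2
```

Returning operand values rather than register indices is an assumption made to match the src1 + src2 usage in op_add_impl above.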
Debug output
Instead of printing to sys.stdout, you should learn and use the Python logging module, for adding (and removing) debug output from your program.
import logging
LOG = logging.getLogger(__name__)
LOG.debug("100 in hex is %x", 100)
In the main program, to enable debug output, use:
logging.basicConfig()
LOG.setLevel(logging.DEBUG)
Testability
The start() method does a lot. Too much, in fact. It loops endlessly, reading instructions from memory, advancing the program counter, and dispatching instructions.
Let's break this down a bit.
Dispatch
You want testability. How about executing just one instruction? In fact, you don't need to read the instruction from memory, either.
def _execute_instruction(self, instruction):
    opcode = instruction >> 12
    if LOG.isEnabledFor(logging.DEBUG):
        LOG.debug("instruction: %04x", instruction)
        LOG.debug("disassembly: %s", lc3disas.single_ins(self.pc, instruction))
        self.dump_state()
    try:
        # self._opcodes is the table built in the constructor (see OpCodes above)
        self._opcodes[opcode](instruction)
    except (KeyError, IndexError):
        # KeyError for a dict table, IndexError for a list table
        raise NotImplementedError("Invalid opcode")
Now you could write a test for an individual instruction.
def test_add():
    cpu = lc3()
    cpu.gprs[0] = 22
    cpu._execute_instruction(0x1000) # ADD R0, R0, R0: gprs[0] = gprs[0] + gprs[0]
    assert cpu.gprs[0] == 44
    assert cpu.cond == condition_flags.p
Single Step
With the dispatcher above, we can now easily write a single stepper:
def single_step(self):
    instruction = self.memory[self.pc]
    self.pc += 1
    self._execute_instruction(instruction)
And again, you can write tests using single stepping:
def test_single_step_add():
    cpu = lc3()
    # Setup
    cpu.gprs[0] = -22
    cpu.pc = 0x1234
    cpu.memory[cpu.pc] = 0x1000 # ADD R0, R0, R0
    cpu.single_step()
    assert cpu.gprs[0] == -44
    assert cpu.cond == condition_flags.n
    assert cpu.pc == 0x1235
Running
Using single_step(), it becomes easy to write the start() method. But let's make it a little better.
Trap #0x25 is a halt instruction, but it also exits the Python interpreter. That is a little too Draconian. If a program ever generates that trap, any test framework will come crashing down as the interpreter exits. Instead, you should use a flag to indicate whether the CPU is running normally, or if it has been halted.
def start(self):
    LOG.debug("Starting")
    self._running = True
    while self._running:
        self.single_step()
    LOG.debug("Halted.")
The op_trap_impl() function would set self._running = False when the Trap #0x25 is executed.
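A stripped-down sketch of how the flag and the trap handler fit together follows. TinyCPU and its steps counter are hypothetical stand-ins for the real lc3 class, and single_step fakes the fetch/decode cycle by halting on the third step.

```python
class TinyCPU:
    """Hypothetical stand-in for the lc3 class, reduced to the halt logic."""

    def __init__(self):
        self._running = False
        self.steps = 0

    def op_trap_impl(self, instruction):
        trap_vector = instruction & 0xff
        if trap_vector == 0x25:
            # HALT: stop the run loop but leave the Python interpreter alive.
            self._running = False
            return
        raise NotImplementedError("trap {:#x}".format(trap_vector))

    def single_step(self):
        # Stand-in for fetch/decode/execute: issue TRAP 0x25 on the third step.
        self.steps += 1
        if self.steps == 3:
            self.op_trap_impl(0xF025)

    def start(self):
        self._running = True
        while self._running:
            self.single_step()
```

After start() returns, a test can inspect registers and memory because nothing called exit().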
You can now write a test program that runs, and halts, and check the state of memory when it has halted.
Input / Output
Your LC3 is tied to sys.stdin and sys.stdout. This makes it hard to test; you'd have to intercept the input and output streams when you write your tests. Or, you could have your LC3 cpu have custom in and out streams, which default to sys.stdin and sys.stdout, but can be replaced with StringIO, so your test can feed data to the program, and retrieve output for validation. The Trap #0x20, #0x21 and #0x22 would need to read/write to the requested io streams.
import io
import sys

class LC3():
    def __init__(self, *, input=sys.stdin, output=sys.stdout):
        self._input = input
        self._output = output

def test_io():
    # "in" is a reserved word in Python, so the streams need other names
    in_stream = io.StringIO("7\n")  # Input test data
    out_stream = io.StringIO()      # Capture output to string buffer
    lc3 = LC3(input=in_stream, output=out_stream)
    lc3.read_program_from_file("fibonnaci.obj")
    lc3.start()
    assert out_stream.getvalue() == "13\n" # 13 is the 7th Fibonacci number
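To see the stream injection work end to end without an LC-3 program on disk, here is a small self-contained sketch. EchoMachine, trap_getc, trap_out and run_echo are hypothetical names; the class only mimics the GETC (0x20) and OUT (0x21) traps on text streams.

```python
import io
import sys


class EchoMachine:
    """Hypothetical stand-in for LC3 with GETC- and OUT-style I/O only."""

    def __init__(self, *, input=None, output=None):
        # Resolve the defaults at call time, so a later replacement of
        # sys.stdin / sys.stdout is still respected.
        self._input = sys.stdin if input is None else input
        self._output = sys.stdout if output is None else output
        self.r0 = 0

    def trap_getc(self):
        # Read one character into R0.
        self.r0 = ord(self._input.read(1))

    def trap_out(self):
        # Write the low byte of R0 as a character.
        self._output.write(chr(self.r0 & 0xff))


def run_echo(text):
    """Feed text through the machine and return what it wrote."""
    source = io.StringIO(text)
    sink = io.StringIO()
    machine = EchoMachine(input=source, output=sink)
    for _ in text:
        machine.trap_getc()
        machine.trap_out()
    return sink.getvalue()
```

The question's code reads and writes bytes through stdin.buffer and stdout.buffer; if that is kept, io.BytesIO would be the matching test double instead of io.StringIO.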
- I learned that names with a leading underscore aren't imported when you from module import *. shahriar.svbtle.com/… – ian5v, Jan 3, 2019 at 7:07
- True, but misleading. If you from module import * this module, it would import the top level names like signal_handler, memory, registers, opcodes, and lc3. The entire lc3 class is imported, not just a subset of the class. Please do mark non-public members with a leading underscore. If you are importing lc3 into a file of test functions, the test functions will have access to all members of the class whether they start with a leading underscore or not. You shouldn't access members with leading underscores from outside, but test functions are allowed to cheat a little. – AJNeufeld, Jan 3, 2019 at 14:58
- if 0x0400 & instruction == 1: I don't see how this could ever be True.
- … == 0x0800. In either case, == 1 is not correct, which was the bug I thought I was bringing to your attention.