I'm teaching myself Rust, and to do this I've written a toy register based virtual machine. I hope the code is easy to follow - I just want to know if there are any mistakes I am making with the language philosophy or if there are syntax tricks I could be using. For example, the program counter pc
and the register fields reg_field
are mutable. I feel like this is the right thing to do in this case as it better reflects processor architecture. But of course, I might be wrong...
const NUMBER_OF_REGISTERS: usize = 3;
enum Instruction {
Halt,
Load { reg: usize, value: u16 },
Swap { reg1: usize, reg2: usize, reg3: usize },
Add { reg1: usize, reg2: usize, reg3: usize },
Branch { offset: usize }
}
fn main() {
let encoded_instructions = &[0x1110, 0x2100, 0x3010, 0x0];
cpu(encoded_instructions);
}
fn cpu(encoded_instructions: &[u16]) {
let mut pc = 0;
let mut reg_field = [0; NUMBER_OF_REGISTERS];
loop {
let encoded_instruction = fetch(pc, encoded_instructions);
let decoded_instruction = decode(encoded_instruction);
match decoded_instruction {
Some(Instruction::Load { reg, value })
=> load(reg, value, &mut reg_field, &mut pc),
Some(Instruction::Swap { reg1, reg2, reg3 })
=> swap(reg1, reg2, reg3, &mut reg_field, &mut pc),
Some(Instruction::Add { reg1, reg2, reg3 })
=> add(reg1, reg2, reg3, &mut reg_field, &mut pc),
Some(Instruction::Branch { offset })
=> branch(offset, &mut pc),
Some(Instruction::Halt)
=> { halt(®_field); break }
None => break,
}
}
}
fn fetch(pc: usize, instructions: &[u16]) -> u16 {
instructions[pc]
}
fn halt(register_field: &[u16]) {
println!("{:?}", register_field[0]);
}
fn load(register: usize, value: u16, register_field: &mut [u16], pc: &mut usize) {
register_field[register] = value;
*pc += 1;
}
fn swap(reg1: usize, reg2: usize, reg3: usize, register_field: &mut [u16], pc: &mut usize) {
register_field[reg3] = register_field[reg1];
register_field[reg1] = register_field[reg2];
register_field[reg2] = register_field[reg3];
*pc += 1;
}
fn add(reg1: usize, reg2: usize, reg3: usize, register_field: &mut [u16], pc: &mut usize) {
register_field[reg3] = register_field[reg1] + register_field[reg2];
*pc += 1;
}
fn branch(offset: usize, pc: &mut usize) {
*pc -= offset;
}
fn decode(encoded_instruction: u16) -> Option<Instruction> {
let operator = encoded_instruction >> 12;
let reg1 = ((encoded_instruction >> 8) & 0xF) as usize;
let reg2 = ((encoded_instruction >> 4) & 0xF) as usize;
let reg3 = (encoded_instruction & 0xF) as usize;
let offset = (encoded_instruction & 0xFFF) as usize;
let value = encoded_instruction & 0xFF;
match operator {
0 => Some(Instruction::Halt),
1 => Some(Instruction::Load { reg: reg1, value: value }),
2 => Some(Instruction::Swap { reg1: reg1, reg2: reg2, reg3: reg3 }),
3 => Some(Instruction::Add { reg1: reg1, reg2: reg2, reg3: reg3 }),
4 => Some(Instruction::Branch { offset: offset }),
_ => None,
}
}
1 Answer 1
Your code was very readable and easy to follow. Most of my suggestions center around trying to showcase some more Rust idioms and features.
A collection of registers in normally called a register file.
Since the constant for the number of registers is only used to declare the register file, create a type alias and use that. This sets you up nicely to make it a standalone type in the future.
I'm glad to see names for registers in
Instruction
, but maybe give them more meaningful names?Add
is a great example - which registers are the inputs and which is the output?You almost always want to derive
Debug
for a type.Copy
andClone
are also very common. ImplementingPartialEq
will allow easier test writing.There's a distinct lack of types and methods, overall the code feels pretty C-like. I'd associate functions with types, making them methods. For example,
decode
is the most obvious change to me.Returning an
Option
fromdecode
is iffy. Failing to decode an instruction doesn't seem like the absence of a value, it feels like a failure, which is normally reserved forResult
.When splitting match arms onto a different line from the pattern, use braces.
Consider inlining some of the instruction implementations into
execute
as many are only one line long anyway.In a processor, it's very typical to automatically increment the program counter every instruction. Branch offsets take that automatic increment into account. This is nice here because it removes
pc
from all the non-branching methods. I just subtracted 1 when applying the jump, but it's probably better to fixup the offsets.Create a type for sequences of instructions, I used
Program
. This will give you somewhere to hangfetch
.
type RegisterFile = [u16; 3];
#[derive(Debug, Copy, Clone)]
enum Instruction {
Halt,
Load { reg: usize, value: u16 },
Swap { reg1: usize, reg2: usize, reg3: usize },
Add { reg1: usize, reg2: usize, reg3: usize },
Branch { offset: usize }
}
impl Instruction {
fn decode(encoded_instruction: u16) -> Option<Self> {
let operator = encoded_instruction >> 12;
let reg1 = ((encoded_instruction >> 8) & 0xF) as usize;
let reg2 = ((encoded_instruction >> 4) & 0xF) as usize;
let reg3 = (encoded_instruction & 0xF) as usize;
let offset = (encoded_instruction & 0xFFF) as usize;
let value = encoded_instruction & 0xFF;
match operator {
0 => Some(Instruction::Halt),
1 => Some(Instruction::Load { reg: reg1, value: value }),
2 => Some(Instruction::Swap { reg1: reg1, reg2: reg2, reg3: reg3 }),
3 => Some(Instruction::Add { reg1: reg1, reg2: reg2, reg3: reg3 }),
4 => Some(Instruction::Branch { offset: offset }),
_ => None,
}
}
fn execute(&self, registers: &mut [u16], pc: &mut usize) -> bool {
match *self {
Instruction::Load { reg, value } => {
load(reg, value, registers);
},
Instruction::Swap { reg1, reg2, reg3 } => {
swap(reg1, reg2, reg3, registers);
},
Instruction::Add { reg1, reg2, reg3 } => {
add(reg1, reg2, reg3, registers);
},
Instruction::Branch { offset } => {
branch(offset, pc);
},
Instruction::Halt => {
halt(registers);
return false;
},
}
true
}
}
fn halt(register_file: &[u16]) {
println!("{:?}", register_file[0]);
}
fn load(register: usize, value: u16, register_file: &mut [u16]) {
register_file[register] = value;
}
fn swap(reg1: usize, reg2: usize, reg3: usize, register_file: &mut [u16]) {
register_file[reg3] = register_file[reg1];
register_file[reg1] = register_file[reg2];
register_file[reg2] = register_file[reg3];
}
fn add(reg1: usize, reg2: usize, reg3: usize, register_file: &mut [u16]) {
register_file[reg3] = register_file[reg1] + register_file[reg2];
}
fn branch(offset: usize, pc: &mut usize) {
*pc -= offset - 1;
}
struct Program<'a> {
instructions: &'a [u16],
}
impl<'a> Program<'a> {
fn fetch(&self, pc: usize) -> u16 {
self.instructions[pc]
}
}
fn cpu(program: Program) {
let mut pc = 0;
let mut registers = RegisterFile::default();
loop {
let encoded_instruction = program.fetch(pc);
let decoded_instruction = Instruction::decode(encoded_instruction);
match decoded_instruction {
Some(instr) => {
if !instr.execute(&mut registers, &mut pc) { break }
}
None => break,
}
pc += 1;
}
}
fn main() {
let encoded_instructions = Program { instructions: &[0x1110, 0x2100, 0x3010, 0x0] };
cpu(encoded_instructions);
}
-
\$\begingroup\$ @shepmaster, this is fantastic, but there's one point I don't understand. You have
impl<'a> Program<'a>
, and I'm not clear on the syntax here. Are you saying that your guaranteeing that Program will be live for as long as the instructions are live, or the other way 'round, or what? \$\endgroup\$Elf Sternberg– Elf Sternberg2017年02月07日 01:17:25 +00:00Commented Feb 7, 2017 at 1:17 -
1\$\begingroup\$ @ElfSternberg I'd highly recommend reading The Rust Programming Language, specifically the section about lifetimes (or maybe the related chapter in the in-progress revision). Lifetimes are a key component of what makes Rust such a great language, but a comment cannot explain them thoroughly enough. In this case,
Program
contains a reference to a slice of instructions and the compiler ensures that the slice will outliveProgram
; preventing invalid references. \$\endgroup\$Shepmaster– Shepmaster2017年02月07日 02:58:40 +00:00Commented Feb 7, 2017 at 2:58 -
\$\begingroup\$ I've read much of The Rust Programming Language book and have been working my way through Rust 101, and your explanation here was more clear and coherent than anything I'd read elsewhere. Lifetimes are one of the bugaboos of C++ and I appreciate the explanation. \$\endgroup\$Elf Sternberg– Elf Sternberg2017年02月07日 22:36:42 +00:00Commented Feb 7, 2017 at 22:36