A simple register VM written in Rust

Question 1

I'm teaching myself Rust, and to do this I've written a toy register based virtual machine. I hope the code is easy to follow - I just want to know if there are any mistakes I am making with the language philosophy or if there are syntax tricks I could be using. For example, the program counter pc and the register fields reg_field are mutable. I feel like this is the right thing to do in this case as it better reflects processor architecture. But of course, I might be wrong...

const NUMBER_OF_REGISTERS: usize = 3;
enum Instruction {
 Halt,
 Load { reg: usize, value: u16 },
 Swap { reg1: usize, reg2: usize, reg3: usize },
 Add { reg1: usize, reg2: usize, reg3: usize },
 Branch { offset: usize }
}
fn main() {
 let encoded_instructions = &[0x1110, 0x2100, 0x3010, 0x0];
 cpu(encoded_instructions);
}
fn cpu(encoded_instructions: &[u16]) {
 let mut pc = 0;
 let mut reg_field = [0; NUMBER_OF_REGISTERS];
 loop {
 let encoded_instruction = fetch(pc, encoded_instructions);
 let decoded_instruction = decode(encoded_instruction);
 match decoded_instruction {
 Some(Instruction::Load { reg, value })
 => load(reg, value, &mut reg_field, &mut pc),
 Some(Instruction::Swap { reg1, reg2, reg3 })
 => swap(reg1, reg2, reg3, &mut reg_field, &mut pc),
 Some(Instruction::Add { reg1, reg2, reg3 })
 => add(reg1, reg2, reg3, &mut reg_field, &mut pc),
 Some(Instruction::Branch { offset })
 => branch(offset, &mut pc),
 Some(Instruction::Halt)
 => { halt(&reg_field); break }
 None => break,
 }
 }
}
fn fetch(pc: usize, instructions: &[u16]) -> u16 {
 instructions[pc]
}
fn halt(register_field: &[u16]) {
 println!("{:?}", register_field[0]);
}
fn load(register: usize, value: u16, register_field: &mut [u16], pc: &mut usize) {
 register_field[register] = value;
 *pc += 1;
}
fn swap(reg1: usize, reg2: usize, reg3: usize, register_field: &mut [u16], pc: &mut usize) {
 register_field[reg3] = register_field[reg1];
 register_field[reg1] = register_field[reg2];
 register_field[reg2] = register_field[reg3];
 *pc += 1;
}
fn add(reg1: usize, reg2: usize, reg3: usize, register_field: &mut [u16], pc: &mut usize) {
 register_field[reg3] = register_field[reg1] + register_field[reg2];
 *pc += 1;
}
fn branch(offset: usize, pc: &mut usize) {
 *pc -= offset;
}
fn decode(encoded_instruction: u16) -> Option<Instruction> {
 let operator = encoded_instruction >> 12;
 let reg1 = ((encoded_instruction >> 8) & 0xF) as usize;
 let reg2 = ((encoded_instruction >> 4) & 0xF) as usize;
 let reg3 = (encoded_instruction & 0xF) as usize;
 let offset = (encoded_instruction & 0xFFF) as usize;
 let value = encoded_instruction & 0xFF;
 match operator {
 0 => Some(Instruction::Halt),
 1 => Some(Instruction::Load { reg: reg1, value: value }),
 2 => Some(Instruction::Swap { reg1: reg1, reg2: reg2, reg3: reg3 }),
 3 => Some(Instruction::Add { reg1: reg1, reg2: reg2, reg3: reg3 }),
 4 => Some(Instruction::Branch { offset: offset }),
 _ => None,
 }
}

Question 2

Your code was very readable and easy to follow. Most of my suggestions center around trying to showcase some more Rust idioms and features.

A collection of registers in normally called a register file.
Since the constant for the number of registers is only used to declare the register file, create a type alias and use that. This sets you up nicely to make it a standalone type in the future.
I'm glad to see names for registers in Instruction, but maybe give them more meaningful names? Add is a great example - which registers are the inputs and which is the output?
You almost always want to derive Debug for a type. Copy and Clone are also very common. Implementing PartialEq will allow easier test writing.
There's a distinct lack of types and methods, overall the code feels pretty C-like. I'd associate functions with types, making them methods. For example, decode is the most obvious change to me.
Returning an Option from decode is iffy. Failing to decode an instruction doesn't seem like the absence of a value, it feels like a failure, which is normally reserved for Result.
When splitting match arms onto a different line from the pattern, use braces.
Consider inlining some of the instruction implementations into execute as many are only one line long anyway.
In a processor, it's very typical to automatically increment the program counter every instruction. Branch offsets take that automatic increment into account. This is nice here because it removes pc from all the non-branching methods. I just subtracted 1 when applying the jump, but it's probably better to fixup the offsets.
Create a type for sequences of instructions, I used Program. This will give you somewhere to hang fetch.

type RegisterFile = [u16; 3];
#[derive(Debug, Copy, Clone)]
enum Instruction {
 Halt,
 Load { reg: usize, value: u16 },
 Swap { reg1: usize, reg2: usize, reg3: usize },
 Add { reg1: usize, reg2: usize, reg3: usize },
 Branch { offset: usize }
}
impl Instruction {
 fn decode(encoded_instruction: u16) -> Option<Self> {
 let operator = encoded_instruction >> 12;
 let reg1 = ((encoded_instruction >> 8) & 0xF) as usize;
 let reg2 = ((encoded_instruction >> 4) & 0xF) as usize;
 let reg3 = (encoded_instruction & 0xF) as usize;
 let offset = (encoded_instruction & 0xFFF) as usize;
 let value = encoded_instruction & 0xFF;
 match operator {
 0 => Some(Instruction::Halt),
 1 => Some(Instruction::Load { reg: reg1, value: value }),
 2 => Some(Instruction::Swap { reg1: reg1, reg2: reg2, reg3: reg3 }),
 3 => Some(Instruction::Add { reg1: reg1, reg2: reg2, reg3: reg3 }),
 4 => Some(Instruction::Branch { offset: offset }),
 _ => None,
 }
 }
 fn execute(&self, registers: &mut [u16], pc: &mut usize) -> bool {
 match *self {
 Instruction::Load { reg, value } => {
 load(reg, value, registers);
 },
 Instruction::Swap { reg1, reg2, reg3 } => {
 swap(reg1, reg2, reg3, registers);
 },
 Instruction::Add { reg1, reg2, reg3 } => {
 add(reg1, reg2, reg3, registers);
 },
 Instruction::Branch { offset } => {
 branch(offset, pc);
 },
 Instruction::Halt => {
 halt(registers);
 return false;
 },
 }
 true
 }
}
fn halt(register_file: &[u16]) {
 println!("{:?}", register_file[0]);
}
fn load(register: usize, value: u16, register_file: &mut [u16]) {
 register_file[register] = value;
}
fn swap(reg1: usize, reg2: usize, reg3: usize, register_file: &mut [u16]) {
 register_file[reg3] = register_file[reg1];
 register_file[reg1] = register_file[reg2];
 register_file[reg2] = register_file[reg3];
}
fn add(reg1: usize, reg2: usize, reg3: usize, register_file: &mut [u16]) {
 register_file[reg3] = register_file[reg1] + register_file[reg2];
}
fn branch(offset: usize, pc: &mut usize) {
 *pc -= offset - 1;
}
struct Program<'a> {
 instructions: &'a [u16],
}
impl<'a> Program<'a> {
 fn fetch(&self, pc: usize) -> u16 {
 self.instructions[pc]
 }
}
fn cpu(program: Program) {
 let mut pc = 0;
 let mut registers = RegisterFile::default();
 loop {
 let encoded_instruction = program.fetch(pc);
 let decoded_instruction = Instruction::decode(encoded_instruction);
 match decoded_instruction {
 Some(instr) => {
 if !instr.execute(&mut registers, &mut pc) { break }
 }
 None => break,
 }
 pc += 1;
 }
}
fn main() {
 let encoded_instructions = Program { instructions: &[0x1110, 0x2100, 0x3010, 0x0] };
 cpu(encoded_instructions);
}

Question 3

@shepmaster, this is fantastic, but there's one point I don't understand. You have impl<'a> Program<'a>, and I'm not clear on the syntax here. Are you saying that your guaranteeing that Program will be live for as long as the instructions are live, or the other way 'round, or what?

Question 4

@ElfSternberg I'd highly recommend reading The Rust Programming Language, specifically the section about lifetimes (or maybe the related chapter in the in-progress revision). Lifetimes are a key component of what makes Rust such a great language, but a comment cannot explain them thoroughly enough. In this case, Program contains a reference to a slice of instructions and the compiler ensures that the slice will outlive Program; preventing invalid references.

Question 5

I've read much of The Rust Programming Language book and have been working my way through Rust 101, and your explanation here was more clear and coherent than anything I'd read elsewhere. Lifetimes are one of the bugaboos of C++ and I appreciate the explanation.

Shepmaster Shepmaster 8,77827 silver badges28 bronze badges · Accepted Answer · 2016-05-28 03:52:31Z

Your code was very readable and easy to follow. Most of my suggestions center around trying to showcase some more Rust idioms and features.

A collection of registers in normally called a register file.
Since the constant for the number of registers is only used to declare the register file, create a type alias and use that. This sets you up nicely to make it a standalone type in the future.
I'm glad to see names for registers in Instruction, but maybe give them more meaningful names? Add is a great example - which registers are the inputs and which is the output?
You almost always want to derive Debug for a type. Copy and Clone are also very common. Implementing PartialEq will allow easier test writing.
There's a distinct lack of types and methods, overall the code feels pretty C-like. I'd associate functions with types, making them methods. For example, decode is the most obvious change to me.
Returning an Option from decode is iffy. Failing to decode an instruction doesn't seem like the absence of a value, it feels like a failure, which is normally reserved for Result.
When splitting match arms onto a different line from the pattern, use braces.
Consider inlining some of the instruction implementations into execute as many are only one line long anyway.
In a processor, it's very typical to automatically increment the program counter every instruction. Branch offsets take that automatic increment into account. This is nice here because it removes pc from all the non-branching methods. I just subtracted 1 when applying the jump, but it's probably better to fixup the offsets.
Create a type for sequences of instructions, I used Program. This will give you somewhere to hang fetch.

type RegisterFile = [u16; 3];
#[derive(Debug, Copy, Clone)]
enum Instruction {
 Halt,
 Load { reg: usize, value: u16 },
 Swap { reg1: usize, reg2: usize, reg3: usize },
 Add { reg1: usize, reg2: usize, reg3: usize },
 Branch { offset: usize }
}
impl Instruction {
 fn decode(encoded_instruction: u16) -> Option<Self> {
 let operator = encoded_instruction >> 12;
 let reg1 = ((encoded_instruction >> 8) & 0xF) as usize;
 let reg2 = ((encoded_instruction >> 4) & 0xF) as usize;
 let reg3 = (encoded_instruction & 0xF) as usize;
 let offset = (encoded_instruction & 0xFFF) as usize;
 let value = encoded_instruction & 0xFF;
 match operator {
 0 => Some(Instruction::Halt),
 1 => Some(Instruction::Load { reg: reg1, value: value }),
 2 => Some(Instruction::Swap { reg1: reg1, reg2: reg2, reg3: reg3 }),
 3 => Some(Instruction::Add { reg1: reg1, reg2: reg2, reg3: reg3 }),
 4 => Some(Instruction::Branch { offset: offset }),
 _ => None,
 }
 }
 fn execute(&self, registers: &mut [u16], pc: &mut usize) -> bool {
 match *self {
 Instruction::Load { reg, value } => {
 load(reg, value, registers);
 },
 Instruction::Swap { reg1, reg2, reg3 } => {
 swap(reg1, reg2, reg3, registers);
 },
 Instruction::Add { reg1, reg2, reg3 } => {
 add(reg1, reg2, reg3, registers);
 },
 Instruction::Branch { offset } => {
 branch(offset, pc);
 },
 Instruction::Halt => {
 halt(registers);
 return false;
 },
 }
 true
 }
}
fn halt(register_file: &[u16]) {
 println!("{:?}", register_file[0]);
}
fn load(register: usize, value: u16, register_file: &mut [u16]) {
 register_file[register] = value;
}
fn swap(reg1: usize, reg2: usize, reg3: usize, register_file: &mut [u16]) {
 register_file[reg3] = register_file[reg1];
 register_file[reg1] = register_file[reg2];
 register_file[reg2] = register_file[reg3];
}
fn add(reg1: usize, reg2: usize, reg3: usize, register_file: &mut [u16]) {
 register_file[reg3] = register_file[reg1] + register_file[reg2];
}
fn branch(offset: usize, pc: &mut usize) {
 *pc -= offset - 1;
}
struct Program<'a> {
 instructions: &'a [u16],
}
impl<'a> Program<'a> {
 fn fetch(&self, pc: usize) -> u16 {
 self.instructions[pc]
 }
}
fn cpu(program: Program) {
 let mut pc = 0;
 let mut registers = RegisterFile::default();
 loop {
 let encoded_instruction = program.fetch(pc);
 let decoded_instruction = Instruction::decode(encoded_instruction);
 match decoded_instruction {
 Some(instr) => {
 if !instr.execute(&mut registers, &mut pc) { break }
 }
 None => break,
 }
 pc += 1;
 }
}
fn main() {
 let encoded_instructions = Program { instructions: &[0x1110, 0x2100, 0x3010, 0x0] };
 cpu(encoded_instructions);
}

@shepmaster, this is fantastic, but there's one point I don't understand. You have impl<'a> Program<'a>, and I'm not clear on the syntax here. Are you saying that your guaranteeing that Program will be live for as long as the instructions are live, or the other way 'round, or what?
@ElfSternberg I'd highly recommend reading The Rust Programming Language, specifically the section about lifetimes (or maybe the related chapter in the in-progress revision). Lifetimes are a key component of what makes Rust such a great language, but a comment cannot explain them thoroughly enough. In this case, Program contains a reference to a slice of instructions and the compiler ensures that the slice will outlive Program; preventing invalid references.
I've read much of The Rust Programming Language book and have been working my way through Rust 101, and your explanation here was more clear and coherent than anything I'd read elsewhere. Lifetimes are one of the bugaboos of C++ and I appreciate the explanation.

Stack Exchange Network

A simple register VM written in Rust

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

A simple register VM written in Rust

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions