6502 Emulator in Rust

Question 1

This is my first attempt at writing a "large program", and I think that I've got the foundations for the emulator down. However, as I'm still a beginner, I might have made some questionable design choices (too many functions/too verbose/inefficient and so on), so I'd like to know if I can make the code more readable and/or efficient.

For now, I implemented only Immediate, Absolute, and Zero Page addressing modes, and only for LDA. I just waited to post this before going on and having to modify more code for a new design.

main.rs

 use cpu::CPU;
 
 mod cpu;
 mod instruction;
 mod flags;
 
 fn main() {
 let mut cpu = CPU::new();
 
 // values to load
 cpu.ram[0xA00B] = 0x0A;
 cpu.ram[0x00DD] = 0x80;
 
 // 0xA9 -> Load 0x80 into a
 // 0xAD -> Load content of 0xA00B into a
 // 0xA5 -> Load content of 0x00DD into a
 cpu.run(vec![0xA9, 0x80, 0xAD, 0x0B, 0xA0, 0xA5, 0xDD]);
 }

‎ cpu.rs

use crate::{instruction::{Instruction, AddrMode}, flags::{Flags, FLAGS}};
pub struct CPU {
 a: u8,
 p: Flags,
 pc: u16,
 pub ram: [u8; 0xffff],
 // unneeded, but it makes it harder to make mistakes + easier to format for a vector of states
 curr_instr: Instruction,
 // for printing the operative address in the log function
 op_addr: u16
}
impl CPU {
 pub fn new() -> Self {
 CPU {
 a: 0,
 p: Flags::new(),
 pc: 0,
 ram: [0; 0xffff],
 curr_instr: Instruction::new(),
 op_addr: 0
 }
 }
 // just for formatting in the print_state() function
 fn get_flag_str(&self, flag: FLAGS) -> &str {
 match flag {
 FLAGS::Z => {
 if self.p.z == true { return "T"; } else { return "F"; }
 }
 FLAGS::N => {
 if self.p.n == true { return "T"; } else { return "F"; }
 }
 _ => { return "U"; }
 }
 }
 pub fn print_state(&self) {
 println!("┏━━━━━━━━━━━━━━━━━━━━━┓");
 println!("┃ CPU STATE ┃");
 println!("┃ ┃");
 println!("┃ REGISTERS: ┃");
 println!("┃ A: {:#04X?} ┃", self.a);
 println!("┃ PC: {:#06X?} ┃", self.pc);
 println!("┃ ┃");
 println!("┃ FLAGS: ┃");
 println!("┃ C: {} ┃", self.get_flag_str(FLAGS::C));
 println!("┃ Z: {} ┃", self.get_flag_str(FLAGS::Z));
 println!("┃ I: {} ┃", self.get_flag_str(FLAGS::I));
 println!("┃ D: {} ┃", self.get_flag_str(FLAGS::D));
 println!("┃ V: {} ┃", self.get_flag_str(FLAGS::V));
 println!("┃ N: {} ┃", self.get_flag_str(FLAGS::N));
 println!("┃ ┃");
 println!("┃ INSTRUCTION: ┃");
 println!("┃ MNEMONIC: {} ┃", self.curr_instr.mnemonic);
 println!("┃ ADDR MODE: {} ┃", self.curr_instr.addr_mode.get_addr_mode_string(self.curr_instr.opcode));
 println!("┃ OP ADDR: {:#06X?} ┃", self.op_addr);
 println!("┃ OPCODE: {:#04X?} ┃", self.curr_instr.opcode);
 println!("┗━━━━━━━━━━━━━━━━━━━━━┛");
 }
 fn read_byte(&self, addr: u16) -> u8 {
 self.ram[addr as usize]
 }
 fn write_byte(&mut self, addr: u16, data: u8) {
 self.ram[addr as usize] = data
 }
 fn read_word(&self, addr: u16) -> u16 {
 let hi = self.read_byte(addr + 1);
 let lo = self.read_byte(addr);
 (hi as u16).wrapping_shl(8) | lo as u16
 }
 fn fetch_address(&mut self, mode: AddrMode) {
 match mode {
 AddrMode::IMM => {
 self.op_addr = self.pc + 1;
 self.pc += 2;
 }
 AddrMode::ABS => {
 self.op_addr = self.read_word(self.pc + 1);
 self.pc += 3;
 }
 AddrMode::ZPG => {
 self.op_addr = 0x0000 | (self.read_byte(self.pc + 1)) as u16;
 self.pc += 2;
 }
 }
 }
 fn execute(&mut self) {
 let opcode = self.read_byte(self.pc);
 self.curr_instr.fetch_instruction(opcode);
 self.fetch_address(self.curr_instr.addr_mode);
 match self.curr_instr.mnemonic.as_str() {
 "LDA" => {
 self.a = self.read_byte(self.op_addr);
 self.p.toggle_flags(self.a, vec![FLAGS::Z, FLAGS::N]);
 }
 "BRK" => {
 std::process::exit(0);
 }
 _ => { println!("Mnemonic unrecognized {}", self.curr_instr.mnemonic.as_str());}
 }
 }
 fn load(&mut self, program: Vec<u8>) {
 let mut addr = 0x0000;
 for data in program {
 self.write_byte(addr, data);
 addr += 1;
 }
 }
 pub fn run(&mut self, program: Vec<u8>) {
 self.load(program);
 loop {
 self.execute();
 self.print_state();
 }
 }
}

‎ flags.rs

pub enum FLAGS {
 C, Z, I, D, V, N
}
pub struct Flags {
 pub c: bool,
 pub z: bool,
 pub i: bool,
 pub d: bool,
 // b flag
 pub v: bool,
 pub n: bool
}
impl Flags {
 pub fn new() -> Self {
 Flags { c: false, z: false, i: false, d: false, v: false, n: false }
 }
 fn toggle_zero_flag(&mut self, value: u8) {
 if value == 0 {
 self.z = true;
 } else {
 self.z = false;
 }
 }
 fn toggle_negative_flag(&mut self, value: u8) {
 if value & 0b1000_0000 != 0 {
 self.n = true;
 } else {
 self.n = false;
 }
 }
 pub fn toggle_flags(&mut self, value: u8, flags: Vec<FLAGS>) {
 for flag in flags {
 match flag {
 FLAGS::Z => { self.toggle_zero_flag(value) }
 FLAGS::N => { self.toggle_negative_flag(value) }
 _ => { println!("Flag still not implemented") }
 }
 }
 }
}

‎ instruction.rs

#[derive(Copy, Clone)]
pub enum AddrMode {
 IMM,
 ABS,
 ZPG
}
impl AddrMode {
 pub fn get_addr_mode_string(&self, opcode: u8) -> String {
 match opcode {
 0xA9 | 0x00 => { String::from("IMM") }
 0xAD => { String::from("ABS") }
 0xA5 => { String::from("ZPG") }
 _ => { String::from("IMM") }
 }
 }
}
pub struct Instruction {
 pub opcode: u8,
 pub mnemonic: String,
 pub addr_mode: AddrMode
}
impl Instruction {
 pub fn new() -> Self {
 Instruction { opcode: 0x00, mnemonic: String::from("NOP"), addr_mode: AddrMode::IMM }
 }
 fn set_instruction(&mut self, new_opcode: u8, new_mnemonic: String, new_addr_mode: AddrMode) {
 self.opcode = new_opcode;
 self.mnemonic = new_mnemonic;
 self.addr_mode = new_addr_mode;
 }
 fn fetch_mnemonic(&self, opcode: u8) -> String {
 match opcode {
 0xA9 | 0xAD | 0xA5 => { String::from("LDA") }
 0x00 => { String::from("BRK") }
 _ => { todo!() }
 }
 }
 fn fetch_addr_mode(&self, opcode: u8) -> AddrMode {
 match opcode {
 0xA9 => { AddrMode::IMM }
 0xAD => { AddrMode::ABS }
 0xA5 => { AddrMode::ZPG }
 // TODO: should be implied
 0x00 => { AddrMode::IMM }
 _ => { todo!() }
 }
 }
 pub fn fetch_instruction(&mut self, opcode: u8) {
 let new_opcode = opcode;
 let new_mnemonic = self.fetch_mnemonic(opcode);
 let new_addr_mode = self.fetch_addr_mode(opcode);
 // set the current instruction with new data
 self.set_instruction(new_opcode, new_mnemonic, new_addr_mode);
 }
}

The instructions that I have implemented all work as intended, but I'm afraid the code isn't all that good :( Thanks in advance for the feedback

Question 2

Welcome to codereview! It looks overall quite nice, but clippy is unhappy here and there. Run cargo clippy in your project and work through the lints. This is something you should do habitually. Clippy wants to be your friend!

Question 3

This is an attempt at my answer. Since I am not familiar with the 6502, I can only comment on more obvious code smells.

I already left a comment under your post regarding clippy; I will therefore not reiterate its output here. I'm confident you can work through this yourself. :)

I must say that I am impressed -- for a beginner, this is already quite good code!

You have short and concise functions with descriptive names -- that's nice!

Naming

You have named quite some of structs with all-caps. The convention in rust is that structs and enums use SnakeCase. Seeing your main start with CPU::new() will confuse people who first see your code, since ALLCAPS is reserved for constants and statics.

This affects CPU, FLAGS, the fields of AddrMode, and likely more.

Incomplete abstraction, publicness

The field CPU.ram is public. Structs with public fields can be an indicator for incomplete abstraction.

You make use of the public field in main, to initialize the RAM. But you aleady have a nice write_byte method that you could use. Make that public, and then use it in main.

let mut cpu = Cpu::new();
// values to load
cpu.write_byte(0xA00B, 0x0A);
cpu.write_byte(0x00DD, 0x80);

Missing abstraction, primitive obsession, magic numbers

From what I can understand, your cpu.run gets a vector of instructions and address words. Apparently 0x80, 0xDD, and (0x0B, 0xA0) are words, and the other are instructions / opcodes. Your words also seem to be either 16 bit or 32 bit wide, but as a user I don't see when one or the other applies unless I already understand the architecture.

Providing the opcodes as u8 values is a missing abstraction, as well as primitive obession (using primities instead of value-objects). You actually want to allow only certain opcodes. I would suggest defining a pub enum OpCodes somewhere.

Now you would have a vector of opcodes and addresses, which is not good. But hey, we are missing another abstraction that can help!

Every opcode and address belong together and form ... something (I'm missing the right word -- would you call that an instruction?). So make a pub struct Instruction { opcode: OpCodes, data: u16 }. Feed those into your CPU.

This is also the point where you should deal with 16- and 32-bit words.

The method's signature should be this.

impl Cpu {
 pub fn run(&mut self, program: Vec<Instruction>) { todo!() }
}

Since the opcodes are also not used by any identifier, this makes them magic numbers.

fn get_flag_str can be written a lot shorter. You don't need to compare a bool to true, just use the bool. Also, booleans can be used in match expressions. And you don't need return statements in the last instruction of blocks and in match arms, so we can get rid of those.

impl Cpu {
 fn get_flag_str(&self, flag: Flags) -> &str {
 match flag {
 Flags::Z => match self.p.z {
 true => "T",
 false => "F",
 },
 Flags::N => match self.p.n {
 true => "T",
 false => "F",
 },
 _ => "U",
 }
 }
}

You now also see that you use the exact same logic twice to convert a bool to a str of length 1. That's a function.

impl Cpu {
 fn get_flag_str(&self, flag: FLAGS) -> &str {
 match flag {
 FLAGS::Z => bool_to_str(self.p.z),
 FLAGS::N => bool_to_str(self.p.n),
 _ => "U",
 }
 }
}
fn bool_to_str(value: bool) -> &'static str {
 match value {
 true => "T",
 false => "F",
 }
}

Furthermore, it seems you don't need really need a &str, since the result is always of length 1. Just use a char.

derive Debug

For libraries in Rust, it is convention that every struct and enum gets a Debug implementation. You have an application, but it still makes sense, because you can then easily throw a dbg! in your code and show the state of whatever object you are interested in. And luckily in Rust: you don't need to implement this yourself. Just change your declaration like this, and you are done.

#[derive(Debug)]
pub struct Cpu { ... }

formatting

Use the integrated formatter, rustfmt, to format your code. The moment you start collaborating with other programmers, you don't need to fight over indentation, whitespace, etc. Just use it, and everyone can read your code more easily since the formatting is closer than to what they would expect.

Missing tests

Your code does not include any tests. That is a problem, since after a refactor I cannot be sure that I did not break anything. This like read_word would likely be a good candidate to introduce some tests.

I think I will leave it here. There's probably more to be said when you dive into the actual implementation. I will leave that to others. :)

https://en.wikipedia.org/wiki/Magic_number_(programming)

Question 4

I don't think your recommendation to take what I'd call "decoded instructions" as an input makes sense: it does not match reality. The real CPU executes raw bytes, and that can matter. Code may reuse the same byte as different things, for example one (rarely used) trick is to use a non-taken branch to skip a one-byte instruction (that instruction would be interpreted as the branch offset that way, but as an instruction when jumped to directly). Also, branch offsets are in terms of bytes, not instructions. And self-modifying code would be difficult to support.

Question 5

Interesting point, and your insight in assembly is beyond my understanding. I think the question is what OP ultimately intends to do with this. Feed it some byte and check what the CPU does with that? Then I would keep it that way. But the intent is that it executes some operations on data. IMO code should signify intent. Of course, after passing the hypothetical Instruction vec to the CPU it would just convert it to bytes and place it into RAM. So we are just talking about a nicer interface. OP should maybe choose what makes most sense for his project.

Question 6

Thank you for the review :) I think that I will try to implement the Instruction struct actually, it sounds like it would be much easier to implement instructions in that way. Also I really only used u8 and u16 because (as far as I understand) that's all that the 6502 could work with, using u16 for addresses/program counter and u8 for everything else

sarema sarema 4892 silver badges10 bronze badges · Accepted Answer · 2022-09-09 14:05:36Z

This is an attempt at my answer. Since I am not familiar with the 6502, I can only comment on more obvious code smells.

I already left a comment under your post regarding clippy; I will therefore not reiterate its output here. I'm confident you can work through this yourself. :)

I must say that I am impressed -- for a beginner, this is already quite good code!

You have short and concise functions with descriptive names -- that's nice!

Naming

You have named quite some of structs with all-caps. The convention in rust is that structs and enums use SnakeCase. Seeing your main start with CPU::new() will confuse people who first see your code, since ALLCAPS is reserved for constants and statics.

This affects CPU, FLAGS, the fields of AddrMode, and likely more.

Incomplete abstraction, publicness

The field CPU.ram is public. Structs with public fields can be an indicator for incomplete abstraction.

You make use of the public field in main, to initialize the RAM. But you aleady have a nice write_byte method that you could use. Make that public, and then use it in main.

let mut cpu = Cpu::new();
// values to load
cpu.write_byte(0xA00B, 0x0A);
cpu.write_byte(0x00DD, 0x80);

Missing abstraction, primitive obsession, magic numbers

From what I can understand, your cpu.run gets a vector of instructions and address words. Apparently 0x80, 0xDD, and (0x0B, 0xA0) are words, and the other are instructions / opcodes. Your words also seem to be either 16 bit or 32 bit wide, but as a user I don't see when one or the other applies unless I already understand the architecture.

Providing the opcodes as u8 values is a missing abstraction, as well as primitive obession (using primities instead of value-objects). You actually want to allow only certain opcodes. I would suggest defining a pub enum OpCodes somewhere.

Now you would have a vector of opcodes and addresses, which is not good. But hey, we are missing another abstraction that can help!

Every opcode and address belong together and form ... something (I'm missing the right word -- would you call that an instruction?). So make a pub struct Instruction { opcode: OpCodes, data: u16 }. Feed those into your CPU.

This is also the point where you should deal with 16- and 32-bit words.

The method's signature should be this.

impl Cpu {
 pub fn run(&mut self, program: Vec<Instruction>) { todo!() }
}

Since the opcodes are also not used by any identifier, this makes them magic numbers.

fn get_flag_str can be written a lot shorter. You don't need to compare a bool to true, just use the bool. Also, booleans can be used in match expressions. And you don't need return statements in the last instruction of blocks and in match arms, so we can get rid of those.

impl Cpu {
 fn get_flag_str(&self, flag: Flags) -> &str {
 match flag {
 Flags::Z => match self.p.z {
 true => "T",
 false => "F",
 },
 Flags::N => match self.p.n {
 true => "T",
 false => "F",
 },
 _ => "U",
 }
 }
}

You now also see that you use the exact same logic twice to convert a bool to a str of length 1. That's a function.

impl Cpu {
 fn get_flag_str(&self, flag: FLAGS) -> &str {
 match flag {
 FLAGS::Z => bool_to_str(self.p.z),
 FLAGS::N => bool_to_str(self.p.n),
 _ => "U",
 }
 }
}
fn bool_to_str(value: bool) -> &'static str {
 match value {
 true => "T",
 false => "F",
 }
}

Furthermore, it seems you don't need really need a &str, since the result is always of length 1. Just use a char.

derive Debug

For libraries in Rust, it is convention that every struct and enum gets a Debug implementation. You have an application, but it still makes sense, because you can then easily throw a dbg! in your code and show the state of whatever object you are interested in. And luckily in Rust: you don't need to implement this yourself. Just change your declaration like this, and you are done.

#[derive(Debug)]
pub struct Cpu { ... }

formatting

Use the integrated formatter, rustfmt, to format your code. The moment you start collaborating with other programmers, you don't need to fight over indentation, whitespace, etc. Just use it, and everyone can read your code more easily since the formatting is closer than to what they would expect.

Missing tests

Your code does not include any tests. That is a problem, since after a refactor I cannot be sure that I did not break anything. This like read_word would likely be a good candidate to introduce some tests.

I think I will leave it here. There's probably more to be said when you dive into the actual implementation. I will leave that to others. :)

https://en.wikipedia.org/wiki/Magic_number_(programming)

I don't think your recommendation to take what I'd call "decoded instructions" as an input makes sense: it does not match reality. The real CPU executes raw bytes, and that can matter. Code may reuse the same byte as different things, for example one (rarely used) trick is to use a non-taken branch to skip a one-byte instruction (that instruction would be interpreted as the branch offset that way, but as an instruction when jumped to directly). Also, branch offsets are in terms of bytes, not instructions. And self-modifying code would be difficult to support.
Interesting point, and your insight in assembly is beyond my understanding. I think the question is what OP ultimately intends to do with this. Feed it some byte and check what the CPU does with that? Then I would keep it that way. But the intent is that it executes some operations on data. IMO code should signify intent. Of course, after passing the hypothetical Instruction vec to the CPU it would just convert it to bytes and place it into RAM. So we are just talking about a nicer interface. OP should maybe choose what makes most sense for his project.
Thank you for the review :) I think that I will try to implement the Instruction struct actually, it sounds like it would be much easier to implement instructions in that way. Also I really only used u8 and u16 because (as far as I understand) that's all that the 6502 could work with, using u16 for addresses/program counter and u8 for everything else

Stack Exchange Network

6502 Emulator in Rust

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

6502 Emulator in Rust

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions