How to make a synthesizable Instruction Memory in SystemVerilog?

Question 1

I am designing a MIPS processor in SystemVerilog. The instruction memory is made like this:

module instr_mem
#(
    parameter SIZE = 7 //64
)
(
    input  logic        rst_n,
    input  logic        clk,

    input  logic [31:0] addr,
    output logic [31:0] rd
);
    logic [31:0] rom [0 : SIZE-1];
    assign rd = rom[addr];
    initial begin
        rom <= {
            32'h0,
            32'h2408000F, // a = F
            32'h240A0000, // res = 0
            32'h01485021, // (*) res = res + a
            32'h2508FFFF, // a = a - 1
            32'h1500FFFD, // if (a != 0) goto (*)
            32'hAC0A0ADD
        };
    end
endmodule

But this code doesn't synthesize as a 'black box'/separate module; instead, the top module looks like "Core, Data Memory, <a big mess of something instead of the Instruction Memory>". I tried making the IM combinational (assign rom = {..}), but synthesis turns out the same way.

How can I make a synthesizable Instruction Memory? Also, are there any guides to writing code that is reliably synthesizable?

Comment

Is this on an FPGA?

Comment

@mitu-raj I want to run this on an FPGA too, but for now I just use Vivado's "Run Synthesis", and the IM isn't shown as a separate module but as... something big and strange. I assumed it should synthesize cleanly in this tool before I attempt to run it on an FPGA.

Comment

This ZipCPU link was a great help to me. In particular, you might look at his Minimizing FPGA Resource Utilization page, with a focus on his discussion of memory and the ZipCPU development, and "Use block RAM anywhere you can" on that page. Actually, pretty much everything he writes is worth a read.

Comment

@jonk I thought I was the only one. Every one of his articles is worth a read, as he makes complex topics much simpler. 👍

Comment

@MituRaj I came across his page in August last year. Someone on a Xilinx forum page referred me there. I could not believe my eyes. It was the purest of gold.

Dave Tweed · Answer 1 · 2021-05-26 15:34:28Z

First of all, does it really matter what the internal implementation looks like? As long as the behavior of the module at its ports is correct, what difference does it make?

As you put more code into the rom, the synthesis tools will probably decide at some point to turn it into an actual memory (e.g., BRAM on an FPGA).
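Size is not the only factor; the read style matters too. On most FPGA families (including Xilinx 7-series block RAM), the embedded memory registers its read, so a purely combinational read like the question's assign rd = rom[addr]; can only be built from LUTs, which is likely what shows up merged into the top level. The sketch below contrasts the two styles using the question's program; the package and module names (demo_rom_pkg, rom_async, rom_sync) are hypothetical.

```systemverilog
// Shared contents: the program from the question (hypothetical package name).
package demo_rom_pkg;
  function automatic logic [31:0] rom_word(input logic [2:0] a);
    case (a)
      3'd0:    rom_word = 32'h00000000;
      3'd1:    rom_word = 32'h2408000F; // a = F
      3'd2:    rom_word = 32'h240A0000; // res = 0
      3'd3:    rom_word = 32'h01485021; // (*) res = res + a
      3'd4:    rom_word = 32'h2508FFFF; // a = a - 1
      3'd5:    rom_word = 32'h1500FFFD; // if (a != 0) goto (*)
      3'd6:    rom_word = 32'hAC0A0ADD;
      default: rom_word = 32'h00000000;
    endcase
  endfunction
endpackage

// (a) Combinational read, as in the question: no clock in the read path, so
//     a registered-read block RAM cannot implement it; the tool builds it
//     from LUTs and may merge it into the surrounding logic.
module rom_async (
  input  logic [2:0]  addr,
  output logic [31:0] rd
);
  assign rd = demo_rom_pkg::rom_word(addr);
endmodule

// (b) Registered read, the shape of the vendor ROM templates: the output is
//     captured on the clock edge, which makes a block RAM mapping possible.
module rom_sync (
  input  logic        clk,
  input  logic [2:0]  addr,
  output logic [31:0] rd
);
  always_ff @(posedge clk)
    rd <= demo_rom_pkg::rom_word(addr);
endmodule
```

For a 7-word ROM, expect LUTs either way; the registered form only makes block RAM possible, and its extra clock cycle of latency has to be accounted for in the pipeline.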

But in practice, you'll quickly get tired of editing your software in the RTL source code. Instead, you'll want to create (or get) an assembler or compiler, and use an external file to initialize the contents of the rom from the output of that. Different tools will have their own ways to accomplish this.¹ This will also eliminate the need to re-synthesize the hardware each time you want to tweak the software.

¹ You mentioned "Vivado" in a comment, so I'll assume you're targeting Xilinx FPGAs. I once created a custom CPU for a Xilinx project, and I was able to adapt a "universal assembler" written in Ruby to generate .coe ("coefficient") files for the instruction memory so that it could be synthesized. I also wrote a separate script that converted the .coe file into Verilog source code for simulation in Modelsim.
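A minimal sketch of the external-file approach, assuming the assembler (or a conversion script) writes one 32-bit word per line in hex. $readmemh is the standard Verilog system task for this, and Vivado can use it in an initial block to set the initial contents of an inferred memory; other tools may need a vendor format such as the .coe files mentioned above. The module name instr_rom, its parameters, and the file name program.hex are hypothetical.

```systemverilog
// Hypothetical instruction ROM whose contents come from an external hex file
// (e.g. assembler output), so the software can change without editing RTL.
module instr_rom #(
  parameter int ADDR_W    = 8,              // log2 of the number of words
  parameter     INIT_FILE = "program.hex"   // hypothetical file name
) (
  input  logic              clk,
  input  logic [ADDR_W-1:0] addr,           // word address
  output logic [31:0]       rd
);
  logic [31:0] rom [0:2**ADDR_W-1];

  // Runs at time 0 in simulation; synthesis tools that support it evaluate
  // it at elaboration. An empty file name skips loading.
  initial begin
    if (INIT_FILE != "") $readmemh(INIT_FILE, rom);
  end

  // Registered read, the form block RAM can implement.
  always_ff @(posedge clk)
    rd <= rom[addr];
endmodule
```

For the program in the question, program.hex would contain 00000000, 2408000F, 240A0000, and so on, one word per line. Words not present in the file stay undefined (X) in simulation.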

megasplash · Answer 2 · 2021-05-27 11:31:22Z

From an efficient resource usage point of view, it is good practice to use the FPGA's embedded block RAM instead of logic resources whenever possible. This will "result in more compact and higher performing designs" (see the Maximize Block RAM Performance text from Xilinx).

Unless you instantiate a memory through an IP block, the synthesis tool has to recognize a block memory in your HDL description, so you have to write HDL code in a form it recognizes. This is called RAM (or ROM) inference, and every synthesis tool should come with corresponding coding examples.

Altera (Intel FPGA): here is Example 13–29, "Verilog HDL Synchronous ROM", from Inferring Memory Functions from HDL Code.

module sync_rom (clock, address, data_out);
    input clock;
    input [7:0] address;
    output [5:0] data_out;
    reg [5:0] data_out;

    // Registered read: nonblocking assignment in a clocked block
    always @ (posedge clock)
    begin
        case (address)
            8'b00000000: data_out <= 6'b101111;
            8'b00000001: data_out <= 6'b110110;
            // ... (remaining entries elided in the source)
            8'b11111110: data_out <= 6'b000001;
            8'b11111111: data_out <= 6'b101010;
        endcase
    end
endmodule

Another example in that document shows Verilog code that loads the ROM contents with $readmemb.

Xilinx: an example of ROM Using Block RAM Resources

module rams_sp_rom_1 (clk, en, addr, dout);
    input clk;
    input en;
    input [5:0] addr;
    output [19:0] dout;

    (*rom_style = "block" *) reg [19:0] data;

    always @(posedge clk)
    begin
        if (en)
            case (addr)
                6'b000000: data <= 20'h0200A; 6'b100000: data <= 20'h02222;
                6'b000001: data <= 20'h00300; 6'b100001: data <= 20'h04001;
                // ... (remaining entries elided in the source)
                6'b011111: data <= 20'h00102; 6'b111111: data <= 20'h0400D;
            endcase
    end

    assign dout = data;
endmodule

Note the synthesis attribute (*rom_style = "block" *), which tells the synthesis tool to implement the ROM in block RAM rather than in distributed (LUT-based) memory.

Lattice: look for the "Inferring ROM" section in the iCEcube2 User Guide.

When you follow the coding guidelines, the synthesis tool should successfully infer the block memory.
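Applied to the question, the same guidelines might look like the following sketch: a fixed-size array, contents set in an initial block, and a read registered on clk. Assumptions: addr is a MIPS byte address (the PC advances by 4), the ROM holds 64 words, and the (* rom_style = "block" *) attribute is placed on the array rather than on an output register as in the Xilinx template; check the Vivado synthesis guide for your version. rst_n is dropped because it was unused in the original. The registered read adds one clock cycle of latency between addr and rd.

```systemverilog
// Hypothetical rewrite of the question's instr_mem in an inference-friendly
// form: fixed-size array, contents set in an initial block, registered read.
module instr_mem #(
  parameter int ADDR_W = 6                 // 2**6 = 64 words (assumed size)
) (
  input  logic        clk,
  input  logic [31:0] addr,                // byte address (assumption: PC += 4)
  output logic [31:0] rd
);
  (* rom_style = "block" *) logic [31:0] rom [0:2**ADDR_W-1];

  initial begin
    // Fill every word first so unused locations read as 0 (a MIPS nop).
    for (int i = 0; i < 2**ADDR_W; i++) rom[i] = 32'h0;
    rom[0] = 32'h00000000;
    rom[1] = 32'h2408000F; // a = F
    rom[2] = 32'h240A0000; // res = 0
    rom[3] = 32'h01485021; // (*) res = res + a
    rom[4] = 32'h2508FFFF; // a = a - 1
    rom[5] = 32'h1500FFFD; // if (a != 0) goto (*)
    rom[6] = 32'hAC0A0ADD;
  end

  // Word index = byte address / 4; rd is valid one clock after addr.
  always_ff @(posedge clk)
    rd <= rom[addr[ADDR_W+1:2]];
endmodule
```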

Thanks for the answer. I've implemented it like the 2nd example, and it caused a problem: the instruction reaching the CPU is one cycle behind the Program Counter. So when a branch instruction (number i) arrives, the Program Counter is already at i+1, and then instruction i+1 arrives while the PC holds PC_branch. How can I fix it? The delay is caused by the flip-flop in the IM.
@katzesaal But did it infer the block memory? I am not very familiar with embedded memory in Xilinx devices, but I assume that it can't work without a clock and will inevitably introduce that one-clock-cycle delay. You should definitely check the documentation. Or you can simply check whether the block memory in your target device supports the desired memory mode by trying to implement it with the IP memory function. Anyway, this looks like a good topic for another question, and there is a chance that you will be unable to implement it like this and will have to adjust your pipeline.
@katzesaal That sounds like a completely different scenario. Hence, please ask it as a separate question. This answer covered what you asked for in the question.
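One common way to hide that cycle, not taken from this thread: drive the synchronous ROM with the next PC value rather than the registered PC. The PC register and the ROM's output register then update on the same clock edge, so instr always holds the word at pc. A minimal sketch, assuming byte addresses and a single-cycle fetch; fetch, branch_taken, and branch_target are hypothetical names, and the ROM holds placeholder contents (each word equals its own index) only to make the alignment easy to check.

```systemverilog
// Hypothetical fetch stage: the synchronous ROM is addressed with next_pc,
// the value pc takes at this edge, so pc and instr update together.
module fetch #(
  parameter int ADDR_W = 6                 // 64-word ROM (assumed size)
) (
  input  logic        clk,
  input  logic        rst_n,               // synchronous, active-low
  input  logic        branch_taken,        // hypothetical control input
  input  logic [31:0] branch_target,       // hypothetical byte address
  output logic [31:0] pc,
  output logic [31:0] instr
);
  logic [31:0] rom [0:2**ADDR_W-1];
  logic [31:0] next_pc;

  // Placeholder contents: each word equals its own index.
  initial begin
    for (int i = 0; i < 2**ADDR_W; i++) rom[i] = i;
  end

  // Next-PC selection (reset has priority).
  always_comb begin
    if (!rst_n)            next_pc = 32'h0;
    else if (branch_taken) next_pc = branch_target;
    else                   next_pc = pc + 32'd4;
  end

  // Both registers load from next_pc on the same edge, so after the edge
  // instr == rom[pc / 4] with no extra cycle of lag.
  always_ff @(posedge clk) begin
    pc    <= next_pc;
    instr <= rom[next_pc[ADDR_W+1:2]];
  end
endmodule
```

The trade-off is timing: the path from the current instruction through branch decode into next_pc and on to the ROM address is combinational, and it can become the critical path.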
