We are working on a project to design a 5-stage pipelined CPU for the RISC-V ISA. For the hazard detection unit and forwarding unit, instead of using the common datapath design, we designed it like this:
- During instruction fetch, fetch 3 instructions instead of one, to "peek" at whether forwarding or store is needed.
- Since the three instructions must be fetched from RAM, the first fetch at the start of a program takes 3 cycles. We then store them in three registers. In each subsequent IF stage we only need to fetch 1 instruction, since the other two are already held in the registers.
- The decoder determines the control signals for the instructions and passes them to the next stage.
We are using an FPGA to implement the CPU. I wonder if there are any deficits in this design.
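The fetch scheme in the list above can be modelled in a few lines of Python to check the cycle counts. This is only a behavioural sketch, not RTL: `simulate_fetch`, `window` and `fetch_width` are hypothetical names, and it assumes straight-line code with no branches or stalls.

```python
from collections import deque


def simulate_fetch(program, window=3, fetch_width=1):
    """Model a look-ahead fetch window (assumed behaviour, not real RTL).

    Returns a list of (cycle, issued_instruction, peeked_followers).
    """
    buf = deque()   # instructions currently visible to the decoder
    pc = 0          # index of the next instruction to fetch from RAM
    cycle = 0
    trace = []
    while pc < len(program) or buf:
        cycle += 1
        # The RAM delivers at most fetch_width words per cycle.
        for _ in range(fetch_width):
            if pc < len(program) and len(buf) < window:
                buf.append(program[pc])
                pc += 1
        # Issue only when the whole window is visible, or when the
        # program has run out of instructions to fetch.
        if len(buf) == window or pc >= len(program):
            head = buf.popleft()
            trace.append((cycle, head, list(buf)))
    return trace


if __name__ == "__main__":
    for cycle, instr, peek in simulate_fetch(["i0", "i1", "i2", "i3", "i4"]):
        print(cycle, instr, peek)
```

With a one-word-per-cycle RAM, the first instruction issues in cycle 3 and one instruction issues per cycle after that, which matches the description: only the start-up pays the 3-cycle fill.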
1 Answer
You titled the question "Problem with 5-staged pipeline CPU design". Yet you don't describe any problems, just a hypothetical design, and you don't say what the specifications for the core are. All you've said is that it should be a 5-stage pipelined implementation. Presumably this implies that on straight-line code it should retire one instruction per cycle. So, if that's all you need to comply with, then as long as you retire at that rate, you're OK with any design :)
> I wonder if there are any deficits in this design.
It depends on what the specifications of the processor you are designing are.
There are only deficits if your design doesn't meet the specifications and criteria of the assignment. And you didn't share those with us.
> at start of a program, it would take 3 cycles to fetch three instructions
To have your logic work, there always need to be at least 3 instructions in the prefetch buffer. Anytime the buffer empties, there will be a latency to load it. You can make the memory interface 4 words wide, and then filling the prefetch buffer from empty will take just one cycle. But, again, this depends on what you are allowed to do.
If the application for your design is a "fast embedded" MCU-style core, then having a 128-bit bus interface is not too preposterous. You'll have enough RAM blocks in a decent FPGA to pull it off.
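As a rough illustration of the refill latency discussed above, the cost of an empty prefetch buffer can be estimated as follows. This is a back-of-the-envelope sketch under stated assumptions: `refill_cycles` and `bubble_cycles` are hypothetical helpers, every buffer flush is assumed to empty the whole window, the refill address is assumed to be aligned so that a wide read returns all needed words at once, and any branch-resolution latency is ignored.

```python
import math


def refill_cycles(window, words_per_cycle):
    """Cycles needed to fill an empty prefetch window from memory."""
    return math.ceil(window / words_per_cycle)


def bubble_cycles(flushes, window, words_per_cycle):
    """Extra cycles lost to refills, compared with a design that can
    issue on the first cycle after every flush (assumed baseline)."""
    return flushes * (refill_cycles(window, words_per_cycle) - 1)


if __name__ == "__main__":
    # 32-bit interface: one instruction word per cycle.
    print(refill_cycles(3, 1), bubble_cycles(100, 3, 1))
    # 128-bit interface: four instruction words per cycle.
    print(refill_cycles(3, 4), bubble_cycles(100, 3, 4))
```

Under these assumptions a one-word interface costs two extra cycles every time the buffer empties, while the four-word interface hides the refill completely. Whether that matters depends on how often the buffer is flushed in the target workload.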
- Since it's troublesome to create a three-read-port RAM (we are using the Vivado IP core, and the maximum is 2 ports), we only fetch 3 instructions at the beginning, which takes 3 cycles. After that, the buffer stores the 2 instructions prepared for the next cycle, so we only need a single-port RAM: the other two instructions were already stored in the previous cycle. – Wells, May 16, 2024 at 4:56
- You can always create more read ports by using multiple copies of the RAM, writing the same data to all copies. – Dave Tweed, May 16, 2024 at 12:51
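The replication idea from the comment above can be sketched behaviourally in Python. This is a minimal model, not synthesizable code: `ReplicatedRAM` and its methods are hypothetical names, and it assumes one write port shared by all copies and one read port per copy.

```python
class ReplicatedRAM:
    """Several identical RAM copies acting as one multi-read-port RAM."""

    def __init__(self, depth, read_ports):
        # One full copy of the memory per read port.
        self.banks = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, data):
        # Every write goes to all copies so they stay identical.
        for bank in self.banks:
            bank[addr] = data

    def read(self, addrs):
        # Each requested address is served by its own copy, so all
        # reads can happen in the same cycle.
        if len(addrs) > len(self.banks):
            raise ValueError("more reads than read ports")
        return [bank[addr] for bank, addr in zip(self.banks, addrs)]


if __name__ == "__main__":
    ram = ReplicatedRAM(depth=16, read_ports=3)
    for addr in range(16):
        ram.write(addr, addr * 10)
    print(ram.read([4, 5, 6]))
```

The trade-off is memory usage: three read ports cost three times the block RAM, in exchange for reading three consecutive instructions in a single cycle.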