
We are doing a project designing a 5-stage pipelined CPU based on the RISC-V ISA. When designing the hazard detection unit and forwarding unit, instead of using the common datapath design, we designed it like this:

  • During instruction fetch, fetch 3 instructions instead of one, so we can "peek" ahead and see whether there is a need for forwarding or a store.
  • Since fetching three instructions from RAM takes 3 cycles, at the start of a program we spend 3 cycles fetching them and then store them in three registers. In each subsequent IF stage, we only need to fetch 1 instruction, because the other two are already stored in the registers.
  • We decide the control signals for the instructions in the decoder and pass them on to the next stage.
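
As an illustration of the "peek" in the first bullet, here is a minimal Python sketch (a software model, not a hardware description) of checking a 3-instruction window for read-after-write dependencies. It assumes uncompressed 32-bit RV32I instructions and only classifies a few opcode groups; the names `decode_regs`, `peek_dependencies` and `needs_load_use_stall` are hypothetical.

```python
# Hypothetical sketch: inspect a 3-instruction fetch window for RAW hazards.
# Only a few RV32I opcode classes are handled; any other opcode is treated
# as reading and writing no registers (an assumption for illustration).
OPCODE_REG_USE = {
    0x33: (True, True, True),    # R-type ALU (add, sub, ...): rd, rs1, rs2
    0x13: (True, True, False),   # I-type ALU (addi, ...): rd, rs1
    0x03: (True, True, False),   # loads (lw, ...): rd, rs1
    0x23: (False, True, True),   # stores (sw, ...): rs1, rs2
    0x63: (False, True, True),   # branches (beq, ...): rs1, rs2
}


def decode_regs(instr):
    """Return (rd, set_of_source_regs) for a 32-bit RV32I instruction word."""
    opcode = instr & 0x7F
    uses_rd, uses_rs1, uses_rs2 = OPCODE_REG_USE.get(opcode, (False, False, False))
    rd = (instr >> 7) & 0x1F if uses_rd else None
    sources = set()
    if uses_rs1:
        sources.add((instr >> 15) & 0x1F)
    if uses_rs2:
        sources.add((instr >> 20) & 0x1F)
    # x0 is hardwired to zero, so it never carries a dependency.
    sources.discard(0)
    if rd == 0:
        rd = None
    return rd, sources


def peek_dependencies(window):
    """For window = [i0, i1, i2], list the later slots that read i0's rd."""
    rd, _ = decode_regs(window[0])
    if rd is None:
        return []
    return [slot for slot in (1, 2) if rd in decode_regs(window[slot])[1]]


def needs_load_use_stall(window):
    """True if window[0] is a load whose result window[1] reads immediately."""
    return (window[0] & 0x7F) == 0x03 and 1 in peek_dependencies(window)


# addi x1, x0, 5 ; add x2, x1, x1 ; sw x1, 0(x3)
window = [0x00500093, 0x00108133, 0x0011A023]
deps = peek_dependencies(window)
```

In a classic 5-stage pipeline, dependencies on an ALU result can be covered by forwarding, while a load immediately followed by a consumer still needs one stall cycle, which is why the load-use case is checked separately here.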

We are using an FPGA to implement the CPU. I wonder if there are any deficits in this design.

toolic
asked May 16, 2024 at 4:00
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. Commented May 16, 2024 at 5:34

1 Answer


You titled the question "Problem with 5-staged pipeline CPU design". Yet you don't describe any problems, just a hypothetical design, without providing any information about what the specifications for the core are. All you've said is that it should be a 5-stage pipelined implementation. Presumably this implies that on straight-line code it should retire one instruction per cycle. So, if that's all you need to comply with, then as long as you retire at that speed, you're OK with any design :)

I wonder if there are any deficits in this design.

It depends on what the specifications of the processor you are designing are.

There are only deficits if your design doesn't meet the specifications and criteria of the assignment. And you didn't share those with us.

at start of a program, it would take 3 cycles to fetch three instructions

To have your logic work, there always need to be at least 3 instructions in the prefetch buffer. Anytime the buffer empties, there will be a latency to load it. You can make the memory interface 4 words wide, and then filling the prefetch buffer from empty will take just one cycle. But, again, this depends on what you are allowed to do.
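
To put numbers on that refill latency, here is a small Python model built on stated assumptions rather than any specific RAM IP: one memory access per cycle, each returning a naturally aligned block of `fetch_width_words` 32-bit words, with addresses counted in words. `cycles_to_fill` is a hypothetical helper.

```python
def cycles_to_fill(target_word_addr, fetch_width_words, needed=3):
    """Count accesses (assumed one per cycle) until `needed` sequential
    instructions starting at `target_word_addr` are in the prefetch buffer.

    Assumes each access returns one naturally aligned block of
    `fetch_width_words` 32-bit words."""
    if fetch_width_words < 1:
        raise ValueError("fetch width must be at least one word")
    buffered = 0
    addr = target_word_addr
    cycles = 0
    while buffered < needed:
        # End of the aligned block that contains `addr`.
        block_end = (addr // fetch_width_words + 1) * fetch_width_words
        buffered += block_end - addr  # only words at or after addr are useful
        addr = block_end
        cycles += 1
    return cycles


for width in (1, 4):
    for target in range(4):
        print(f"width={width} target_offset={target}: "
              f"{cycles_to_fill(target, width)} cycle(s)")
```

Under these assumptions, a 1-word interface always needs 3 cycles from empty, matching the start-up cost in the question. A 4-word aligned interface fills in one cycle when the target lands in the first two words of a block, but needs two when it lands in the last two, unless the interface can return an unaligned window.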

If the application for your design is a "fast embedded" MCU-style core, then having a 128-bit bus interface is not too preposterous. You'll have enough RAM blocks in a decent FPGA to pull it off.
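
With a 128-bit interface, each fetch returns four 32-bit instructions, assuming no 16-bit compressed (C extension) instructions are in use. A hypothetical helper for slicing one fetch block, assuming the lowest-addressed word sits in bits 31:0:

```python
def split_fetch_block(block128):
    """Split a 128-bit fetch block into four 32-bit instruction words.

    Assumes lane 0 (lowest address) is bits 31:0, lane 3 is bits 127:96."""
    if not 0 <= block128 < (1 << 128):
        raise ValueError("block must fit in 128 bits")
    return [(block128 >> (32 * lane)) & 0xFFFFFFFF for lane in range(4)]


# Lanes (high to low): sw x1,0(x3) | add x2,x1,x1 | addi x1,x0,5 | nop
block = 0x0011A023_00108133_00500093_00000013
words = split_fetch_block(block)
```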

answered May 16, 2024 at 4:08
  • Since it's troublesome to create a RAM with three read ports (we are using a Vivado IP core, which supports at most 2), we only fetch 3 instructions at the beginning, which takes 3 cycles; after that, a buffer stores the 2 instructions prepared for the next cycle. So we only need a single-port RAM, because the other two instructions have already been stored in the previous cycle. Commented May 16, 2024 at 4:56
  • You can always create more read ports by using multiple copies of the RAM, writing the same data to all copies. Commented May 16, 2024 at 12:51
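
A behavioral Python model of that replication trick (timing is not modeled, and `ReplicatedRam` is a hypothetical name): every write is broadcast to all copies, and each read port reads its own copy, so N RAMs with one read port each behave like one RAM with N read ports, at the cost of N times the storage.

```python
class ReplicatedRam:
    """N read ports built from N identical single-read-port RAM copies."""

    def __init__(self, depth, read_ports):
        # One independent storage array per read port.
        self.copies = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, value):
        # Broadcast every write so all copies stay identical.
        for copy in self.copies:
            copy[addr] = value

    def read(self, addrs):
        # Port i reads from copy i, so all ports can read different addresses.
        if len(addrs) != len(self.copies):
            raise ValueError("need exactly one address per read port")
        return [copy[a] for copy, a in zip(self.copies, addrs)]


ram = ReplicatedRam(depth=16, read_ports=3)
for addr, word in enumerate([0x00500093, 0x00108133, 0x0011A023]):
    ram.write(addr, word)
```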
