
Lecture 19, Pipelining Data Forwarding

 Data forwarding example CMSC 411 architecture
 Consider the five stage pipeline architecture:
 IF instruction fetch, PC is address into memory fetching instruction
 ID instruction decode and register read out of two values
 EX execute instruction or compute data memory address
 M data memory access to store or fetch a data word
 WB write back value into general register
     IF       ID          EX       M        WB
    +--+     +--+        +--+     +--+     +--+
    |  |     |  |        | A|-|\  |  |     |  |
    |  |     |  |    /---|  | \ \_|  |     |  |
    |PC|-(I)-|IR|-(R) =  |  | / / |  |-(D)-|  |--+
    |  |     |  |  ^ \---| B|-|/  |  |     |  |  |
    +--+     +--+  |     +--+     +--+     +--+  |
     ^        ^    |      ^   ALU  ^        ^    |
     |        |    |      |        |        |    |
 clk-+--------+-----------+--------+--------+    |
                   |                             |
                   +-----------------------------+
 Now consider the instruction sequence:
 400 lw  $1,100($0)   load general register 1 from memory location 100
 404 lw  $2,104($0)   load general register 2 from memory location 104
 408 nop
 40C nop              wait for register $2 to get data
 410 add $3,$1,$2     add contents of registers 1 and 2, sum into register 3
 414 nop
 418 nop              wait for register $3 to get data
 41C add $4,$3,$1     add contents of registers 3 and 1, sum into register 4
 420 nop
 424 nop              wait for register $4 to get data
 428 beq $3,$4,-100   branch to 314 if contents of registers 3 and 4 are equal
 42C add $4,$4,$4     add ..., this is the "delayed branch slot", always executed
 The pipeline stage table with NO data forwarding is:
 lw   IF ID EX M  WB
 lw      IF ID EX M  WB
 nop        IF ID EX M  WB
 nop           IF ID EX M  WB
 add              IF ID EX M  WB
 nop                 IF ID EX M  WB
 nop                    IF ID EX M  WB
 add                       IF ID EX M  WB
 nop                          IF ID EX M  WB
 nop                             IF ID EX M  WB
 beq                                IF ID EX M  WB
 add                                   IF ID EX M  WB
 time 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
This can be significantly improved with the addition of four
multiplexors and wiring.
     IF       ID             EX            M          WB
    +--+     +--+           +--+          +--+       +--+
    |  |     |  |           | A|-(X)--|\  |  |       |  |
    |  |     |  |    /-(X)--|  | | |  \ \_|  |       |  |
    |PC|-(I)-|IR|-(R)   | = |  | | |  / / |  |-+-(D)-|  |--+
    |  |     |  |  ^ \-(X)--| B|-(X)--|/  |  | |     |  |  |
    +--+     +--+  |    |   +--+ | |      +--+ |     +--+  |
     ^        ^    |    |    ^   | |  ALU  ^   |      ^    |
     |        |    |    |    |   | |       |   |      |    |
 clk-+--------+--------------+-------------+----------+    |
                   |    |        | |           |           |
                   |    +----------+-----------+           |
                   |             |                         |
                   +-------------+-------------------------+
 The pipeline stage table with data forwarding is:
 lw   IF ID EX M  WB
 lw      IF ID EX M  WB
 nop        IF ID EX M  WB                  saved one nop
 add           IF ID EX M  WB               $2 in WB and used in EX
 add              IF ID EX M  WB            saved two nop's, $3 used
 nop                 IF ID EX M  WB         saved one nop
 beq                    IF ID EX M  WB      $4 in MEM and used in ID
 add                       IF ID EX M  WB
 time 1  2  3  4  5  6  7  8  9  10 11 12
 Note the required nop from using data immediately after a load.
 Note the required nop for the beq in the ID stage using an ALU result.
The data forwarding paths are shown in green with the additional
multiplexors. The control is explained below.
Green must be added to part2a.vhdl.
Blue already exists, used for discussion, do not change.
To understand the logic better, note that MEM_RD contains the register
destination of the output of the ALU and MEM_addr contains the value
of the output of the ALU for the instruction now in the MEM stage.
If the instruction in the EX stage has the MEM_RD destination in
bits 25 downto 21, then MEM_addr must be routed to the A side of the ALU.
(This is the A forward MEM_addr control signal.)
 EX stage           MEM stage
 add $4,$3,$1       add $3,$1,$2
         |               |
         +---------------+
If the instruction in the EX stage has the MEM_RD destination in
bits 20 downto 16, then MEM_addr must be routed to the B side of the ALU.
(This is the B forward MEM_addr control signal.)
 EX stage           MEM stage
 add $4,$1,$3       add $3,$1,$2
            |            |
            +------------+
To understand the logic better, note that WB_RD contains the register
destination of the output of the ALU or Memory and WB_result contains
the value of the output of the ALU or Memory for the instruction now
in the WB stage.
If the instruction in the EX stage has the WB_RD destination in
bits 25 downto 21, then WB_result must be routed to the A side of the ALU.
(This is the A forward WB_result control signal.)
If the instruction in the EX stage has the WB_RD destination in
bits 20 downto 16, then WB_result must be routed to the B side of the ALU.
(This is the B forward WB_result control signal.)
Note that a beq instruction in the ID stage that needs a value from
the instruction in the WB stage does not need data forwarding.
If a beq instruction in the ID stage has the MEM_RD destination in
bits 25 downto 21, then MEM_addr must be routed to the top side of
the equal comparator.
(This is the 1 forward control signal.)
If a beq instruction in the ID stage has the MEM_RD destination in
bits 20 downto 16, then MEM_addr must be routed to the bottom side of
the equal comparator.
(This is the 2 forward control signal.)
 ID stage          EX stage      MEM stage
 beq $3,$4,-100    nop           add $4,$3,$1
         |                            |
         +----------------------------+
If a beq instruction in the ID stage has the WB_RD destination in
bits 20 downto 16, then WB_result must be used by the bottom side of
the equal comparator.
(This happens by magic. Not really, two rules above apply.)
 ID stage          EX stage    MEM stage   WB stage
 beq $3,$4,-100    nop         nop         lw $4,8($3)
         |                                     |
         +-------------------------------------+
 The data forwarding rules can be summarized based on the
 cs411 schematic, shown above.
 ID stage beq data forwarding:
   default with no data forwarding is ID_read_data_1
   1 forward MEM_addr is ID_reg1=MEM_RD and MEM_RD/=0 and MEM_OP/=lw

   default with no data forwarding is ID_read_data_2
   2 forward MEM_addr is ID_reg2=MEM_RD and MEM_RD/=0 and MEM_OP/=lw

 EX stage data forwarding:
   default with no data forwarding is EX_A
   A forward MEM_addr is EX_reg1=MEM_RD and MEM_RD/=0 and MEM_OP/=lw
   A forward WB_result is EX_reg1=WB_RD and WB_RD/=0

   default with no data forwarding is EX_B
   B forward MEM_addr is EX_reg2=MEM_RD and MEM_RD/=0 and MEM_OP/=lw
   B forward WB_result is EX_reg2=WB_RD and WB_RD/=0
 Note: the entity mux32_3 is designed to handle the above.
 ID_RD is 0 for ID_OP= beq, j, sw (nop, all zeros, automatic zero in RD)
 thus EX_RD, MEM_RD, WB_RD = 0 for these instructions
 Because register zero is always zero, we can use 0 for
 a destination for every instruction that does not
 produce a result in a register. Thus no data forwarding
 will occur for instructions that do not produce a value
 in a register.
 note: ID_reg1 is ID_IR(25 downto 21)
       ID_reg2 is ID_IR(20 downto 16)
       EX_reg1 is EX_IR(25 downto 21)
       EX_reg2 is EX_IR(20 downto 16)
       MEM_OP  is MEM_IR(31 downto 26)
       EX_OP   is EX_IR(31 downto 26)
       ID_OP   is ID_IR(31 downto 26)
 These shorter names can be used with VHDL alias statements
 alias ID_reg1 : word_5 is ID_IR(25 downto 21);
 alias ID_reg2 : word_5 is ID_IR(20 downto 16);
 alias EX_reg1 : word_5 is EX_IR(25 downto 21);
 alias EX_reg2 : word_5 is EX_IR(20 downto 16);
 alias MEM_OP : word_6 is MEM_IR(31 downto 26);
 alias EX_OP : word_6 is EX_IR(31 downto 26);
 alias ID_OP : word_6 is ID_IR(31 downto 26);
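The ID stage beq rules summarized above can be sketched as a small
stand-alone entity. This is only an illustration under stated assumptions:
the entity name beq_fwd_sketch and the output names fwd1 and fwd2 are
hypothetical, word_5 and word_6 are declared locally here, and the lw
opcode 100011 is the one used in this example (check cs411_opcodes.txt).

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, not part of part2a.vhdl.
entity beq_fwd_sketch is
  port (ID_IR  : in  std_logic_vector(31 downto 0);  -- instruction in ID
        MEM_IR : in  std_logic_vector(31 downto 0);  -- instruction in MEM
        MEM_RD : in  std_logic_vector(4 downto 0);   -- destination in MEM
        fwd1   : out std_logic;   -- "1 forward": MEM_addr to top of comparator
        fwd2   : out std_logic);  -- "2 forward": MEM_addr to bottom of comparator
end entity beq_fwd_sketch;

architecture behavior of beq_fwd_sketch is
  -- Local stand-ins for the word_5 and word_6 types used by the aliases.
  subtype word_5 is std_logic_vector(4 downto 0);
  subtype word_6 is std_logic_vector(5 downto 0);
  alias ID_reg1 : word_5 is ID_IR(25 downto 21);
  alias ID_reg2 : word_5 is ID_IR(20 downto 16);
  alias MEM_OP  : word_6 is MEM_IR(31 downto 26);
  constant lw_op : word_6 := "100011";  -- assumed lw opcode
begin
  -- Forward only when the MEM stage writes a real register (not register 0)
  -- and the MEM stage instruction is not lw, whose MEM_addr is an address,
  -- not the loaded data.
  fwd1 <= '1' when ID_reg1 = MEM_RD and MEM_RD /= "00000" and MEM_OP /= lw_op
          else '0';
  fwd2 <= '1' when ID_reg2 = MEM_RD and MEM_RD /= "00000" and MEM_OP /= lw_op
          else '0';
end architecture behavior;
```

With fwd1 = '0' the comparator input stays at the default ID_read_data_1,
and likewise fwd2 = '0' leaves ID_read_data_2 selected.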
Why is the priority mux, mux32_3 needed?
mux32_3.vhdl gives priority to ct1 over ct2
Answer: Consider MEM_RD with a destination value 3 and
WB_RD with a destination value 3.
What should add $4,$3,$3 use? MEM_addr or WB_result?
For this to happen, some program or some person would have
written code such as:
 sub $3,$12,$11
 add $3,$1,$2
 add $4,$3,$3     double the value of $3
Well, rather obviously, the result of the sub is never used and
thus the answer to our question is that MEM_addr must be used. This
is the closest prior instruction with the required result. The
correct design is implemented using the priority mux32_3 with the
MEM_addr in the in1 priority input.
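A minimal sketch of such a priority mux follows. It is not the course's
mux32_3.vhdl: the entity name mux32_3_sketch and the port names in0, in2,
ct2 and result are assumptions chosen to match the in1 and ct1 names
mentioned above.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical stand-in for mux32_3; port names are assumptions.
entity mux32_3_sketch is
  port (in0    : in  std_logic_vector(31 downto 0);  -- default, no forwarding
        in1    : in  std_logic_vector(31 downto 0);  -- priority input (MEM_addr)
        in2    : in  std_logic_vector(31 downto 0);  -- second input (WB_result)
        ct1    : in  std_logic;                      -- selects in1
        ct2    : in  std_logic;                      -- selects in2 if ct1 is '0'
        result : out std_logic_vector(31 downto 0));
end entity mux32_3_sketch;

architecture behavior of mux32_3_sketch is
begin
  -- The conditions are tested in order, so ct1 wins when both are '1'.
  result <= in1 when ct1 = '1' else
            in2 when ct2 = '1' else
            in0;
end architecture behavior;
```

For the A side of the ALU, in0 would carry EX_A, in1 MEM_addr and in2
WB_result, with ct1 driven by A forward MEM_addr and ct2 by A forward
WB_result. In the sub/add/add example both controls are '1' and the mux
passes MEM_addr, the result of the closest prior instruction.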
The control signal A forward MEM_addr may be implemented in VHDL as:
btw: 100011 in any_IR(31 downto 26) is the lw opcode in this example,
 be sure to check this semester's cs411_opcodes.txt
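As a hedged sketch, the four EX stage control signals from the summary
rules can be written as conditional signal assignments. The entity wrapper
ex_fwd_sketch is hypothetical and exists only to make the example
self-contained; AFMA is the name used by the debug process below, while
AFWB, BFMA and BFWB are assumed names for the other three signals.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical wrapper entity; the four assignments could also be written
-- as concurrent statements in an existing architecture.
entity ex_fwd_sketch is
  port (EX_IR  : in  std_logic_vector(31 downto 0);  -- instruction in EX
        MEM_IR : in  std_logic_vector(31 downto 0);  -- instruction in MEM
        MEM_RD : in  std_logic_vector(4 downto 0);   -- destination in MEM
        WB_RD  : in  std_logic_vector(4 downto 0);   -- destination in WB
        AFMA   : out std_logic;   -- A forward MEM_addr
        AFWB   : out std_logic;   -- A forward WB_result (assumed name)
        BFMA   : out std_logic;   -- B forward MEM_addr  (assumed name)
        BFWB   : out std_logic);  -- B forward WB_result (assumed name)
end entity ex_fwd_sketch;

architecture behavior of ex_fwd_sketch is
  subtype word_5 is std_logic_vector(4 downto 0);
  subtype word_6 is std_logic_vector(5 downto 0);
  alias EX_reg1 : word_5 is EX_IR(25 downto 21);
  alias EX_reg2 : word_5 is EX_IR(20 downto 16);
  alias MEM_OP  : word_6 is MEM_IR(31 downto 26);
  constant lw_op : word_6 := "100011";  -- lw opcode in this example
begin
  -- MEM_addr is forwarded only if MEM writes a nonzero register and is
  -- not a lw (for lw, MEM_addr holds the address, not the loaded word).
  AFMA <= '1' when EX_reg1 = MEM_RD and MEM_RD /= "00000" and MEM_OP /= lw_op
          else '0';
  BFMA <= '1' when EX_reg2 = MEM_RD and MEM_RD /= "00000" and MEM_OP /= lw_op
          else '0';
  -- WB_result is forwarded if WB writes a nonzero register.
  AFWB <= '1' when EX_reg1 = WB_RD and WB_RD /= "00000" else '0';
  BFWB <= '1' when EX_reg2 = WB_RD and WB_RD /= "00000" else '0';
end architecture behavior;
```

AFMA and AFWB can both be '1' in the same cycle; choosing between them is
left to the priority mux, not to this logic.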
Here is where you may want to add a debug process. Replace AFMA
with any signal name of interest:
 -- needs "use STD.textio.all;" and, for write on std_logic,
 -- "use IEEE.std_logic_textio.all;" in the context clause
 prtAFMA: process (AFMA)
   variable my_line : LINE;          -- my_line needs to be defined
 begin
   write(my_line, string'("AFMA="));
   write(my_line, AFMA);             -- or hwrite for long signals
   write(my_line, string'(" at="));
   write(my_line, now);              -- "now" is simulation time
   writeline(output, my_line);       -- outputs line
 end process prtAFMA;
part2a.chk has the _RD signals and values
cs411_opcodes.txt for op code values
Now, to finish part2a.vhdl, the jump and branch instructions must be
implemented. This is shown in green on the upper part of the schematic.
The signal out of the jump address box would be coded in VHDL as:
 jump_addr <= PCP(31 downto 28) & ID_IR(25 downto 0) & "00";
The adder symbol is just another instance of your Homework 4, add32.
The "shift left 2" is a simple VHDL statement:
 shifted2 <= ID_sign_ext(29 downto 0) & "00";
The project writeup: part2a
For more debugging, uncomment the print process and diff against:
part2a_print.chk
part2a_print.chkg