addiu $6,$6,5
bltz $6,$L5
nop
...
$L5:
Is that safe on MIPS I? If so, how?
Original MIPS I is a classic 5-stage RISC design (IF, ID, EX, MEM, WB) that hides all of its branch latency with a single branch-delay slot by checking branch conditions early, in the ID stage. (Which is why it's limited to equal/not-equal, or sign-bit checks like lt or ge zero, not lt between two registers that would need carry-propagation through an adder.)
Doesn't this mean that branches need their input ready a cycle earlier than ALU instructions? The bltz enters the ID stage in the same cycle that addiu enters EX.
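To make the timing concern concrete, here is a minimal sketch of stage occupancy in an ideal 5-stage pipeline with no stalls. The helper names (stage_at, occupancy) are hypothetical and only illustrate the overlap; they are not taken from any MIPS documentation.

```python
# Ideal 5-stage pipeline: one instruction issued per cycle, no stalls (assumption).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(issue_cycle, cycle):
    """Return the stage an instruction occupies in `cycle`, or None if it is not in the pipeline."""
    index = cycle - issue_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None

def occupancy(program, cycles):
    """Map each instruction name to its per-cycle stage list, issuing one instruction per cycle."""
    return {name: [stage_at(issue, c) for c in range(cycles)]
            for issue, name in enumerate(program)}

table = occupancy(["addiu", "bltz", "nop"], 7)
for name, row in table.items():
    # Print one row per instruction; "--" marks cycles outside the pipeline.
    print(f"{name:6}", " ".join(f"{s or '--':3}" for s in row))
```

In cycle 2 of this sketch, addiu is in EX while bltz is in ID, which is exactly the overlap the question is about.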
MIPS I (aka R2000) uses bypass forwarding from EX-output to EX-input so normal integer ALU instructions (like a chain of addu/xor) have single-cycle latency and can run in consecutive cycles.
MIPS stands for "Microprocessor without Interlocked Pipeline Stages", so it doesn't detect RAW hazards; code has to avoid them. (Hence load-delay slots on first-gen MIPS, with MIPS II adding interlocks to stall in that case, invalidating the acronym :P).
But I never see any discussion of calculating the branch condition multiple instructions ahead to avoid a stall. (The addiu/bltz example was emitted by MIPS gcc5.4 -O3 -march=mips1 on Godbolt, which does respect load-delay slots, filling with nop if needed.)
Does it use some kind of trick like EX reading inputs on the falling edge of the clock, and ID not needing forwarded register values until the rising edge? (With EX producing its results early enough for that to work)
I guess that would make sense if the clock speed is capped low enough for cache access to be single-cycle.
The question "Stalling or bubble in MIPS" claims that lw followed by a beq on the load result needs 2 stall cycles because it can't forward. That's not accurate for actual MIPS I (unless gcc is buggy). It does mention half clock cycles, though, allowing a value to be written and then read from the register file in the same whole cycle.
You are actually asking two questions:
- Is that safe on MIPS I?
- If so, how?
Is that safe on MIPS I?
I have seen different block diagrams of MIPS CPUs. Most of them perform the branch decision in the EX or even in the MEM stage instead of the ID stage.
Of course such designs will react differently when your example code is executed.
Without an official statement in the manual of the CPU you are actually using, your question cannot be answered with certainty.
(Paul Clayton's answer on "Is that true if we can always fill the delay slot there is no need for branch prediction?" agrees that one delay slot does fully hide branch latency on MIPS R2000, but not on MIPS R4000. So that's good evidence that real commercial MIPS CPUs work the way the question assumes, despite the existence of various implementations that might not exactly follow the MIPS ISA.)
If so, how?
Doesn't this mean that branches need their input ready a cycle earlier than ALU instructions?
No.
The key is the bypass forwarding logic. Let's take a look at the following example:
add $A, $B, $C ; Currently in MEM stage
or $D, $E, $F ; Currently in EX stage
bltz $G, someLabel ; Currently in ID stage
(Here A, B, ..., G are GPR numbers.)
The bypass forwarding logic for the EX stage (currently holding the or instruction) contains a multiplexer that works the following way (pseudo code):
if E = A
take ALU input from EX/MEM shift register output
else
take ALU input from ID/EX shift register output
end-if
It is this multiplexer which allows you to use the result of one instruction (add) in the following one (or).
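The multiplexer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name ex_operand and its parameters are hypothetical, and the exclusion of register 0 (hardwired to zero on MIPS) is the usual textbook refinement, not something the pseudo code above states.

```python
def ex_operand(src_reg, id_ex_value, ex_mem_dest, ex_mem_value):
    """Select one ALU input for the instruction currently in EX.

    src_reg      -- source register number of the instruction in EX (e.g. E)
    id_ex_value  -- value read from the register file, held in the ID/EX register
    ex_mem_dest  -- destination register of the previous instruction (e.g. A), or None
    ex_mem_value -- that instruction's ALU result, held in the EX/MEM register
    """
    # Assumption: register 0 is never forwarded because it always reads as zero.
    if ex_mem_dest is not None and ex_mem_dest != 0 and src_reg == ex_mem_dest:
        return ex_mem_value   # bypass: take the result that has not been written back yet
    return id_ex_value        # no dependency: use the value read from the register file

# The previous instruction wrote register 5; the current one reads register 5.
forwarded = ex_operand(5, id_ex_value=111, ex_mem_dest=5, ex_mem_value=222)
# The current instruction reads register 5, but the previous one wrote register 6.
not_forwarded = ex_operand(5, id_ex_value=111, ex_mem_dest=6, ex_mem_value=222)
```

A real design needs one such selection per ALU source operand (both E and F in the example), which the pseudo code shows for E only.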
Of course the same can be done for the ID stage using a 3-way multiplexer:
if G = D
take branch decision input from ALU output
else if G = A
take branch decision input from EX/MEM shift register output
else
take branch decision input from register bank output
end-if
With this arrangement, the signal propagation time of the branch decision increases by the time needed in the EX stage, because the ALU output has to settle before the branch condition can be evaluated in the same cycle. This limits the clock frequency of the processor.
However, the result of an instruction can already be used in the ID stage of the next instruction without needing an additional clock cycle.
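The 3-way selection above can be tried out on the addiu/bltz pair from the question. The following is a sketch under assumptions, not a description of the actual R2000 logic: branch_input and bltz_taken are hypothetical names, register 0 is excluded from forwarding as in typical textbook designs, and plain Python integers stand in for 32-bit signed registers.

```python
def branch_input(src_reg, regfile_value, ex_dest, alu_output, mem_dest, ex_mem_value):
    """Select the value the branch in ID compares, newest producer first."""
    if src_reg != 0 and src_reg == ex_dest:
        return alu_output      # producer is in EX right now: take the ALU output directly
    if src_reg != 0 and src_reg == mem_dest:
        return ex_mem_value    # producer is in MEM: take the EX/MEM register output
    return regfile_value       # no pending producer: the register bank is up to date

def bltz_taken(value):
    """bltz only tests the sign of its operand."""
    return value < 0

# addiu $6,$6,5 is in EX while bltz $6,... is in ID.
stale = -3                     # value of $6 still stored in the register bank
alu_out = stale + 5            # result addiu is producing in this cycle
value = branch_input(6, stale, ex_dest=6, alu_output=alu_out,
                     mem_dest=None, ex_mem_value=None)
print(bltz_taken(value), bltz_taken(stale))
```

With forwarding, the branch sees the updated value and is not taken; using the stale register bank value would wrongly take it.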
Source: https://stackoverflow.com/questions/56586551/how-does-mips-i-forward-from-ex-to-id-for-branches-without-stalling