ECE 4100/6100
Advanced Computer Architecture
Lecture 1 Pipelining (3055 Review)
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Pipeline Stage
[Figure: one pipeline stage, drawn as combinational logic between two flip-flops (F/F); delay measured in units of 1 FO4]
• Optimal FO4 per pipe stage
– 6 to 8 [UT/Compaq, ISCA-29]
– 18 (15+3 latch) [IBM, MICRO-35]
• P4 pipe stage ~ 16 FO4
Five-stage Pipelined Datapath
[Figure: five-stage pipelined datapath with PC, instruction memory, register file, sign extend (16 to 32 bits), shift left 2, adders, ALU, data memory, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
Stages: Inst. Fetch, Inst. Decode, Exec, Mem, WB
Example for lw instruction: Instruction Fetch (IF)
[Figure: five-stage pipelined datapath, captioned "Instruction fetch"]
Example for lw instruction: Instruction Decode (ID)
[Figure: five-stage pipelined datapath, captioned "Instruction decode"]
Example for lw instruction: Execution (EX)
[Figure: five-stage pipelined datapath, captioned "Execution"]
Example for lw instruction: Memory (MEM)
[Figure: five-stage pipelined datapath, captioned "Memory"]
Example for lw instruction: Writeback (WB)
[Figure: five-stage pipelined datapath, captioned "Writeback"]
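The lw walk-through above can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's datapath: the function names, the dictionary "latches", and the register and memory contents are all assumptions made for the example. Each dictionary stands in for one pipeline register (IF/ID, ID/EX, EX/MEM, MEM/WB), and the destination register number is carried along with the data so that it is still available at writeback.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def run_lw(pc, regs, data_mem, rt, rs, imm16):
    """Walk `lw rt, imm16(rs)` through IF, ID, EX, MEM, WB (hypothetical model)."""
    # IF: fetch the instruction and compute PC + 4
    if_id = {"pc_plus_4": pc + 4, "rs": rs, "rt": rt, "imm16": imm16}
    # ID: read the base register and sign-extend the 16-bit offset
    id_ex = {"read_data_1": regs[if_id["rs"]],
             "imm32": sign_extend16(if_id["imm16"]),
             "rt": if_id["rt"]}
    # EX: the ALU adds base and offset to form the memory address
    ex_mem = {"alu_result": id_ex["read_data_1"] + id_ex["imm32"],
              "rt": id_ex["rt"]}
    # MEM: read data memory at the computed address
    mem_wb = {"read_data": data_mem[ex_mem["alu_result"]],
              "rt": ex_mem["rt"]}
    # WB: write the loaded value into the destination register
    regs[mem_wb["rt"]] = mem_wb["read_data"]
    return regs

# Assumed contents: $1 holds 100 and memory address 108 holds 42.
regs = {1: 100, 10: 0}
data_mem = {108: 42}
run_lw(pc=0, regs=regs, data_mem=data_mem, rt=10, rs=1, imm16=8)  # lw $10, 8($1)
print(regs[10])
```

Passing `rt` from latch to latch is a deliberate choice in this sketch: by the time the load reaches WB, a later instruction already occupies IF/ID, so the register number cannot be taken from there.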
Example for sw instruction: Memory (MEM)
[Figure: five-stage pipelined datapath, captioned "Memory"]
Example for sw instruction: Writeback (WB): do nothing
[Figure: five-stage pipelined datapath, captioned "Writeback"]
Corrected Datapath (for lw)
[Figure: five-stage pipelined datapath, corrected for lw; same components as the previous datapath figures]
Pipeline Control
[Figure: pipelined datapath annotated with control signals: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, MemtoReg. Instruction fields shown: Instruction [20–16], Instruction [15–11], Instruction [15–0]; a 6-bit field feeds the ALU control]
• We have 5 stages. What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution (4 lines)
• RegDst
• ALUOp[1:0]
• ALUSrc
– Memory Stage (3 lines)
• Branch
• MemRead
• MemWrite
– Write Back (2 lines)
• MemtoReg
• RegWrite (note that this signal drives the register file, which is drawn in the ID stage)
Pipeline control
• Extend pipeline registers to include control information (created in ID)
• Pass control signals along just like the data
Pipeline Control
Control lines by stage: Execution/Address Calculation (RegDst, ALUOp1, ALUOp0, ALUSrc), Memory access (Branch, MemRead, MemWrite), Write-back (RegWrite, MemtoReg)

Instruction | RegDst ALUOp1 ALUOp0 ALUSrc | Branch MemRead MemWrite | RegWrite MemtoReg
R-format    |   1      1      0      0    |   0      0       0     |    1        0
lw          |   0      0      0      1    |   0      1       0     |    1        1
sw          |   X      0      0      1    |   0      0       1     |    0        X
beq         |   X      0      1      0    |   1      0       0     |    0        X

[Figure: Control decodes the instruction from IF/ID; ID/EX carries the EX, M, and WB fields, EX/MEM carries M and WB, MEM/WB carries WB]
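The table above can be written down as a small lookup, which makes it easy to see which control fields are still being carried after each stage. This is a sketch only: the bit strings are copied from the table (with "X" for don't-care), but the dictionary layout and the helper name `control_after` are assumptions made for illustration.

```python
# Control-line values per instruction class, copied from the table above.
# Bit order within each field:
#   EX: RegDst, ALUOp1, ALUOp0, ALUSrc
#   M : Branch, MemRead, MemWrite
#   WB: RegWrite, MemtoReg
CONTROL = {
    "R-format": {"EX": "1100", "M": "000", "WB": "10"},
    "lw":       {"EX": "0001", "M": "010", "WB": "11"},
    "sw":       {"EX": "X001", "M": "001", "WB": "0X"},
    "beq":      {"EX": "X010", "M": "100", "WB": "0X"},
}

# Fields latched into the pipeline register that follows each stage.
CARRIED = {"ID": ("EX", "M", "WB"), "EX": ("M", "WB"), "MEM": ("WB",)}

def control_after(stage, instr):
    """Return the control fields still carried after `stage` (hypothetical helper)."""
    return {field: CONTROL[instr][field] for field in CARRIED[stage]}

print(control_after("ID", "lw"))    # latched into ID/EX
print(control_after("EX", "lw"))    # latched into EX/MEM
print(control_after("MEM", "lw"))   # latched into MEM/WB
```

Each stage consumes its own field and forwards the rest, which is why the EX field disappears after ID/EX and only WB survives to MEM/WB.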
Datapath with Control
[Figure: pipelined datapath with control. The Control block in ID produces the EX, M, and WB fields, which are carried through ID/EX, EX/MEM, and MEM/WB. Signals shown: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, MemtoReg]
Datapath with Control
[Figure: pipelined datapath with control]
IF: lw $10, 8($1)
Datapath with Control
[Figure: pipelined datapath with control]
IF: sub $11, $2, $3
ID: lw $10, 8($1)
Control output for “lw”: WB = 11, M = 010, EX = 0001
Datapath with Control
[Figure: pipelined datapath with control]
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 8($1)
Control output for “sub”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 8($1)
Control output for “and”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 8($1)
Control output for “or”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: add $14, $8, $9
EX: or $13, $6, $7
MEM: and $12…
WB: sub $11, ..
Control output for “add”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: add $14, ..
WB: or $13…
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: xxxx
WB: add $14..
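The cycle-by-cycle walk-through above follows a simple rule: in an ideal five-stage pipeline with no stalls, the instruction at position i (counting from 0) enters IF in cycle i + 1 and advances one stage per cycle. A minimal sketch of that rule, assuming the helper name `occupancy` and using "xxxx" for an empty slot as the slides do:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Instruction sequence used in the walk-through above.
PROGRAM = [
    "lw $10, 8($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

def occupancy(program, cycle):
    """Map each stage to the instruction it holds in `cycle` (1-based), no stalls."""
    state = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - 1 - depth  # index of the instruction currently at this depth
        state[stage] = program[i] if 0 <= i < len(program) else "xxxx"
    return state

# Cycles needed to drain the pipeline: one per instruction plus the fill time.
total_cycles = len(PROGRAM) + len(STAGES) - 1

for cycle in range(1, total_cycles + 1):
    print(cycle, occupancy(PROGRAM, cycle))
```

For these five instructions the loop covers nine cycles, matching the nine stage snapshots above.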
Pipelining is not quite that straightforward!
• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of instructions
– Data hazards: instruction depends on the result of a prior instruction still in the pipeline
– Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
“Single” Memory Port
[Figure: pipeline timing diagram, Cycle 1 to Cycle 7, for Load, Instr 1, Instr 2, Instr 3, Instr 4; each instruction passes through Ifetch, Reg, ALU, DMem, Reg. Horizontal axis: time (clock cycles); vertical axis: instruction order]
“Single” Memory Port / Structural Hazard
[Figure: pipeline timing diagram, Cycle 1 to Cycle 7, for Load, add, Instr 2, Instr 3, Instr 4; each instruction passes through Ifetch, Reg, ALU, DMem, Reg. Horizontal axis: time (clock cycles); vertical axis: instruction order]
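With a single memory port, the conflict in the diagram above can be found mechanically: a load issued in cycle c reaches DMem in cycle c + 3, which is the same cycle in which the instruction three positions later performs its Ifetch. The sketch below assumes one instruction issued per cycle with no stalls; the function name and the `(name, uses_dmem)` tuple format are illustrative choices, not part of the lecture.

```python
def memory_port_conflicts(program):
    """Find cycles where a data access and an instruction fetch collide.

    `program` is a list of (name, uses_dmem) pairs issued one per cycle.
    Returns (cycle, fetching_instruction, data_access_instruction) tuples.
    """
    conflicts = []
    for i, (name, uses_dmem) in enumerate(program):
        if not uses_dmem:
            continue
        dmem_cycle = (i + 1) + 3      # issued in cycle i + 1; DMem is the 4th stage
        fetch_index = dmem_cycle - 1  # instruction whose Ifetch falls in that cycle
        if fetch_index < len(program):
            conflicts.append((dmem_cycle, program[fetch_index][0], name))
    return conflicts

program = [
    ("Load", True),
    ("add", False),
    ("Instr 2", False),
    ("Instr 3", False),
    ("Instr 4", False),
]
print(memory_port_conflicts(program))
```

Only instructions that actually access data memory can cause the conflict, which is why the sketch skips entries with `uses_dmem` set to `False`.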
Data Hazard
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline timing diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg. Horizontal axis: time (clock cycles); vertical axis: instruction order]
Forwarding to Avoid Data Hazard
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline timing diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg. Horizontal axis: time (clock cycles); vertical axis: instruction order]
Forwarding (simplified)
[Figure: Register File, ID/EX, ALU, EX/MEM, Data Memory, MEM/WB, and a MUX]
Forwarding (from EX/MEM)
[Figure: same datapath with two MUXes added at the ALU inputs]
Forwarding (from MEM/WB)
[Figure: same datapath with two MUXes added at the ALU inputs]
Forwarding (operand selection)
[Figure: same datapath with a Forwarding Unit controlling the two ALU-input MUXes]
Forwarding (operand propagation)
[Figure: same datapath; the Forwarding Unit receives Rs, Rt, EX/MEM Rd, and MEM/WB Rd, and a MUX selects between Rt and Rd]
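The operand-selection decision made by the forwarding unit can be sketched as a comparison of register numbers. This follows the usual textbook condition rather than anything spelled out on the slides, so treat it as an assumption: a result is forwarded only if the producing instruction writes a register, that register is not register 0 (assumed hardwired to zero, as in MIPS), and it matches the operand's register number. EX/MEM is checked first because it holds the more recent value. The function name and the dictionary fields are hypothetical.

```python
def forward_select(operand_reg, ex_mem, mem_wb):
    """Choose the source for one ALU operand.

    operand_reg: register number of the operand (Rs or Rt from ID/EX).
    ex_mem, mem_wb: dicts with 'reg_write' (bool) and 'rd' (destination number).
    """
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == operand_reg:
        return "EX/MEM"   # result of the immediately preceding instruction
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == operand_reg:
        return "MEM/WB"   # result of the instruction two positions back
    return "REGFILE"      # no hazard: use the value read during ID

# sub r4,r1,r3 is in EX while add r1,r2,r3 sits in EX/MEM.
ex_mem = {"reg_write": True, "rd": 1}
mem_wb = {"reg_write": False, "rd": 0}
print(forward_select(1, ex_mem, mem_wb))  # Rs of sub
print(forward_select(3, ex_mem, mem_wb))  # Rt of sub
```

The same function is called once per ALU operand, which corresponds to the two MUXes at the ALU inputs in the figures above.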
Data Hazard Even with Forwarding
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: pipeline timing diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg. Horizontal axis: time (clock cycles); vertical axis: instruction order]
Forward backward in time… no way!! (or way?)
Data Hazard Even with Forwarding
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: pipeline timing diagram with a Bubble inserted in the rows for sub, and, and or; two slots are marked NO ISSUE. Horizontal axis: time (clock cycles); vertical axis: instruction order]
Need “pipeline interlock” (or stall) to stop instructions from issuing. How is this detected?
Hazard Detection Unit
• Stall by letting an instruction that won’t write anything (a bubble) go forward
• Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
[Figure: pipelined datapath with a hazard detection unit and a forwarding unit. Hazard detection unit signals: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a MUX that selects 0 for the control fields entering ID/EX. Forwarding unit signals: Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
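The stall condition above translates directly into a predicate on the pipeline-register fields. The sketch below is a plain restatement of that condition; the function names and the output dictionary keys are assumptions, chosen to mirror the signal names in the figure (PCWrite, IF/IDWrite, and the MUX that zeroes the control fields).

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load still in EX."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

def hazard_controls(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Derive the three outputs of the hazard detection unit (hypothetical names)."""
    stall = load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,        # hold the PC so the same instruction is refetched
        "IF/IDWrite": not stall,     # hold IF/ID so the instruction is decoded again
        "zero_control": stall,       # send all-zero control into ID/EX (a bubble)
    }

# lw r1, 0(r2) is in EX (rt = 1) while sub r4,r1,r6 is in ID (rs = 1, rt = 6).
print(hazard_controls(True, 1, 1, 6))
# An unrelated instruction in ID does not stall.
print(hazard_controls(True, 1, 2, 3))
```

As written, the check compares against both register fields of the instruction in ID regardless of whether that instruction actually reads them, which matches the condition stated above and can cause an occasional unnecessary stall.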
Code Rescheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e - f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd
Compiler optimizes for performance. Hardware checks for safety.
Fast code:
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd
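The difference between the two schedules can be checked by counting load-use stalls. The sketch below assumes a five-stage pipeline with full forwarding, where the only remaining stall is one cycle when an instruction reads a register loaded by the instruction immediately before it. The tuple encoding `(op, dest, sources)` and the function name are illustrative assumptions.

```python
def count_load_use_stalls(code):
    """Count one-cycle stalls caused by using a loaded value in the next instruction."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls

# (op, destination register or None, source registers)
slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()), ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

print("slow stalls:", count_load_use_stalls(slow))
print("fast stalls:", count_load_use_stalls(fast))
```

In the slow code, ADD follows the load of Rc and SUB follows the load of Rf, so each waits one cycle; in the fast code an independent instruction sits between every load and its first use.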
Control Hazard due to Branches (3 stall cycles)
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
[Figure: pipeline timing diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
What do you do with the 3 instructions in between?
How do you do it?
Where is the “commit”?
Branch Hazard Resolutions
#1: Stall until branch direction is clear
#2: Static Branch Prediction
• Predict Not Taken (Fall through, as shown in previous slide)
– Execute successor instructions in sequence
– “Squash” instructions in pipeline if branch actually taken
– PC+4 already calculated, so use it to get next instruction
• Predict Branch Taken
– But haven’t calculated branch target address
• Might incur 1 cycle branch penalty
• Other machines: branch target known before outcome
#3: Dynamic Branch Prediction
– Will dedicate a lecture to such techniques
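A rough way to compare resolution #1 with predict-not-taken is to estimate the cycles per instruction (CPI) each one yields. The sketch below assumes an ideal base CPI of 1 and no other hazards; the branch frequency and taken fraction are made-up illustrative numbers, not figures from the lecture. The 3-cycle penalty corresponds to the three stall cycles shown on the earlier control hazard slide.

```python
def cpi_always_stall(branch_freq, penalty):
    """Every branch pays the full penalty while the pipeline waits."""
    return 1.0 + branch_freq * penalty

def cpi_predict_not_taken(branch_freq, taken_fraction, penalty):
    """Only taken branches pay the penalty; fall-through branches cost nothing extra."""
    return 1.0 + branch_freq * taken_fraction * penalty

# Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
BRANCH_FREQ = 0.20
TAKEN_FRACTION = 0.60
PENALTY = 3

stall_cpi = cpi_always_stall(BRANCH_FREQ, PENALTY)
pnt_cpi = cpi_predict_not_taken(BRANCH_FREQ, TAKEN_FRACTION, PENALTY)
print(f"always stall:      CPI = {stall_cpi:.2f}")
print(f"predict not taken: CPI = {pnt_cpi:.2f}")
```

The gap between the two grows with the fraction of branches that fall through, since those are the cases where the already-computed PC+4 turns out to be the right next address.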
Alternative Branch Hazard Resolutions
#4: Delayed Branch
– Define branch to take place AFTER a following instruction
    branch instruction
    sequential successor 1
    sequential successor 2
    ........
    sequential successor n
    branch target if taken
  (sequential successors 1 to n: branch delay of length n)
– 1 slot delay allows proper decision and branch target address in 5-stage pipeline (next page)
Filling Branch Delay Slot
Make sure R7 will not be used in the taken path before it is redefined
Other Pipelining Issues
• To have all instructions finish within one cycle
– Slow down frequency to cope w/ the critical
operation, or
– Allow non-uniform latency operation
Support Multiple FP Operations
• Complicate bypass
• Potential structural hazard
• Multiple (FP) instructions can complete at the same time
– RF might need to be multi-ported
– Ordering issue, who gets to update the register?
[Figure: IF and ID feed four parallel execution units that rejoin at MEM and WB: Integer Unit (EX), FP multiplier (M1 to M7), FP add (A1 to A4), FP divider (non-pipelined)]
Full Bypass/Forwarding Needed
Clock cycles 1 to 18; stage sequence per instruction (SS = stall):
L.D F4,0(R2)      IF ID EX M WB
MUL.D F0,F4,F6    IF ID SS M1 M2 M3 M4 M5 M6 M7 M WB
ADD.D F2,F0,F8    IF SS ID SS SS SS SS SS SS A1 A2 A3 A4 M WB
S.D F2,0(R2)      IF SS SS SS SS SS SS ID EX SS SS SS M WB
Structural Hazards
• Write to register file at the same cycle (cc11)
• Write to the same register (WAW)
• MEM in cc10
Clock cycle        1   2   3   4   5   6   7   8   9   10  11
MUL.D F0,F4,F6     IF  ID  M1  M2  M3  M4  M5  M6  M7  M   WB
. . . .                IF  ID  EX  M   WB
. . . .                    IF  ID  EX  M   WB
ADD.D F2,F4,F6                 IF  ID  A1  A2  A3  A4  M   WB
. . . .                            IF  ID  EX  M   WB
. . . .                                IF  ID  EX  M   WB
L.D F2,0(R2)                               IF  ID  EX  M   WB
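The register-file write conflict listed above can be found by computing each instruction's WB cycle from its fetch cycle and pipeline length. The sketch below is an illustration under stated assumptions: the pipeline lengths are counted from the stage sequences on this slide (MUL.D has 11 stages, ADD.D has 8, L.D has 5), and the fetch cycles 1, 4, and 7 are assumed so that all three reach WB in cc11, as the slide's first bullet states. The function and variable names are hypothetical.

```python
from collections import defaultdict

# Total stages from IF through WB for each operation on this slide.
PIPELINE_LENGTH = {
    "MUL.D": 11,  # IF ID M1..M7 M WB
    "ADD.D": 8,   # IF ID A1..A4 M WB
    "L.D": 5,     # IF ID EX M WB
}

def writeback_conflicts(issued):
    """Return {cycle: [instructions]} for cycles with more than one WB.

    `issued` is a list of (text, op, fetch_cycle) tuples.
    """
    by_cycle = defaultdict(list)
    for text, op, fetch_cycle in issued:
        wb_cycle = fetch_cycle + PIPELINE_LENGTH[op] - 1
        by_cycle[wb_cycle].append(text)
    return {cycle: names for cycle, names in by_cycle.items() if len(names) > 1}

issued = [
    ("MUL.D F0,F4,F6", "MUL.D", 1),
    ("ADD.D F2,F4,F6", "ADD.D", 4),  # assumed fetch cycle
    ("L.D F2,0(R2)", "L.D", 7),      # assumed fetch cycle
]
print(writeback_conflicts(issued))
```

The same bookkeeping applied to the MEM stage (one cycle earlier) shows ADD.D and L.D both in MEM in cc10, and since both write F2 the pair also forms the WAW case named in the bullets.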
Precise Exception Issue
• Precise exception: If the pipeline can (or must) be stopped
– All the instructions before the faulty (or intended)
instruction must be completed
– All the instructions after it must not be completed
– Restart the execution from the faulty (or intended)
instruction
• State must be consistent with the original program order
• Not straightforward with out-of-order completion
• Simple solution: stall until every prior long-latency instruction is guaranteed not to raise an exception
• Other modern solution: ROB (will dedicate a lecture to it)
DIV.D F0,F2,F4 (exception!)
ADD.D F3,F10,F8 (completed)
SUB.D F12,F12,F14 (completed)
MIPS R4000 Pipeline
• Deeper Pipeline (superpipelining)
• 2 cycle delays for load
• Predicted-Not-Taken strategy
– Not-taken (fall-through) branch : 1 delay slot
– Taken branch: 1 delay slot + 2 idle cycles
IF IS RF EX DF DS TC WB
[Figure: the eight stages drawn over Instruction Memory, Reg, ALU, Data Memory, Reg; annotations: "Branch target and condition eval." and "Load delay (2 cycles)"]
[Figure: load delay timing diagram, CC1 to CC11; instructions shown: LD R1; ADD R2, R1; Inst 1; Inst 2; two slots marked Bubble]
If no delay slot instructions are scheduled, the R4000 will perform a HW interlock
Branches (Predicted-not-taken)
[Figure: two timing diagrams, CC1 to CC11, each row passing through IF IS RF EX DF DS TC WB (SS = stall)]
Actual direction NOT TAKEN: Branch, Delay slot, Branch inst+2, Branch inst+3
Actual direction TAKEN: Branch, Delay slot, Stall, Stall, Branch Target

More Related Content

PPT
Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec18 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- In...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec20 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Da...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec2 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ILP
Hsien-Hsin Sean Lee, Ph.D.
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec18 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- In...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec20 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Da...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Hsien-Hsin Sean Lee, Ph.D.
 

What's hot (20)

PPT
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec15 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Re...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec17 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Me...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Data hazards ppt
MBalaji9
 
PPT
Introduction to Assembly Language
Motaz Saad
 
PPTX
8086 microprocessor instruction set by Er. Swapnil Kaware
Prof. Swapnil V. Kaware
 
PDF
Understanding Tomasulo Algorithm
onesuper
 
PPT
The 8051 assembly language
hemant meena
 
PPT
8086-instruction-set-ppt
jemimajerome
 
PPTX
Lec05
siddu kadiwal
 
PPT
Stack and subroutine
Ashim Saha
 
PPTX
Lec02
siddu kadiwal
 
PPTX
Stacks & subroutines 1
deval patel
 
PPTX
17
dano2osu
 
PDF
Chapter 6 - Introduction to 8085 Instructions
cmkandemir
 
PDF
8086 labmanual
iravi9
 
PDF
Chapter 7 - Programming Techniques with Additional Instructions
cmkandemir
 
PPT
8085 micro processor
Poojith Chowdhary
 
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec15 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Re...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Hsien-Hsin Sean Lee, Ph.D.
 
Lec17 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Me...
Hsien-Hsin Sean Lee, Ph.D.
 
Data hazards ppt
MBalaji9
 
Introduction to Assembly Language
Motaz Saad
 
8086 microprocessor instruction set by Er. Swapnil Kaware
Prof. Swapnil V. Kaware
 
Understanding Tomasulo Algorithm
onesuper
 
The 8051 assembly language
hemant meena
 
8086-instruction-set-ppt
jemimajerome
 
Stack and subroutine
Ashim Saha
 
Stacks & subroutines 1
deval patel
 
Chapter 6 - Introduction to 8085 Instructions
cmkandemir
 
8086 labmanual
iravi9
 
Chapter 7 - Programming Techniques with Additional Instructions
cmkandemir
 
8085 micro processor
Poojith Chowdhary
 
Ad

Similar to Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining (20)

PPTX
PROCESSOR AND CONTROL UNIT
Amirthavalli Senthil
 
PPTX
PROCESSOR AND CONTROL UNIT - unit 3 Architecture
Gunasundari Selvaraj
 
PPT
Pipeline data path u3
Karthik Vivek
 
PPT
Chapter 4
ececourse
 
PDF
multi cycle in microprocessor 8086 sy B-tech
RushikeshThorat24
 
PDF
CAAL_CCSU_U1.pdf
salabhmehrotra
 
PPTX
Computer Architecture - Data Path & Pipeline Hazards
Thyagharajan K.K.
 
PPT
Control unit-implementation
WBUTTUTORIALS
 
PPT
Introduction to intel 8086 part1
Shehrevar Davierwala
 
PPT
Introduction
Shehrevar Davierwala
 
PPT
CSOA unit 5 part 1 Cbshsjjsjjs jsnshhsjw
himanshukandari35
 
PPT
Central Processing Unit_Computer Organization.ppt
Ramanamurthy Banda
 
PPTX
Instruction Set Architecture
Dilum Bandara
 
PPT
other-architectures.ppt
Jaya Chavan
 
PPTX
Basic computer organization design
ndasharath
 
PDF
Instruction execution cycle _
SwatiHans10
 
PPT
Ch8_CENTRAL PROCESSING UNIT Registers ALU
RNShukla7
 
PDF
Unit II Arm 7 Introduction
Dr. Pankaj Zope
 
PROCESSOR AND CONTROL UNIT
Amirthavalli Senthil
 
PROCESSOR AND CONTROL UNIT - unit 3 Architecture
Gunasundari Selvaraj
 
Pipeline data path u3
Karthik Vivek
 
Chapter 4
ececourse
 
multi cycle in microprocessor 8086 sy B-tech
RushikeshThorat24
 
CAAL_CCSU_U1.pdf
salabhmehrotra
 
Computer Architecture - Data Path & Pipeline Hazards
Thyagharajan K.K.
 
Control unit-implementation
WBUTTUTORIALS
 
Introduction to intel 8086 part1
Shehrevar Davierwala
 
Introduction
Shehrevar Davierwala
 
CSOA unit 5 part 1 Cbshsjjsjjs jsnshhsjw
himanshukandari35
 
Central Processing Unit_Computer Organization.ppt
Ramanamurthy Banda
 
Instruction Set Architecture
Dilum Bandara
 
other-architectures.ppt
Jaya Chavan
 
Basic computer organization design
ndasharath
 
Instruction execution cycle _
SwatiHans10
 
Ch8_CENTRAL PROCESSING UNIT Registers ALU
RNShukla7
 
Unit II Arm 7 Introduction
Dr. Pankaj Zope
 
Ad

More from Hsien-Hsin Sean Lee, Ph.D. (20)

PPT
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec14 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Se...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec12 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Ad...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec10 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Mu...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec9 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Com...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec8 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Qui...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec7 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Kar...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec5 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Boo...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec4 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMOS
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec3 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMO...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- SMP
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Hsien-Hsin Sean Lee, Ph.D.
 
PPT
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Hsien-Hsin Sean Lee, Ph.D.
 
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec14 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Se...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec12 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Ad...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec10 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Mu...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec9 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Com...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec8 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Qui...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec7 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Kar...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec5 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Boo...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec4 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMOS
Hsien-Hsin Sean Lee, Ph.D.
 
Lec3 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMO...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Hsien-Hsin Sean Lee, Ph.D.
 
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
Hsien-Hsin Sean Lee, Ph.D.
 
Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
Hsien-Hsin Sean Lee, Ph.D.
 
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- SMP
Hsien-Hsin Sean Lee, Ph.D.
 
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Hsien-Hsin Sean Lee, Ph.D.
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Hsien-Hsin Sean Lee, Ph.D.
 
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Hsien-Hsin Sean Lee, Ph.D.
 

Recently uploaded (20)

PPTX
cocomo-220726173706-141e08f0.tyuiuuupptx
DharaniMani4
 
PPTX
basic_parts-of_computer-1618-754-622.pptx
patelravi16187
 
PPTX
Modern machinery.pptx sjsjnshhsnsnnjnnbbbb
raipureastha08
 
PDF
Portable Veterinary Ultrasound Scanners & Animal Medical Equipment - TcCryo
3447752272
 
PPTX
原版UMiami毕业证文凭迈阿密大学学费单定制学历在线制作硕士毕业证
jicaaeb0
 
PPTX
Intro_S4HANA_Using_Global_Bike_Slides_SD_en_v4.1.pptx
trishalasharma7
 
PPTX
西班牙海牙认证瓦伦西亚国际大学毕业证与成绩单文凭复刻快速办理毕业证书
sw6vvn9s
 
PPTX
Basics of Memristors and fundamentals.pptx
onterusmail
 
PPTX
Boolean Algebra-Properties and Theorems.pptx
bhavanavarri5458
 
PPT
3 01032017tyuiryhjrhyureyhjkfdhghfrugjhf
DharaniMani4
 
PPTX
Basics of Memristors from zero to hero.pptx
onterusmail
 
PPTX
PPT on the topic of programming language
dishasindhava
 
PPTX
Operating-Systems-A-Journey ( by information
parthbhanushali307
 
PPTX
办理HFM文凭|购买代特莫尔德音乐学院毕业证文凭100%复刻安全可靠的
1cz3lou8
 
PPT
community diagnosis slides show health. ppt
michaelbrucebwana
 
PPTX
13. ANAESTHETICS AND ALCOHOLS.pptx fucking
sriramraja650
 
PPTX
PHISHING ATTACKS. _. _.pptx[]
kumarrana7525
 
PPTX
22. PSYCHOTOGENIC DRUGS.pptx 60d7co Gurinder
sriramraja650
 
PPTX
G6Q1 WEEK 2 SCIENCE PPT.pptxLVLLLLLLLLLLLLLLLLL
DitaSIdnay
 
PPTX
Query and optimizing operating system.pptx
YoomifTube
 
cocomo-220726173706-141e08f0.tyuiuuupptx
DharaniMani4
 
basic_parts-of_computer-1618-754-622.pptx
patelravi16187
 
Modern machinery.pptx sjsjnshhsnsnnjnnbbbb
raipureastha08
 
Portable Veterinary Ultrasound Scanners & Animal Medical Equipment - TcCryo
3447752272
 
原版UMiami毕业证文凭迈阿密大学学费单定制学历在线制作硕士毕业证
jicaaeb0
 
Intro_S4HANA_Using_Global_Bike_Slides_SD_en_v4.1.pptx
trishalasharma7
 
西班牙海牙认证瓦伦西亚国际大学毕业证与成绩单文凭复刻快速办理毕业证书
sw6vvn9s
 
Basics of Memristors and fundamentals.pptx
onterusmail
 
Boolean Algebra-Properties and Theorems.pptx
bhavanavarri5458
 
3 01032017tyuiryhjrhyureyhjkfdhghfrugjhf
DharaniMani4
 
Basics of Memristors from zero to hero.pptx
onterusmail
 
PPT on the topic of programming language
dishasindhava
 
Operating-Systems-A-Journey ( by information
parthbhanushali307
 
办理HFM文凭|购买代特莫尔德音乐学院毕业证文凭100%复刻安全可靠的
1cz3lou8
 
community diagnosis slides show health. ppt
michaelbrucebwana
 
13. ANAESTHETICS AND ALCOHOLS.pptx fucking
sriramraja650
 
PHISHING ATTACKS. _. _.pptx[]
kumarrana7525
 
22. PSYCHOTOGENIC DRUGS.pptx 60d7co Gurinder
sriramraja650
 
G6Q1 WEEK 2 SCIENCE PPT.pptxLVLLLLLLLLLLLLLLLLL
DitaSIdnay
 
Query and optimizing operating system.pptx
YoomifTube
 

Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

  • 1. ECE 4100/6100 Advanced Computer Architecture Lecture 1 Pipelining (3055 Review) Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute of Technology
  • 2. Pipeline Stage [Figure: combinational logic between two flip-flops (F/F); a separate panel is labeled "1 FO4".]
      • Optimal FO4 per pipe stage
          – 6 to 8 [UT/Compaq, ISCA-29]
          – 18 (15 + 3 latch) [IBM, MICRO-35]
      • P4 pipe stage ~ 16 FO4
  • 3. Five-stage Pipelined Datapath [Figure: datapath split by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB into five stages: Inst. Fetch, Inst. Decode, Exec, Mem, WB. Blocks shown: PC, instruction memory, adders, register file, sign extend (16 to 32 bits), shift left 2, ALU, data memory, muxes.]
  • 4. Example for lw instruction: Instruction Fetch (IF) [Figure: five-stage datapath, stage label "Instruction fetch".]
  • 5. Example for lw instruction: Instruction Decode (ID) [Figure: five-stage datapath, stage label "Instruction decode".]
  • 6. Example for lw instruction: Execution (EX) [Figure: five-stage datapath, stage label "Execution".]
  • 7. Example for lw instruction: Memory (MEM) [Figure: five-stage datapath, stage label "Memory".]
  • 8. Example for lw instruction: Writeback (WB) [Figure: five-stage datapath, stage label "Writeback".]
  • 9. Example for sw instruction: Memory (MEM) [Figure: five-stage datapath, stage label "Memory".]
  • 10. Example for sw instruction: Writeback (WB): do nothing [Figure: five-stage datapath, stage label "Writeback".]
  • 11. Corrected Datapath (for lw) [Figure: five-stage datapath with the same blocks and pipeline registers as slide 3.]
  • 12. Pipeline Control [Figure: pipelined datapath annotated with the control signals PCSrc, RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, MemtoReg, and RegWrite, plus the ALU control block (6-bit input). Instruction fields shown: Instruction[15–0], Instruction[20–16], Instruction[15–11].]
  • 13. Pipeline control
      • We have 5 stages. What needs to be controlled in each stage?
          – Instruction Fetch and PC Increment
          – Instruction Decode / Register Fetch
          – Execution (4 lines): RegDst, ALUOp[1:0], ALUSrc
          – Memory Stage (3 lines): Branch, MemRead, MemWrite
          – Write Back (2 lines): MemtoReg, RegWrite (note that this signal drives the register file, which sits in the ID stage)
  • 14. Pipeline Control
      • Extend pipeline registers to include control information (created in ID)
      • Pass control signals along just like the data
      Control lines by stage: RegDst, ALUOp1, ALUOp0, ALUSrc belong to the Execution/Address Calculation stage; Branch, MemRead, MemWrite to the Memory access stage; RegWrite, MemtoReg to the Write-back stage.

      Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
      R-format      1       1       0       0       0       0        0         1         0
      lw            0       0       0       1       0       1        0         1         1
      sw            X       0       0       1       0       0        1         0         X
      beq           X       0       1       0       1       0        0         0         X

      [Figure: the Control block produces EX, M, and WB fields from the instruction in IF/ID; ID/EX holds EX, M, WB; EX/MEM holds M, WB; MEM/WB holds WB.]
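As a rough illustration of the slide-14 table, the sketch below stores the nine control bits per instruction class and splits them into the EX, M, and WB groups that travel through the pipeline registers. The dictionary layout and the names `CONTROL`, `split_control`, and `advance` are illustrative assumptions, not part of the lecture; `None` stands for a don't-care entry (X).

```python
# Control bits from the slide-14 table, in the order
# (RegDst, ALUOp1, ALUOp0, ALUSrc, Branch, MemRead, MemWrite, RegWrite, MemtoReg).
# None encodes a don't-care (X). Names and layout are illustrative assumptions.
CONTROL = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (None, 0, 0, 1, 0, 0, 1, 0, None),
    "beq":      (None, 0, 1, 0, 1, 0, 0, 0, None),
}


def split_control(kind):
    """Split the nine bits into the groups written into ID/EX (EX, M, WB)."""
    bits = CONTROL[kind]
    return {"EX": bits[0:4], "M": bits[4:7], "WB": bits[7:9]}


def advance(groups):
    """Drop the group consumed by the current stage before the next register."""
    remaining = dict(groups)
    remaining.pop(next(iter(remaining)))  # EX first, then M, then WB
    return remaining


id_ex = split_control("lw")   # ID/EX carries EX, M, WB
ex_mem = advance(id_ex)       # EX/MEM carries M, WB
mem_wb = advance(ex_mem)      # MEM/WB carries WB only
print(id_ex, ex_mem, mem_wb, sep="\n")
```

Keeping the groups in consumption order makes the shrinking width of the pipeline registers visible: each register stores only the bits that later stages still need.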
  • 15. Datapath with Control [Figure: the pipelined datapath of slide 12 with the Control block added in ID and its EX, M, WB fields carried through ID/EX, EX/MEM, and MEM/WB.]
  • 16. Datapath with Control [Figure: same datapath.] IF: lw $10, 8($1)
  • 17. Datapath with Control [Figure: same datapath; control values for "lw" are generated in ID.] IF: sub $11, $2, $3 · ID: lw $10, 8($1)
  • 18. Datapath with Control [Figure: same datapath; control values for "sub" are generated in ID.] IF: and $12, $4, $5 · ID: sub $11, $2, $3 · EX: lw $10, 8($1)
  • 19. Datapath with Control [Figure: same datapath; control values for "and" are generated in ID.] IF: or $13, $6, $7 · ID: and $12, $4, $5 · EX: sub $11, $2, $3 · MEM: lw $10, 8($1)
  • 20. Datapath with Control [Figure: same datapath; control values for "or" are generated in ID.] IF: add $14, $8, $9 · ID: or $13, $6, $7 · EX: and $12, $4, $5 · MEM: sub $11, .. · WB: lw $10, 8($1)
  • 21. Datapath with Control [Figure: same datapath; control values for "add" are generated in ID.] IF: xxxx · ID: add $14, $8, $9 · EX: or $13, $6, $7 · MEM: and $12… · WB: sub $11, ..
  • 22. Datapath with Control [Figure: same datapath.] IF: xxxx · ID: xxxx · EX: add $14, $8, $9 · MEM: or $13, .. · WB: and $12…
  • 23. Datapath with Control [Figure: same datapath.] IF: xxxx · ID: xxxx · EX: xxxx · MEM: add $14, .. · WB: or $13…
  • 24. Datapath with Control [Figure: same datapath.] IF: xxxx · ID: xxxx · EX: xxxx · MEM: xxxx · WB: add $14..
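The cycle-by-cycle snapshots on slides 16 to 24 can be reproduced with a very small occupancy model. This is only a sketch of an ideal five-stage pipeline with no hazards or stalls; the function name `occupancy` and the use of "xxxx" for an empty stage (borrowed from the slides) are illustrative choices.

```python
# Minimal occupancy model of an ideal five-stage pipeline (no hazards, no stalls).
# The instruction list is the sequence shown on slides 16 to 24.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $10, 8($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Return {stage: instruction or 'xxxx'} for a 1-based clock cycle."""
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # instruction i (0-based) is in IF in cycle i + 1
        in_flight = 0 <= index < len(program)
        snapshot[stage] = program[index] if in_flight else "xxxx"
    return snapshot


# N instructions need N + (number of stages - 1) cycles to drain the pipeline.
total_cycles = len(PROGRAM) + len(STAGES) - 1
for cycle in range(1, total_cycles + 1):
    print(cycle, occupancy(PROGRAM, cycle))
```

The loop runs for nine cycles, one per slide from 16 to 24, which is the usual "number of instructions plus pipeline depth minus one" count for an unstalled pipeline.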
  • 25. Pipelining is not quite that straightforward!
      • Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
          – Structural hazards: HW cannot support this combination of instructions
          – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
          – Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
  • 26. “Single” Memory Port [Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7) across, instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
  • 27. “Single” Memory Port / Structural Hazard [Figure: the same pipeline diagram over Cycle 1 to Cycle 7, with instruction rows labeled Load, add, Instr 2, Instr 3, Instr 4.]
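To make the single-port conflict concrete, the sketch below lists the cycles in which an instruction fetch and a data access would both need the one memory port. It assumes an ideal five-stage pipeline that issues one instruction per cycle and that only loads and stores use memory in their fourth stage; the function name `memory_conflicts` and the mnemonic strings are hypothetical.

```python
# Sketch: find cycles in which a single-ported memory is requested twice.
# Assumptions: one instruction issued per cycle, instruction i (0-based) is
# fetched in cycle i + 1, and loads/stores access memory in cycle i + 4.
def memory_conflicts(program):
    """program: list of mnemonics in issue order; returns the conflicting cycles."""
    fetch_cycles = {i + 1 for i in range(len(program))}
    data_cycles = {
        i + 4 for i, op in enumerate(program) if op in ("load", "store")
    }
    return sorted(fetch_cycles & data_cycles)


# The sequence from slide 26: the load's DMem access lands in cycle 4,
# the same cycle in which Instr 3 is fetched.
print(memory_conflicts(["load", "instr1", "instr2", "instr3", "instr4"]))
```

A design with separate instruction and data memories (or caches) removes this conflict; with a shared port, the fetch in the conflicting cycle has to be delayed.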
  • 28. Data Hazard [Figure: pipeline diagram, time in clock cycles across, instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
      add r1,r2,r3
      sub r4,r1,r3
      and r6,r1,r7
      or  r8,r1,r9
      xor r10,r1,r11
  • 29. Forwarding to Avoid Data Hazard [Figure: the same pipeline diagram and the same five instructions (add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11).]
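  • 34. Forwarding (operand propagation) [Figure: register file, ALU, and data memory separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers, with muxes in front of the ALU inputs. A Forwarding Unit takes Rs and Rt from ID/EX together with EX/MEM Rd and MEM/WB Rd; a further mux selects between Rt and Rd.]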
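A minimal sketch of the decision the forwarding unit on slide 34 makes for one ALU operand is shown below. The inputs mirror the labels on the slide (a source register from ID/EX, Rd from EX/MEM and MEM/WB). The RegWrite qualifiers and the check against register 0 are assumptions taken from the usual MIPS textbook treatment rather than from the slide, and the function name `forward_select` is hypothetical.

```python
# Sketch of the forwarding-unit decision for one ALU operand.
# Assumptions (not on the slide): a producer only forwards if it writes a
# register (RegWrite) and register 0 is never forwarded.
def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return which value feeds the ALU input: 'EX/MEM', 'MEM/WB' or 'REGFILE'."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"    # the most recent producer takes priority
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REGFILE"


# add r1,r2,r3 followed by sub r4,r1,r3: when sub is in EX, add is in MEM.
print(forward_select(src=1, ex_mem_rd=1, ex_mem_regwrite=True,
                     mem_wb_rd=0, mem_wb_regwrite=False))
# One cycle later, and r6,r1,r7 is in EX: sub (r4) is in MEM, add (r1) is in WB.
print(forward_select(src=1, ex_mem_rd=4, ex_mem_regwrite=True,
                     mem_wb_rd=1, mem_wb_regwrite=True))
```

Checking EX/MEM before MEM/WB matters when two in-flight instructions write the same register: the younger result is the one the consumer should see.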
  • 35. Data Hazard Even with Forwarding [Figure: pipeline diagram, time in clock cycles across, instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
      lw  r1, 0(r2)
      sub r4,r1,r6
      and r6,r1,r7
      or  r8,r1,r9
      Forward backward in time… no way!! (or way?)
  • 36. Data Hazard Even with Forwarding [Figure: the same four instructions (lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9) with a Bubble inserted in the rows after the lw; the bubble cycles are marked NO ISSUE.]
      Need “pipeline interlock” (or stall) to stop instructions from issuing. How is this detected?
  • 37. Hazard Detection Unit
      • Stall by letting an instruction that won’t write anything go forward
      • Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
      [Figure: pipelined datapath with a Hazard detection unit and a Forwarding unit. Hazard detection unit signals: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a mux that selects 0 for the control fields. Forwarding unit signals: Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd. Also labeled: IF/ID.RegisterRd.]
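The stall condition on slide 37 translates almost directly into code. The sketch below is only an illustration: the function names are hypothetical, and the output keys reuse the signal names visible in the figure (PCWrite, IF/IDWrite) plus an assumed `insert_bubble` flag for the mux that zeroes the control fields.

```python
# Sketch of the load-use check performed by the hazard detection unit.
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True if ID/EX holds a load whose destination (rt) is read by the
    instruction currently in IF/ID."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_outputs(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Control outputs for this cycle; 'insert_bubble' is an assumed name for
    the mux select that forces the control fields to 0."""
    stall = must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,      # hold the PC while stalling
        "IF/IDWrite": not stall,   # hold the instruction in IF/ID
        "insert_bubble": stall,    # send a do-nothing instruction forward
    }


# lw r1, 0(r2) is in ID/EX while sub r4,r1,r6 is in IF/ID.
print(hazard_outputs(True, 1, 1, 6))
```

Freezing PC and IF/ID while sending zeroed control bits forward is what the first bullet on the slide means by letting an instruction that will not write anything go forward.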
  • 38. Code Rescheduling to Avoid Load Hazards
      Try producing fast code for a = b + c; d = e – f; assuming a, b, c, d, e, and f are in memory.
      Slow code:
          LW  Rb,b
          LW  Rc,c
          ADD Ra,Rb,Rc
          SW  a,Ra
          LW  Re,e
          LW  Rf,f
          SUB Rd,Re,Rf
          SW  d,Rd
      Fast code:
          LW  Rb,b
          LW  Rc,c
          LW  Re,e
          ADD Ra,Rb,Rc
          LW  Rf,f
          SW  a,Ra
          SUB Rd,Re,Rf
          SW  d,Rd
      Compiler optimizes for performance. Hardware checks for safety.
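The benefit of the rescheduled sequence on slide 38 can be checked by counting load-use stalls in both orderings. The sketch assumes a five-stage pipeline with full forwarding, where the only stall is one cycle when an instruction reads a register loaded by the instruction immediately before it; the tuple encoding and the function name `load_use_stalls` are illustrative.

```python
# Each instruction is (op, destination, sources). For SW the register being
# stored is listed as a source; memory operands are ignored in this sketch.
SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]
FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]


def load_use_stalls(code):
    """Count one stall for every instruction that reads the destination of
    the load immediately preceding it (assumes full forwarding otherwise)."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        if prev[0] == "LW" and prev[1] in cur[2]:
            stalls += 1
    return stalls


print("slow:", load_use_stalls(SLOW), "fast:", load_use_stalls(FAST))
```

Under these assumptions the slow ordering stalls twice (ADD right after LW Rc, SUB right after LW Rf), while the fast ordering places an independent instruction after each critical load and does not stall.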
  • 39. Control Hazard due to Branches (3 stall cycles) [Figure: pipeline diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
      10: beq r1,r3,36
      14: and r2,r3,r5
      18: or  r6,r1,r7
      22: add r8,r1,r9
      36: xor r10,r1,r11
      What do you do with the 3 instructions in between? How do you do it? Where is the “commit”?
  • 40. Branch Hazard Resolutions
      #1: Stall until branch direction is clear
      #2: Static Branch Prediction
          • Predict Not Taken (fall through, as shown in previous slide)
              – Execute successor instructions in sequence
              – “Squash” instructions in pipeline if branch actually taken
              – PC+4 already calculated, so use it to get next instruction
          • Predict Branch Taken
              – But haven’t calculated branch target address
                  • Might incur 1 cycle branch penalty
                  • Other machines: branch target known before outcome
      #3: Dynamic Branch Prediction
          – Will dedicate a lecture to such techniques
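To compare option #1 with predict-not-taken numerically, the sketch below computes an effective CPI from a base CPI of 1 plus branch stall cycles. The 3-cycle penalty comes from slide 39; the branch frequency and taken fraction in the usage lines are made-up illustrative values, and the function name `cpi_with_branches` is hypothetical.

```python
# Effective CPI = 1 + (stall cycles per instruction caused by branches).
def cpi_with_branches(branch_fraction, taken_fraction, penalty, policy):
    """policy 'stall': every branch pays the full penalty.
    policy 'predict_not_taken': only taken branches pay it (squash)."""
    if policy == "stall":
        extra = branch_fraction * penalty
    elif policy == "predict_not_taken":
        extra = branch_fraction * taken_fraction * penalty
    else:
        raise ValueError(f"unknown policy: {policy}")
    return 1.0 + extra


# Illustrative workload (assumed numbers): 20% branches, 60% of them taken.
for policy in ("stall", "predict_not_taken"):
    print(policy, cpi_with_branches(0.20, 0.60, 3, policy))
```

The gap between the two policies grows with the fraction of not-taken branches, which is why static not-taken prediction is cheap to add: PC+4 is already available.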
  • 41. Alternative Branch Hazard Resolutions
      #4: Delayed Branch
          – Define branch to take place AFTER a following instruction:
                branch instruction
                sequential successor 1
                sequential successor 2
                ........
                sequential successor n      (branch delay of length n)
                branch target if taken
          – 1 slot delay allows proper decision and branch target address in 5 stage pipeline (next page)
  • 42. Filling Branch Delay Slot: make sure R7 is not used in the taken path before it is redefined.
  • 43. Other Pipelining Issues
      • To have all instructions finish within one cycle
          – Slow down frequency to cope with the critical operation, or
          – Allow non-uniform latency operation
  • 44. Support Multiple FP Operations
      • Complicate bypass
      • Potential structural hazard
      • Multiple (FP) instructions can complete at the same time
          – RF might need to be multi-ported
          – Ordering issue, who gets to update the register?
      [Figure: IF and ID feed four parallel units that rejoin at MEM and WB: Integer Unit (EX), FP multiplier (M1 to M7), FP add (A1 to A4), FP divider (non-pipelined).]
  • 45. Full Bypass/Forwarding Needed
      Timing over clock cycles 1 to 18 (SS = stall; the cycle ranges are inferred from the stage counts and the 18-cycle axis):
          L.D   F4,0(R2)    IF ID EX M WB                                    (cycles 1 to 5)
          MUL.D F0,F4,F6    IF ID SS M1 M2 M3 M4 M5 M6 M7 M WB               (cycles 2 to 13)
          ADD.D F2,F0,F8    IF SS ID SS SS SS SS SS SS A1 A2 A3 A4 M WB      (cycles 3 to 17)
          S.D   F2,0(R2)    IF SS SS SS SS SS SS ID EX SS SS SS M WB         (cycles 5 to 18)
  • 46. Structural Hazards
      • Write to register file at the same cycle (cc11)
      • Write to the same register (WAW)
      • MEM in cc10
      [Figure: timing diagram over clock cycles 1 to 11. Named rows: MUL.D F0,F4,F6 (IF ID M1 M2 M3 M4 M5 M6 M7 M WB), ADD.D F2,F4,F6 (IF ID A1 A2 A3 A4 M WB), L.D F2,0(R2) (IF ID EX M WB). Four further rows, labeled ". . . .", each run IF ID EX M WB.]
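The write-port conflict on slide 46 follows from the different execution latencies. The sketch below computes the write-back cycle of each instruction and reports cycles with more than one writer. The latencies are read off the stage names on slides 44 and 46 (EX = 1, A1 to A4 = 4, M1 to M7 = 7); the issue cycles 1, 4, and 7 are an assumption chosen so that the result matches the "cc11" on the slide, and all names are hypothetical.

```python
# Execution-stage counts between ID and MEM for each instruction type.
EXEC_CYCLES = {"L.D": 1, "ADD.D": 4, "MUL.D": 7}


def wb_cycle(op, issue_cycle):
    """Cycle in which WB happens if IF happens in issue_cycle and nothing stalls.
    Stages: IF, ID, execution stages, MEM, WB."""
    return issue_cycle + 1 + EXEC_CYCLES[op] + 1 + 1


def write_port_conflicts(schedule):
    """schedule: list of (op, dest, issue_cycle).
    Returns {cycle: [dest, ...]} for cycles with more than one register write."""
    writers = {}
    for op, dest, issue in schedule:
        writers.setdefault(wb_cycle(op, issue), []).append(dest)
    return {cycle: dests for cycle, dests in writers.items() if len(dests) > 1}


# Assumed issue cycles (1, 4, 7); the slide only states the outcome, cc11.
schedule = [("MUL.D", "F0", 1), ("ADD.D", "F2", 4), ("L.D", "F2", 7)]
print(write_port_conflicts(schedule))
```

With these assumptions all three instructions reach MEM in cycle 10 and WB in cycle 11, and two of them target F2, which is the WAW case named in the second bullet.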
  • 47. Precise Exception Issue
      • Precise exception: if the pipeline can (or must) be stopped
          – All the instructions before the faulty (or intended) instruction must be completed
          – All the instructions after it must not be completed
          – Restart the execution from the faulty (or intended) instruction
      • State must be consistent with the original program order
      • Not straightforward with out-of-order completion
      • Simple solution: stall until prior long-latency instructions are guaranteed not to raise an exception
      • Other modern solution: ROB (will dedicate a lecture to it)
      Example:
          DIV.D F0,F2,F4      (exception!)
          ADD.D F3,F10,F8     (completed)
          SUB.D F12,F12,F14   (completed)
  • 48. MIPS R4000 Pipeline
      • Deeper pipeline (superpipelining)
      • 2 cycle delays for load
      • Predicted-Not-Taken strategy
          – Not-taken (fall-through) branch: 1 delay slot
          – Taken branch: 1 delay slot + 2 idle cycles
      [Figure: eight stages IF IS RF EX DF DS TC WB; instruction memory spans the fetch stages, Reg and ALU follow, data memory spans the data stages; branch target and condition evaluation are marked on the pipeline.]
  • 49. Load delay (2 cycles) [Figure: R4000 pipeline diagram over CC1 to CC11 with stages IF IS RF EX DF DS TC WB; rows labeled LD R1, Inst 1, Inst 2, and ADD R2, R1; two cycles marked Bubble.]
      If no delay slot instructions are scheduled, the R4000 performs a HW interlock.
  • 50. Branches (Predicted-not-taken) [Figure: R4000 pipeline diagrams over CC1 to CC11 with stages IF IS RF EX DF DS TC WB, split by actual direction. NOT TAKEN: Branch, Delay slot, Branch inst+2, Branch inst+3. TAKEN: Branch, Delay slot, Stall, Stall, Branch Target.]
