CS 6461: Computer Architecture
Instruction Level Parallelism
Instructor: M. Lancaster
Corresponding to Hennessy and Patterson
Fifth Edition
Section 3.1
January 2013
Instruction Level Parallelism
‱ Almost all processors since 1985 use pipelining to overlap
the execution of instructions and improve performance.
This potential overlap among instructions is called
instruction level parallelism
‱ First introduced in the IBM Stretch (Model 7030) in about
1959
‱ Later the CDC 6600 incorporated pipelining and the use of
multiple functional units
‱ The Intel i486 was the first pipelined implementation of
the IA-32 architecture
Instruction Level Parallelism
‱ Instruction level parallel processing is the concurrent
processing of multiple instructions
‱ Difficult to achieve within a single basic block (a straight-line
code sequence with no branches in or out except at entry and exit)
– Typical MIPS programs have a dynamic branch frequency of
between 15% and 25%
– That is, between three and six instructions execute between a
pair of branches, and data hazards usually exist within these
instructions as they are likely to be dependent
‱ Given the small size of a typical basic block, ILP must be
exploited across multiple basic blocks
Instruction Level Parallelism
‱ The current trend is toward very deep pipelines, increasing
from a depth of < 10 to > 20.
‱ With more stages, each stage can be smaller and simpler, with
less gate delay per stage, so very high clock rates are
possible.
Loop Level Parallelism
Exploitation among Iterations of a Loop
‱ Loop adding two 1000 element arrays
– Code
for (i=1; i<= 1000; i=i+1)
x[i] = x[i] + y[i];
‱ If we look at the generated code, within a single iteration there
may be little opportunity for overlap of instructions, but each
iteration of the loop can overlap with any other iteration
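‱ As a sketch (not from the original slides), the loop below is unrolled by a hypothetical factor of four; each statement in the unrolled body is independent of the others, which is the loop-level parallelism a pipeline or compiler can exploit
/* Illustrative only: the function name, the unroll factor of 4, and the
   assumption that n is a multiple of 4 are choices made for this sketch. */
void add_arrays(double x[], double y[], int n)
{
    for (int i = 0; i < n; i += 4) {
        x[i]     = x[i]     + y[i];     /* the four statements are        */
        x[i + 1] = x[i + 1] + y[i + 1]; /* mutually independent, so their */
        x[i + 2] = x[i + 2] + y[i + 2]; /* instructions can be overlapped */
        x[i + 3] = x[i + 3] + y[i + 3]; /* in the pipeline                */
    }
}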
Concepts and Challenges
Approaches to Exploiting ILP
‱ Two major approaches
– Dynamic – these approaches depend upon the hardware to
locate the parallelism
– Static – fixed solutions generated by the compiler, and thus
bound at compile time
‱ These approaches are not totally disjoint; many processors rely
on both
‱ Limitations are imposed by data and control hazards
Features Limiting Exploitation of Parallelism
‱ Program features
– Instruction sequences
‱ Processor features
– Pipeline stages and their functions
‱ Interrelationships
– How do program properties limit performance? Under what
circumstances?
Approaches to Exploiting ILP
Dynamic Approach
‱ Hardware intensive approach
‱ Dominates the desktop and server markets
– Pentium III, 4, Athlon
– MIPS R10000/12000
– Sun UltraSPARC III
– PowerPC 603, G3, G4
– Alpha 21264
Approaches to Exploiting ILP
Static Approach
‱ Compiler intensive approach
‱ Embedded market and IA-64
Terminology and Ideas
‱ Cycles Per Instruction
– Pipeline CPI = Ideal Pipeline CPI + Structural Stalls + Data Hazard Stalls
+ Control Stalls
‱ Ideal pipeline CPI is a measure of the maximum performance
attainable by the implementation; stalls and/or their impacts
must be minimized
‱ During the 1980s, CPI = 1 was a target objective for single-chip
microprocessors
‱ 1990s objective: reduce CPI below 1
– Scalar processors are pipelined processors that are designed to fetch and
issue at most one instruction every machine cycle
– Superscalar processors are those that are designed to fetch and issue
multiple instructions every machine cycle
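‱ A worked illustration of the CPI equation (the stall components below are assumed numbers, not figures from the text):
#include <stdio.h>

/* Hypothetical stall counts, chosen only to illustrate the formula. */
int main(void)
{
    double ideal_cpi   = 1.00; /* best case for a single-issue pipeline    */
    double structural  = 0.05; /* structural stall cycles per instruction  */
    double data_hazard = 0.30; /* data hazard stall cycles per instruction */
    double control     = 0.15; /* control stall cycles per instruction     */

    double pipeline_cpi = ideal_cpi + structural + data_hazard + control;
    printf("Pipeline CPI = %.2f\n", pipeline_cpi);             /* 1.50  */
    printf("Speedup if all stalls were removed = %.2fx\n",
           pipeline_cpi / ideal_cpi);                          /* 1.50x */
    return 0;
}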
Approaches to Exploiting ILP
That We Will Explore
Technique – what it reduces
– Forwarding and bypassing: potential data hazards and stalls
– Delayed branches and simple branch scheduling: control hazard stalls
– Basic dynamic scheduling (scoreboarding): data hazard stalls from true dependences
– Dynamic scheduling with renaming: data hazard stalls and stalls from antidependences and output dependences
– Branch prediction: control stalls
– Issuing multiple instructions per cycle: ideal CPI
– Hardware speculation: data hazard and control hazard stalls
– Dynamic memory disambiguation: data hazard stalls with memory
– Loop unrolling: control hazard stalls
– Basic compiler pipeline scheduling: data hazard stalls
– Compiler dependence analysis, software pipelining, trace scheduling: ideal CPI, data hazard stalls
– Hardware support for compiler speculation: ideal CPI, data hazard and control stalls
Approaches to Exploiting ILP
Review of Terminology
‱ Instruction issue:
– The process of letting an instruction move from the instruction
decode phase (ID) into the instruction execution (EX) phase
‱ Interlock (pipeline interlock, instruction interlock) is the
resolution of pipeline hazards via hardware. Pipeline
interlock hardware must detect all pipeline hazards and
ensure that all dependencies are satisfied
Data Dependencies and Hazards
‱ The key questions are how much parallelism exists in a program
and how it can be exploited
‱ If two instructions are parallel, they can execute
simultaneously in a pipeline without causing any stalls
(assuming no structural hazards exist)
‱ There are no dependencies in parallel instructions
‱ If two instructions are not parallel and must be executed in
order, they may often be partially overlapped.
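‱ A small C fragment (illustrative, not from the slides) contrasting instructions that are parallel with ones that are not:
/* x and y use disjoint operands, so their instructions are parallel and
   can execute simultaneously; z depends on both, so at best it can be
   partially overlapped with them. */
void example(int a, int b, int c, int d, int *out)
{
    int x = a + b;   /* independent of y       */
    int y = c + d;   /* independent of x       */
    int z = x + y;   /* data dependent on x, y */
    *out = z;
}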
Pipeline Hazards
‱ Hazards make it necessary to stall the pipeline.
– Some instructions in the pipeline are allowed to proceed while
others are delayed
– For this example pipeline approach, when an instruction is
stalled, all instructions further back in the pipeline are also
stalled
– No new instructions are fetched during the stall
– Instructions issued earlier in the pipeline must continue
Data Dependencies and Hazards
‱ Data Dependences – an instruction j is data dependent on
instruction i if either of the following holds
– Instruction i produces a result that may be used by instruction
j
– Instruction j is data dependent on instruction k, and
instruction k is data dependent on instruction i – that is, one
instruction is dependent on another if there exists a chain of
dependences of the first type between the two instructions.
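‱ A minimal C illustration (not in the original) of such a chain of dependences:
/* f is data dependent on a through the chain a -> d -> f, even though
   no single statement uses both a and g. */
int chain(int b, int c, int e, int g)
{
    int a = b + c;   /* instruction i                                */
    int d = a + e;   /* instruction k: data dependent on i (uses a)  */
    int f = d + g;   /* instruction j: data dependent on k (uses d), */
    return f;        /* and therefore, by the chain, dependent on i  */
}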
Data Dependencies and Hazards
‱ Data Dependences –
– Code Example
LOOP: L.D F0,0(R1) ;F0 = array element
ADD.D F4,F0,F2 ;add scalar in F2
S.D F4,0(R1) ;store result
DADDUI R1,R1,#-8 ;decrement pointer by 8
BNE R1,R2,LOOP ;branch if R1 != R2
‱ The first dependences flow through floating-point data (F0 and F4,
linking L.D, ADD.D, and S.D); the last flow through integer data (R1,
linking DADDUI and BNE)
Data Dependencies and Hazards
‱ Data Dependences –
– In the original figure, arrows show where the order of instructions
must be preserved
– If two instructions are dependent, they cannot be
simultaneously executed or be completely overlapped
Data Dependencies and Hazards
‱ Dependencies are properties of programs
‱ Whether a given dependence results in an actual hazard
being detected and whether that hazard actually causes a
stall are properties of the pipeline organization
Data Dependencies and Hazards
‱ Hazard created –
– Code Example
DADDUI R1,R1,#-8 ;decrement pointer 8
BNE R1,R2,LOOP ;
‱ The hazard is created when the branch test is moved from the EX stage
into the ID stage: BNE then needs R1 during its ID stage, which overlaps
DADDUI's EX stage, before the value is available
‱ If the test had stayed in EX, the dependence would not cause a stall
(although the branch delay would then be two cycles)
Data Dependencies and Hazards
[Figure: MIPS five-stage pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB registers) with control, forwarding, and hazard detection units. In the basic datapath the branch destination and test are known at the end of the third cycle of execution; with the comparison moved into the ID stage they are known at the end of the second cycle.]
Data Dependencies and Hazards
‱ Presence of a dependence indicates the potential for a hazard, but
the actual hazard and the length of any stall are properties of the
pipeline.
‱ Data dependence
– Indicates possibility of stall
– Determines the order in which results are calculated
– Sets an upper bound on how much parallelism can be possibly
exploited.
‱ We will focus on overcoming these limitations
Overcoming Dependences
‱ Two Ways
1. Maintain dependence but avoid the hazard
– Schedule the code dynamically
2. Transform the code to eliminate the dependence
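‱ A hedged C sketch (illustrative only) of the second approach: transforming the source so that a name dependence disappears and the two computations become independent
/* Before: reusing t creates an antidependence and an output dependence
   between the two computations, even though their data are unrelated. */
void before(int a, int b, int c, int d, int *r1, int *r2)
{
    int t;
    t = a + b;  *r1 = t;
    t = c + d;  *r2 = t;     /* must stay after the first use of t */
}

/* After: renaming the second t removes the name dependence, so the two
   computations may be freely reordered or overlapped. */
void after(int a, int b, int c, int d, int *r1, int *r2)
{
    int t1 = a + b;  *r1 = t1;
    int t2 = c + d;  *r2 = t2;
}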
Difficulty in Detecting Dependences
‱ A data value may flow between instructions either through
registers or through memory locations
‱ Therefore, detection is not always straightforward
– Dependences that flow through registers are easy to detect
– Dependences that flow through memory locations are harder, because two
references that look different may name the same address: if R4 = 20 and
R6 = 100, then 100(R4) and 20(R6) both refer to address 120
– Conversely, two references that look identical, such as 20(R4) before
and after an instruction that increments R4, refer to different addresses
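‱ The same difficulty in C terms (an illustrative analogy, not from the slides): whether two memory references conflict can depend entirely on run-time values
/* Whether these two stores are dependent cannot be decided from the code
   alone: if p + 25 happens to equal q + 5 at run time, they write the
   same word (the analogue of 100(R4) and 20(R6) naming address 120). */
void update(int *p, int *q)
{
    p[25] = 1;
    q[5]  = 2;
}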
Name Dependences; Two Categories
‱ Two instructions use the same register or memory location,
called a name, but there is actually no flow of data between
the instructions associated with that name. Assume that instruction i
precedes instruction j:
– 1. An antidependence between instructions i and j occurs when
instruction j writes a register or memory location that instruction i
reads. The original ordering must be preserved
– 2. An output dependence occurs when instruction i and instruction
j write the same register or memory location, the order again must
be preserved
Name Dependences; Two Categories
‱ 1. An antidependence (j writes what i reads)
– i DADDUI R1,R2,#-8
– j DADDUI R2,R5,#0
‱ 2. An output dependence (i and j write the same register)
– i DADDUI R1,R2,#-8
– j DADDUI R1,R4,#10
Name Dependences
‱ Not true data dependencies, and therefore we could execute
them simultaneously or reorder them if the name (register or
memory location) used in the instructions is changed so that
the instructions do not conflict
‱ Register renaming is easier than renaming memory locations
– Before renaming (antidependence on R2):
i DADDUI R1,R2,#-8
j DADDUI R2,R4,#10
– After renaming R2 to R5 in instruction j:
i DADDUI R1,R2,#-8
j DADDUI R5,R4,#10
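‱ A minimal sketch in C (one assumed implementation, not the book's mechanism) of the mapping-table idea behind register renaming: every write to an architectural register is given a fresh physical register, so later writers no longer conflict with earlier readers or writers
#include <assert.h>

#define NUM_ARCH_REGS 32
#define NUM_PHYS_REGS 64

static int reg_map[NUM_ARCH_REGS];     /* architectural -> physical map */
static int next_free = NUM_ARCH_REGS;  /* next unused physical register */

void rename_init(void)
{
    for (int i = 0; i < NUM_ARCH_REGS; i++)
        reg_map[i] = i;     /* initially Ri lives in physical register i */
}

/* Rename a destination: allocate a new physical register so the write
   cannot conflict with any earlier reader or writer of the old name.
   (Free-list recycling is omitted in this sketch.) */
int rename_dest(int arch_reg)
{
    assert(next_free < NUM_PHYS_REGS);
    reg_map[arch_reg] = next_free++;
    return reg_map[arch_reg];
}

/* Source operands simply read the current mapping. */
int rename_src(int arch_reg)
{
    return reg_map[arch_reg];
}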
Data Hazards
‱ A hazard is created whenever there is a dependence between
instructions, and they are close enough that the overlap caused
by pipelining or other reordering of instructions would change
the order of access to the operand involved in the dependence.
‱ We must preserve program order: the order in which the instructions
would execute if run one at a time on a non-pipelined machine
‱ However, program order only needs to be maintained where it
affects the outcome of the program
Data Hazards – Three Types
‱ Two instructions i and j, with i occurring before j in program
order, possible hazards are:
– RAW (read after write) – j tries to read a source before i writes it,
so j incorrectly gets the old value
‱ The most common type
‱ Program order must be preserved
‱ In a simple common static pipeline a load instruction followed by an
integer ALU instruction that directly uses the load result will lead to a
RAW hazard
Data Hazards – Three Types
‱ Second type:
– WAW (write after write) – j tries to write an operand before it is
written by i, so the writes end up in the wrong order, leaving the
value written by i rather than the value written by j
‱ Corresponds to an output dependence
‱ Present in pipelines that write in more than one pipe stage or allow an
instruction to proceed even when a previous instruction is stalled
‱ In the classic five-stage pipeline, where writes occur only in the WB
stage, this class of hazard is avoided
‱ If reordering of instructions is allowed, this is a possible hazard;
for example, an integer instruction issued after a longer-latency
floating-point instruction may write the same register first
Data Hazards – Three Types
‱ Third type:
– WAR (write after read) – j tries to write an operand before it is
read by i, so i incorrectly gets the new value.
‱ Antidependence
‱ Cannot occur in most static pipelines – note that reads are early in ID
and writes late in WB
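‱ A hedged C sketch (illustrative; real pipelines do this in hardware) that classifies the possible hazards between an earlier instruction i and a later instruction j from their register operands
#include <stdio.h>

/* One destination and two source registers per instruction. */
struct instr { int dest, src1, src2; };

/* i occurs before j in program order. */
void classify(struct instr i, struct instr j)
{
    if (j.src1 == i.dest || j.src2 == i.dest)
        printf("RAW: j reads a register that i writes\n");
    if (j.dest == i.src1 || j.dest == i.src2)
        printf("WAR: j writes a register that i reads\n");
    if (j.dest == i.dest)
        printf("WAW: j writes the same register as i\n");
}

int main(void)
{
    struct instr i = { 1, 2, 3 };   /* e.g. ADD R1,R2,R3 */
    struct instr j = { 4, 1, 5 };   /* e.g. SUB R4,R1,R5 */
    classify(i, j);                 /* reports a RAW hazard through R1 */
    return 0;
}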
Control Dependencies
‱ A control dependence determines the ordering of an instruction i
with respect to a branch instruction, so that i is executed in the
correct program order and only when it should be.
‱ Example
– if p1 {
S1;
};
if p2 {
S2;
}
Control Dependencies
‱ Example
– if p1 {
S1;
};
if p2 {
S2;
}
‱ S1 is control dependent on p1, and S2 is control dependent on
p2 but not on p1
Control Dependencies
‱ Two constraints imposed
– An instruction that is control dependent on a branch cannot be moved
before the branch so that its execution is no longer controlled by the
branch. For example we cannot take a statement from the then portion of
an if statement and move it before the if statement.
– An instruction that is not control dependent on a branch cannot be moved
after the branch so that the execution is controlled by the branch. For
example, we cannot take a statement before the if and move it into the then
portion
if p1 {
S1;
};
if p2 {
S2;
}
Control Dependencies
‱ Two properties of our simple pipeline preserve control
dependencies
– Instructions execute in program order
– Detection of control or branch hazards ensures that an instruction
that is control dependent on a branch is not executed until the
branch direction is known
‱ We can introduce instructions that should not have been
executed (violating control dependences) if we can do so
without affecting the correctness of the program
Control Dependencies are Really

‱ Not themselves the issue; what really must be preserved is
– Exception behavior
– Data flow
Preserving Exception Behavior
‱ Preserving exception behavior means that any changes in the ordering of
instruction execution must not change how exceptions are raised in the
program
– We may relax this rule and say that reordering of instruction execution must
not cause any new exceptions
DADDU R2,R3,R4
BEQZ R2, L1
LW R1,0(R2) ;Could cause illegal mem acc
L1: 

– In the above, if we do not maintain the data dependence involving R2, we
may change the result of the program. If we ignore the control dependence
and move the load instruction before the branch, the load may cause a
memory protection exception
– There is no visible data dependence that prevents this interchange, only
control dependence
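‱ The same point in C (an illustrative analogy, not from the slides): hoisting a guarded load above its test can introduce an exception the original program could never raise
/* Original: the dereference is control dependent on the test, so a null
   pointer never reaches the load (the analogue of LW guarded by BEQZ). */
int guarded(int *p)
{
    if (p == 0)
        return 0;
    return *p;
}

/* Reordered: moving the load above the test may fault when p is null,
   changing the program's exception behavior. */
int hoisted(int *p)
{
    int v = *p;          /* may raise a memory protection exception */
    if (p == 0)
        return 0;
    return v;
}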
Preserving Exception Behavior
‱ To allow reordering of these instructions (while still preserving the
data dependence), we would like to be able to simply ignore the
exception when the branch is taken.
Preserving Data Flow
‱ This means preserving the actual flow of data values between
instructions that produce results and those that consume them.
‱ Branches make data flow dynamic, since they allow the source
of data for a given instruction to come from many points
Preserving Data Flow
‱ Example
DADDU R1,R2,R3
BEQZ R4,L
DSUBU R1,R5,R6
L: 

OR R7,R1,R8 ;value of R1 depends on whether the branch was taken
– OR is data dependent on both DADDU and DSUBU; DSUBU cannot be moved
above the branch
‱ By preserving the control dependence of the DSUBU on the branch, we
prevent an illegal change to the data flow
Preserving Data Flow
‱ Sometimes violating the control dependence cannot affect either the
exception behavior or the data flow
DADDU R1,R2,R3
BEQZ R1,skip
DSUBU R4,R5,R6
DADDU R5,R4,R9
skip: OR R7,R1,R8 ; suppose R4 not used after here
– If R4 unused after this point, changing the value of R4 just before the
branch would not affect data flow
– If R4 were dead and DSUBU could not generate an exception, we could
move the DSUBU instruction before the branch
– This is called speculation, since the compiler is betting on the
branch outcome; here, that the branch is usually not taken
Control Dependence Again
‱ Control dependence in the simple pipeline is preserved by
implementing control and hazard detection that can cause
control stalls
‱ Can be eliminated by a variety of hardware techniques
‱ Delayed branches can reduce stalls arising from control
hazards, but require that the compiler preserve data flow