Understanding pipelining:
Pipelining is running multiple stages of the same process in parallel in a way that efficiently uses
all the available hardware while respecting the dependencies of each stage upon the previous
stages. In the laundry example, the stages are washing, drying, and folding. By starting a wash
stage as soon as the previous wash stage is moved to the dryer, the idle time of the washer is
minimized. Notice that the wash stage takes less time than the dry stage, so the wash stage must
remain idle until the dry stage finishes: the steady state throughput of the pipeline is limited by
the slowest stage in the pipeline. This can be mitigated by breaking up the bottleneck stage into
smaller sub-stages. For those less concerned with laundry-based examples, consider a video
game. The CPU computes the keyboard/mouse input each frame and moves the camera
accordingly, then the GPU takes that information and actually renders the scene; meanwhile, the
CPU has already begun calculating what's going to happen in the next frame.
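The laundry timing can be sketched as a short simulation. The stage durations below are made-up numbers chosen so that the dryer is the bottleneck, matching the scenario described above:

```python
# Toy model of a pipelined laundry: once the pipeline fills, a new load
# finishes every `bottleneck` minutes. Durations are hypothetical.
WASH, DRY, FOLD = 30, 45, 15  # minutes per load (assumed values)

def sequential_time(loads):
    # No overlap: every load waits for the previous one to finish entirely.
    return loads * (WASH + DRY + FOLD)

def pipelined_time(loads):
    # Overlapped: the first load takes the full latency; each later load
    # completes one bottleneck-interval after its predecessor (the dryer).
    bottleneck = max(WASH, DRY, FOLD)
    return (WASH + DRY + FOLD) + (loads - 1) * bottleneck

print(sequential_time(4))  # 360 minutes
print(pipelined_time(4))   # 90 + 3*45 = 225 minutes
```

Note that making the dry stage faster (or splitting it into sub-stages) would lower the bottleneck interval and raise throughput, exactly as the text suggests.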
How pipelining is done:
In class, we mentioned that interpreting each computer instruction is a four-step process: fetching
the instruction, decoding it and reading the registers, executing it, and recording the results. Each
instruction may take 4 cycles to complete, but if our throughput is one instruction each cycle,
then we would like to perform, on average, $n$ instructions every $n$ cycles. To accomplish
this, we can split up an instruction's work into the 4 different steps so that other pieces of
hardware work to decode, execute, and record results while the CPU performs the fetch. The
latency to process each instruction is fixed at 4 cycles, so by processing a new instruction every
cycle, after four cycles, one instruction has been completed and three are "in progress" (they're
in the pipeline). After many cycles the steady state throughput approaches one completed
instruction every cycle.
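Under these assumptions (a 4-stage pipeline, one new instruction per cycle, no stalls), the total time for n instructions is stages + (n - 1) cycles, so throughput tends toward one instruction per cycle as n grows:

```python
def cycles_to_complete(n, stages=4):
    # The first instruction needs `stages` cycles to drain the pipeline;
    # each later instruction completes one cycle after its predecessor
    # (ideal pipeline with no stalls).
    return stages + (n - 1)

for n in (1, 4, 100):
    total = cycles_to_complete(n)
    print(n, total, round(n / total, 3))  # throughput approaches 1.0
```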
An assembly line in an auto manufacturing plant is another good example of a pipelined process.
There are many steps in the assembly of the car, each of which is assigned a stage in the pipeline.
Typically the depth of these pipelines is very large: cars are pretty complex, so there need to be a
lot of stages in the assembly line. The more stages, the longer it takes to crank the system up to a
steady state. The larger the depth, the more costly it is to turn the system around: A branch
misprediction in an instruction pipeline would be like getting one of the steps wrong in the
assembly line: all the cars affected would have to go back to the beginning of the assembly line
and be processed again.
OnLive example (real-time):
OnLive is a company that allows gamers to play video games in the cloud. The games are run on
one of the company's server farms, and video of the game is sent back to your computer. The
idea is that even the lamest of computers can run the most highly intensive games because all the
computer does is send your joystick input over the internet and display the frames it gets back.
Of course, no one wants to play a game with a noticeably low framerate. We're going to
demonstrate how OnLive could deliver a reasonable experience. For our purposes, we'll assume
that OnLive uses a four step process: the user's computer sends over the input to the server
(10ms), the server tells the game about the user's input and then compresses the resulting game
frame (15ms), the compressed video is sent back to the user (60ms) where it is then
decompressed and displayed (15ms). Note that OnLive doesn't share its data, so these numbers
are contrived.
The latency of this process is 100ms (10+15+60+15). This means that there will always be a
tenth of a second lag from when you perform an action to when you see it affect things on the
screen.
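Using the contrived stage times above, a quick calculation separates latency from throughput: the user always waits 100ms for an input to take effect, but because the four stages overlap across consecutive frames, a new frame can complete every 60ms (the slowest stage), i.e. roughly 16-17 frames per second:

```python
# Stage times from the (contrived) OnLive example, in milliseconds.
stages = {"send input": 10, "compute+compress": 15,
          "return video": 60, "decompress+display": 15}

latency_ms = sum(stages.values())          # 100: input-to-screen delay
frame_interval_ms = max(stages.values())   # 60: slowest stage gates throughput
fps = 1000 / frame_interval_ms

print(latency_ms, frame_interval_ms, round(fps, 1))  # 100 60 16.7
```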
Communication between different parts of a machine is not particularly easy to manage, since it
often occurs in bursts: a huge demand on the communication framework followed by a period of
very little activity. Communication can be sped up by pipelining, however. We do not
necessarily have to wait for a message to be delivered before we send another piece of
information, so we can set up a level of pipelining. Often, though, the rate at which we can send
messages is much faster than the rate at which data moves through the slowest part of our
system. Pipelining therefore only helps to an extent: in the long run our communication is
limited by the slowest part of the system.
Data hazards:
Data hazards occur when instructions that exhibit data dependence modify data in different
stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed
race hazards). There are three situations in which a data hazard can occur:
read after write (RAW), a true dependency
write after read (WAR), an anti-dependency
write after write (WAW), an output dependency
Consider two instructions i1 and i2, with i1 occurring before i2 in program order.
Read after write (RAW):
(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard refers to a
situation where an instruction refers to a result that has not yet been calculated or retrieved. This
can occur because even though an instruction is executed after a prior instruction, the prior
instruction has been processed only partly through the pipeline.
For example:
i1. R2 <- R1 + R3
i2. R4 <- R2 + R3
The first instruction is calculating a value to be saved in register R2, and the second is going to
use this value to compute a result for register R4. However, in a pipeline, when operands are
fetched for the 2nd operation, the results from the first will not yet have been saved, and hence a
data dependency occurs.
A data dependency occurs with instruction i2, as it is dependent on the completion of instruction
i1.
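A minimal sketch of why this is a hazard: model a naive pipeline in which i2 reads its operands (in its decode stage) before i1's write-back has happened. The register values below are hypothetical:

```python
# Registers before execution (hypothetical initial values).
regs = {"R1": 5, "R2": 0, "R3": 7, "R4": 0}

# i1: R2 <- R1 + R3, i2: R4 <- R2 + R3
# In a naive pipeline, i2 reads its operands *before* i1 writes R2 back,
# so it sees the stale R2 = 0.
stale_r2 = regs["R2"]                  # i2's operand fetch happens here
regs["R2"] = regs["R1"] + regs["R3"]   # i1's write-back: R2 = 12
regs["R4"] = stale_r2 + regs["R3"]     # i2 computes with the stale value

print(regs["R4"])  # 7, not the intended 12 + 7 = 19
```

Stalling i2 until i1's write-back, or forwarding i1's result (discussed below), restores the intended value.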
Write after write (WAW):
(i2 tries to write an operand before it is written by i1) A write after write (WAW) data hazard
may occur in a concurrent execution environment.
For example:
i1. R2 <- R4 + R7
i2. R2 <- R1 + R3
The write back (WB) of i2 must be delayed until i1 finishes executing.
Structural hazards:
A structural hazard occurs when a part of the processor's hardware is needed by two or more
instructions at the same time. A canonical example is a single memory unit that is accessed both
in the fetch stage where an instruction is retrieved from memory, and the memory stage where
data is written and/or read from memory.[3] They can often be resolved by separating the
component into orthogonal units (such as separate caches) or bubbling the pipeline.
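A rough sketch of the cost of bubbling: if only one memory port exists, each instruction that uses the memory stage collides with a concurrent fetch and delays the pipeline by one cycle. The model below is a deliberate simplification (one bubble per memory operation, five stages assumed):

```python
def cycles_with_structural_hazard(instructions, mem_ops):
    # Ideal 5-stage pipeline: 5 + (n - 1) cycles with no stalls.
    # With one shared memory port, assume each memory-stage access
    # collides with a fetch and inserts exactly one bubble.
    ideal = 5 + (instructions - 1)
    return ideal + mem_ops

print(cycles_with_structural_hazard(10, 3))  # 14 ideal + 3 bubbles = 17
```

Separating instruction and data caches removes the collision entirely, which is why the split is so common.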
Control hazards (branch hazards):
Further information: Branch (computer science)
Branching hazards (also termed control hazards) occur with branches. On many instruction
pipeline microarchitectures, the processor will not know the outcome of the branch when it needs
to insert a new instruction into the pipeline.
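The cost of control hazards is often summarized as an effective CPI (cycles per instruction): an otherwise-ideal pipeline pays a flush penalty on every mispredicted branch. The frequencies and penalty below are assumed for illustration, not measured:

```python
def effective_cpi(base_cpi, branch_freq, mispredict_rate, penalty):
    # Each instruction is a branch with probability branch_freq; each
    # branch pays `penalty` extra cycles with probability mispredict_rate.
    return base_cpi + branch_freq * mispredict_rate * penalty

# e.g. 20% branches, 10% mispredicted, 3-cycle flush (assumed numbers)
print(effective_cpi(1.0, 0.20, 0.10, 3))  # 1.06
```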
Forwarding:
The problem with data hazards introduced by this sequence of instructions can be solved with a
simple hardware technique called forwarding.
                1    2    3    4    5    6    7
ADD R1,R2,R3    IF   ID   EX   MEM  WB
SUB R4,R5,R1         IF   ID   EX   MEM  WB
AND R6,R1,R7              IF   ID   EX   MEM  WB
The key insight in forwarding is that the result is not really needed by SUB until after the ADD
actually produces it. The only problem is to make it available for SUB when it needs it.
If the result can be moved from where the ADD produces it (EX/MEM register), to where the
SUB needs it (ALU input latch), then the need for a stall can be avoided.
Using this observation, forwarding works as follows:
The ALU result from the EX/MEM register is always fed back to the ALU input latches.
If the forwarding hardware detects that the previous ALU operation has written the register
corresponding to the source for the current ALU operation, control logic selects the forwarded
result as the ALU input rather than the value read from the register file.
Forwarding of results to the ALU requires the addition of three extra inputs on each ALU
multiplexer and the addition of three paths to the new inputs.
The paths correspond to a forwarding of:
(a) the ALU output at the end of EX,
(b) the ALU output at the end of MEM, and
(c) the memory output at the end of MEM.
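The forwarding rule can be sketched as a mux in front of the ALU: if the previous instruction's destination register matches a current source operand, take the EX/MEM value instead of the (stale) register-file read. This is a simplified single-producer model, not a full bypass network:

```python
def alu_input(src_reg, regfile, ex_mem_dest, ex_mem_value):
    # Forwarding mux: prefer the just-computed EX/MEM result when the
    # previous ALU op wrote the register this operand needs.
    if ex_mem_dest == src_reg:
        return ex_mem_value
    return regfile[src_reg]

# ADD R1,R2,R3 has just computed R1 = 9 in EX but not yet written it back.
regfile = {"R1": 0, "R5": 4, "R7": 2}      # R1 is stale in the register file
op_a = alu_input("R5", regfile, "R1", 9)   # 4, read from the register file
op_b = alu_input("R1", regfile, "R1", 9)   # 9, forwarded from EX/MEM
print(op_a - op_b)  # SUB R4,R5,R1 -> 4 - 9 = -5
```

The control logic here is the destination/source comparison; in real hardware the same comparison is replicated for each of the three forwarding paths listed above.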

  • 5. latency to process each instruction is fixed at 4 cycles, so by processing a new instruction every cycle, after four cycles, one instruction has been completed and three are "in progress" (they're in the pipeline). After many cycles the steady state throughput approaches one completed instruction every cycle. An assembly line in a auto manufacturing plant is another good example of a pipelined process. There are many steps in the assembly of the car, each of which is assigned a stage in the pipeline. Typically the depth of these pipelines is very large: cars are pretty complex, so there need to be a lot of stages in the assembly line. The more stages, the longer it takes to crank the system up to a steady state. The larger the depth, the more costly it is to turn the system around: A branch misprediction in an instruction pipeline would be like getting one of the steps wrong in the assembly line: all the cars affected would have to go back to the beginning of the assembly line and be processed again. OnLive Example[Realtime]: OnLive is a company that allows gamers to play video games in the cloud. The games are run on one of the company's server farms, and video of the game is sent back to your computer. The idea is that even the lamest of computers can run the most highly intensive games because all the computer does is send your joystick input over the internet and display the frames it gets back. Of course, no one wants to play a game with a noticeably low framerate. We're going to demonstrate how OnLive could deliver a reasonable experience. For our purposes, we'll assume that OnLive uses a four step process: the user's computer sends over the input to the server (10ms), the server tells the game about the user's input and then compresses the resulting game frame (15ms), the compressed video is sent back to the user (60ms) where it is then decompressed and displayed (15ms). Note that OnLive doesn't share its data, so these numbers are contrived. 
The latency of this process is 100 ms (10+15+60+15). This means there will always be a tenth of a second of lag between when you perform an action and when you see it affect things on the screen. Communication between different parts of a machine is not particularly easy to manage, since it often occurs in bursts: a huge demand on the communication framework followed by a period of very little activity. Communication can be sped up by pipelining, however. We do not necessarily have to wait for one message to be delivered before we send the next piece of information, so we can overlap messages in flight. Often, though, the rate at which we can send messages is much faster than the rate at which data moves through the slowest part of our system. Pipelining therefore only helps to an extent, because in the long run our communication is limited by the slowest part of the system.
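The latency/throughput distinction in the OnLive example can be sketched in a few lines. This is a minimal sketch using the contrived stage times from the text; the stage names are paraphrases, not anything published by OnLive.

```python
# Sketch: latency vs. steady-state throughput of the (contrived) OnLive
# pipeline described above. Stage times are in milliseconds.
stages = {
    "send input to server": 10,
    "apply input and compress frame": 15,
    "send compressed video back": 60,
    "decompress and display": 15,
}

# Latency: each request must pass through every stage in sequence.
latency_ms = sum(stages.values())            # 100 ms from action to screen

# Throughput: once the pipeline is full, a new frame completes every time
# the slowest stage finishes, so the bottleneck stage sets the rate.
bottleneck_ms = max(stages.values())         # 60 ms
frames_per_second = 1000 / bottleneck_ms     # about 16.7 fps

print(f"latency: {latency_ms} ms, throughput: {frames_per_second:.1f} fps")
```

Note that splitting the 60 ms network stage into sub-stages (as with the laundry bottleneck) would raise the frame rate without changing the 100 ms latency.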
Data hazards:
Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed race hazards). There are three situations in which a data hazard can occur:

- read after write (RAW), a true dependency
- write after read (WAR), an anti-dependency
- write after write (WAW), an output dependency

Consider two instructions i1 and i2, with i1 occurring before i2 in program order.

Read after write (RAW): i2 tries to read a source before i1 writes to it.
A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because, even though an instruction is executed after a prior instruction, the prior instruction has only been processed partway through the pipeline. For example:

i1. R2 <- R1 + R3
i2. R4 <- R2 + R3

The first instruction calculates a value to be saved in register R2, and the second uses this value to compute a result for register R4. However, in a pipeline, when the operands are fetched for the second instruction, the result of the first will not yet have been saved; hence a data dependency occurs: instruction i2 depends on the completion of instruction i1.

Write after write (WAW): i2 tries to write an operand before it is written by i1.
A write after write (WAW) data hazard may occur in a concurrent execution environment. For example:

i1. R2 <- R4 + R7
i2. R2 <- R1 + R3

The write back (WB) of i2 must be delayed until i1 finishes executing.

Structural hazards:
A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time.
A canonical example is a single memory unit that is accessed both in the fetch stage where an instruction is retrieved from memory, and the memory stage where data is written and/or read from memory.[3] They can often be resolved by separating the component into orthogonal units (such as separate caches) or bubbling the pipeline.
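The RAW window described above can be checked mechanically. The following is a minimal sketch assuming a classic five-stage pipeline (registers read in ID, written back in WB) with no forwarding; the (dest, src1, src2) tuple encoding is made up for this illustration.

```python
# Sketch: detecting RAW hazards in straight-line code, assuming registers
# are read in ID (stage 2) and written back in WB (stage 5), and that no
# forwarding hardware exists. The tuple encoding is hypothetical.
WRITE_STAGE, READ_STAGE = 5, 2   # IF ID EX MEM WB

def raw_hazards(instructions):
    """Return (earlier_index, later_index, register) for each RAW hazard."""
    hazards = []
    for i, (dest, *_srcs) in enumerate(instructions):
        # An instruction issued d cycles later reads in ID before this one
        # has written back whenever d < WRITE_STAGE - READ_STAGE.
        for d in range(1, WRITE_STAGE - READ_STAGE):
            j = i + d
            if j < len(instructions) and dest in instructions[j][1:]:
                hazards.append((i, j, dest))
    return hazards

# The example from the text: i1 writes R2, and i2 reads R2 one cycle later.
program = [
    ("R2", "R1", "R3"),   # i1. R2 <- R2 + R3's operands: R1, R3
    ("R4", "R2", "R3"),   # i2. R4 <- R2 + R3
]
print(raw_hazards(program))   # i2 depends on i1 through R2
```

A real hazard-detection unit makes the same comparison in hardware, comparing source register numbers in ID against destination registers still in flight.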
Control hazards (branch hazards):
Branching hazards (also termed control hazards) occur with branches. On many instruction pipeline microarchitectures, the processor will not know the outcome of a branch when it needs to insert a new instruction into the pipeline.

Forwarding:
The problem with the data hazards introduced by this sequence of instructions can be solved with a simple hardware technique called forwarding.

                1    2    3    4    5    6    7
ADD R1,R2,R3    IF   ID   EX   MEM  WB
SUB R4,R5,R1         IF   ID   EX   MEM  WB
AND R6,R1,R7              IF   ID   EX   MEM  WB

The key insight in forwarding is that the result is not really needed by SUB until after the ADD actually produces it; the only problem is to make it available to SUB when SUB needs it. If the result can be moved from where the ADD produces it (the EX/MEM register) to where the SUB needs it (the ALU input latch), then the need for a stall can be avoided. Using this observation, forwarding works as follows: the ALU result from the EX/MEM register is always fed back to the ALU input latches. If the forwarding hardware detects that the previous ALU operation has written the register corresponding to a source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file. Forwarding results to the ALU requires the addition of three extra inputs on each ALU multiplexer and the addition of three paths to the new inputs. The paths correspond to forwarding of: (a) the ALU output at the end of EX, (b) the ALU output at the end of MEM, and (c) the memory output at the end of MEM.
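The forwarding decision described above amounts to a multiplexer choice at each ALU input. The following is a minimal sketch of that selection logic; the (dest_reg, value) latch records are hypothetical stand-ins for the EX/MEM and MEM/WB pipeline registers, not a real hardware description.

```python
# Sketch of the forwarding (bypass) mux described above: each ALU operand
# prefers the newest in-flight result over the stale register-file value.
# The latch records here are hypothetical, for illustration only.

def alu_operand(src_reg, regfile, ex_mem, mem_wb):
    """Pick the value for one ALU input.

    ex_mem / mem_wb are (dest_reg, value) pairs held in the EX/MEM and
    MEM/WB pipeline registers, or None if that slot holds no result.
    """
    if ex_mem and ex_mem[0] == src_reg:   # forward the ALU output from EX
        return ex_mem[1]
    if mem_wb and mem_wb[0] == src_reg:   # forward the result from MEM
        return mem_wb[1]
    return regfile[src_reg]               # no hazard: use the register file

# ADD R1,R2,R3 has just left EX with R1 = 5; SUB needs R1 the next cycle,
# but R1 has not yet been written back to the register file.
regfile = {"R1": 0, "R5": 7}
value = alu_operand("R1", regfile, ex_mem=("R1", 5), mem_wb=None)
print(value)   # SUB reads the forwarded 5, not the stale 0
```

The EX/MEM check comes first because it holds the newer result: if two in-flight instructions both wrote the same register, the most recent write must win.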