Assembly p1

• Pipelining is a technique used in advanced microprocessors where the
microprocessor begins executing a second instruction
before the first has been completed.
- A pipeline is a series of stages, where some work is done at
each stage. The work is not finished until it has passed
through all stages.
• With pipelining, the computer architecture allows the next
instructions to be fetched while the processor is
performing arithmetic operations, holding them in a buffer
close to the processor until each instruction operation can
be performed.

• The pipeline is divided into segments and each
segment can execute its operation concurrently with
the other segments. Once a segment completes an
operation, it passes the result to the next segment in
the pipeline and fetches the next operation from the
preceding segment.

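Four Pipelined Instructions
[Figure: four instructions, each passing through the stages IF, ID, EX, M, and W, with each instruction starting one cycle after the previous one. Timing labels: 5, 1, 1, 1.]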
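The overlap in the figure can be reproduced with a minimal Python sketch. It assumes an ideal pipeline with no stalls, and the function names `pipeline_schedule` and `render` are made up for this illustration. Under that assumption, one reading of the timing labels is that the first instruction needs 5 cycles and each later one finishes 1 cycle after its predecessor, giving 5 + 1 + 1 + 1 = 8 cycles in total.

```python
STAGES = ["IF", "ID", "EX", "M", "W"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage active in cycle c + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        # Instruction i enters the pipeline one cycle after instruction i - 1.
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows):
    """Format the schedule as a text timing diagram, one line per instruction."""
    return "\n".join(
        " ".join(f"{cell:<2}" for cell in row).rstrip() for row in rows
    )


schedule = pipeline_schedule(4)
print(render(schedule))
print("total cycles:", len(schedule[0]))  # total cycles: 8
```

Without pipelining, the same four instructions would need 4 x 5 = 20 cycles under the same one-cycle-per-stage assumption.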

Instruction Fetch
• The Instruction Fetch (IF) stage is responsible for obtaining
the requested instruction from memory. The instruction
and the program counter (which is incremented to the next
instruction) are stored in the IF/ID pipeline register as
temporary storage so that they may be used in the next stage at
the start of the next clock cycle.

Instruction Decode
• The Instruction Decode (ID) stage is responsible for
decoding the instruction and sending out the various
control lines to the other parts of the processor. The
instruction is sent to the control unit where it is decoded
and the registers are fetched from the register file.

Execution
• The Execution (EX) stage is where any calculations are
performed. The main component in this stage is the ALU.
The ALU provides arithmetic and logic capabilities.

Memory and IO
• The Memory and IO (MEM) stage is responsible for storing
and loading values to and from memory. It is also responsible
for input or output from the processor. If the current
instruction is not of Memory or IO type, then the result
from the ALU is passed through to the write back stage.

Write Back
• The Write Back (WB) stage is responsible for writing
the result of a calculation, memory access or input into
the register file.

[Figure: instruction pipeline flowchart. Boxes: Fetch Instruction From Memory; Decode Instruction and Calculate Effective Address; Fetch Operand From Memory; Execute Instruction; Update PC; Empty Pipe; Interrupt Handling. Decisions: Branch? (YES/NO) and Interrupt? (YES/NO).]

INTRODUCTION
Pipelining is a technique of decomposing a
sequential process into suboperations, with each
suboperation being executed in a special dedicated
segment that operates concurrently with all other
segments.
The name “pipeline” implies a flow of
information analogous to an industrial assembly
line.

It is characteristic of pipelines that several
computations can be in progress in distinct segments at the
same time.
Each subtask can be processed independently
on a different machine.
The pipelining design provides a way to start a
new task before an old one has been completed.

[Figure: Fetch + Execution.
(a) Sequential execution: F1 E1 F2 E2 F3 E3 for instructions I1, I2, I3.
(b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit.
(c) Pipelined execution: F1 E1, F2 E2, and F3 E3 for instructions I1, I2, I3, overlapped over clock cycles 1 to 4.]

• Pipeline processing example:
– Perform the arithmetic operation (Ai*Bi)+(Ci*Di) on a
stream of numbers. Specify a pipeline configuration to
carry out the task for i = 1 through 6.
The pipeline consists of seven registers that receive new data with
every clock pulse, two multipliers, and one adder circuit.

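[Figure: three-stage pipeline for (Ai*Bi)+(Ci*Di). Stage 1: inputs Ai, Bi, Ci, Di load into registers R1, R2, R3, R4. Stage 2: two multipliers feed R5 and R6. Stage 3: the adder feeds R7.]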
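The register contents over time can be traced with a small Python sketch. It assumes that R1 to R4 load the inputs, R5 and R6 load the two products, and R7 loads the sum, all on the same clock pulse. The function name `run_arithmetic_pipeline` and the sample data are made up for this illustration.

```python
def run_arithmetic_pipeline(a, b, c, d):
    """Clock the three-stage pipeline and return the value of R7 after each pulse."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = r6 = r7 = None  # None marks an empty register
    trace = []
    for clock in range(n + 2):  # two extra pulses drain the pipeline
        # All registers load at the same pulse, so every new value is
        # computed from the register contents before the pulse.
        next_r7 = r5 + r6 if r5 is not None else None   # stage 3: adder
        next_r5 = r1 * r2 if r1 is not None else None   # stage 2: multiplier
        next_r6 = r3 * r4 if r3 is not None else None   # stage 2: multiplier
        if clock < n:                                   # stage 1: load inputs
            next_in = (a[clock], b[clock], c[clock], d[clock])
        else:
            next_in = (None, None, None, None)
        r1, r2, r3, r4 = next_in
        r5, r6, r7 = next_r5, next_r6, next_r7
        trace.append(r7)
    return trace


# Hypothetical input streams for i = 1 through 6.
A = [1, 2, 3, 4, 5, 6]
B = [2, 2, 2, 2, 2, 2]
C = [3, 3, 3, 3, 3, 3]
D = [1, 2, 3, 4, 5, 6]
print(run_arithmetic_pipeline(A, B, C, D))
```

The first result reaches R7 on the third clock pulse, and one further result appears on every pulse after that, so six input sets need 3 + 6 - 1 = 8 pulses.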

• The performance gain from using pipelining occurs
because we can start the execution of a new
instruction each clock cycle. In a real implementation
this is not always possible.
• Another important note is that in a pipelined
processor, a particular instruction still takes at least as
long to execute as it would on a non-pipelined processor.
• Pipeline hazards prevent the execution of the next
instruction during the appropriate clock cycle.
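As a rough way to quantify this gain, assume a pipeline with k stages and clock period t_p, a stream of n instructions, and a non-pipelined time per instruction of t_n. These symbols are introduced here for illustration and are not in the slides. With no hazards, the first instruction needs k cycles and each later one completes one cycle after its predecessor, so the speedup S is:

```latex
\[
S = \frac{n\,t_n}{(k + n - 1)\,t_p},
\qquad
\lim_{n \to \infty} S = \frac{t_n}{t_p} = k \quad \text{if } t_n = k\,t_p .
\]
```

The limit of k is an upper bound under these assumptions. Every stall caused by a hazard adds cycles to the denominator and lowers the actual speedup.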

• There are three types of hazards in a pipeline:
– Structural hazards are created when the data path hardware
in the pipeline cannot support all of the overlapped
instructions in the pipeline.
– Data hazards occur when an instruction in the pipeline
depends on the result of another instruction still in the pipeline.
– Control hazards arise from the pipelining
of branches and other instructions that change the PC.

• Structural hazards result from the CPU data path
not having enough resources to service all the
overlapping instructions.
• Suppose the register file cannot service both a read and a write
in the same clock cycle. This would cause a
conflict between the ID and WB stages.
• Assume that there are no separate instruction and
data caches, and only one memory access can occur
during one clock cycle. A hazard would occur
between the IF and MEM stages.

• A structural hazard is dealt with by inserting a stall or
pipeline bubble into the pipeline. This means that for that
clock cycle, nothing happens for that instruction. This
effectively “slides” that instruction, and subsequent
instructions, back by one clock cycle.
• This effectively increases the average CPI.
• Example: Assume that you need to compare two processors, one
with a structural hazard that occurs 40% of the time,
causing a stall. Assume that the processor with the hazard
has a clock rate 1.05 times faster than the processor without
the hazard. How fast is the processor with the hazard
compared to the one without the hazard?
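One way to work this example is sketched below in Python. It rests on assumptions the slide does not state: an ideal CPI of 1, the hazard affecting 40% of instructions, and each occurrence costing exactly one stall cycle. The function name `relative_speed` is made up for this illustration.

```python
def relative_speed(stall_fraction, clock_ratio, ideal_cpi=1.0, stall_cycles=1):
    """Speed of the processor with the hazard relative to the one without it.

    stall_fraction: fraction of instructions that hit the hazard (assumed)
    clock_ratio:    clock rate with the hazard / clock rate without it
    """
    cpi_with_hazard = ideal_cpi + stall_fraction * stall_cycles
    # Time per instruction = CPI x cycle time. The hazard-free cycle time is
    # taken as 1, so the faster clock has a cycle time of 1 / clock_ratio.
    time_with_hazard = cpi_with_hazard / clock_ratio
    time_without_hazard = ideal_cpi
    return time_without_hazard / time_with_hazard


speed = relative_speed(stall_fraction=0.40, clock_ratio=1.05)
print(f"{speed:.2f}")  # 0.75
```

Under these assumptions the average CPI rises from 1 to 1.4, and the relative speed is 1.05 / 1.4 = 0.75.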

• We can see that even though the clock speed of the
processor with the hazard is a little faster, the
speedup is still less than 1.
• Therefore the hazard has quite an effect on the
performance.
• Sometimes computer architects will opt to design a
processor that exhibits a structural hazard. Why?

• We haven’t looked at assembly programming in
detail at this point.
• Consider the following operations:
DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

Pipeline Registers
What are the problems?
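The problem in the DADD sequence above can be made concrete with a Python sketch. It assumes a 5-stage pipeline in which registers are read in ID (stage 2) and written in WB (stage 5), no forwarding, and a register file that writes in the first half of a cycle and reads in the second half. The tuple encoding of the instructions and the name `raw_hazards` are made up for this illustration.

```python
# (opcode, destination register, source registers)
PROGRAM = [
    ("DADD", "R1", ["R2", "R3"]),
    ("DSUB", "R4", ["R1", "R5"]),
    ("AND", "R6", ["R1", "R7"]),
    ("OR", "R8", ["R1", "R9"]),
    ("XOR", "R10", ["R1", "R11"]),
]


def raw_hazards(program, wb_stage=5, id_stage=2):
    """List (producer, consumer, register) triples where the consumer reads too early."""
    hazards = []
    for i, (op_i, dest, _) in enumerate(program):
        write_cycle = i + wb_stage      # cycle in which instruction i writes dest
        for j in range(i + 1, len(program)):
            op_j, _, sources = program[j]
            read_cycle = j + id_stage   # cycle in which instruction j reads registers
            # A read in the same cycle as the write is fine under the assumed
            # write-then-read register file, so only earlier reads are hazards.
            if dest in sources and read_cycle < write_cycle:
                hazards.append((op_i, op_j, dest))
    return hazards


for producer, consumer, reg in raw_hazards(PROGRAM):
    print(f"{consumer} reads {reg} before {producer} writes it")
```

Under these assumptions DSUB and AND read R1 too early, OR reads it in the same cycle as the write, and XOR reads it afterwards. The sketch ignores later instructions that overwrite the same register, which does not matter for this sequence.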

• In this trivial example, assuming this is the only code we
want to execute, we cannot expect the programmer to
reorder the operations.
• Data forwarding can be used to solve this problem.
• To implement data forwarding we need to bypass the
pipeline register flow:
– This is needed whenever an instruction depends on the write of a previous instruction.

• It is easy to see how data forwarding can be used by
drawing out the pipelined execution of each
instruction.
• Now consider the following instructions:
DADD R1, R2, R3
LD R4, 0(R1)
SD R4, 12(R1)

ENGR9861 Winter 2007 RV
• Can data forwarding prevent all data hazards?
• NO!
• The following operations will still cause a data
hazard. The loaded value of R1 is not available until the end of
the MEM stage, which is too late to forward to the EX stage of
the DSUB that immediately follows.
LD R1, 0(R2)
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9

• We can avoid the hazard by using a pipeline
interlock.
• The pipeline interlock will detect when data
forwarding will not be able to get the data to the
next instruction in time.
• A stall is introduced until the instruction can get
the appropriate data from the previous instruction.
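The interlock check can be sketched in Python as follows. It assumes a 5-stage pipeline with full forwarding, where the only case forwarding cannot cover is a load immediately followed by an instruction that uses the loaded register. The tuple encoding and the name `interlock_stalls` are made up for this illustration.

```python
LOADS = {"LD"}

# (opcode, destination register, source registers)
LOAD_SEQUENCE = [
    ("LD", "R1", ["R2"]),
    ("DSUB", "R4", ["R1", "R5"]),
    ("AND", "R6", ["R1", "R7"]),
    ("OR", "R8", ["R1", "R9"]),
]


def interlock_stalls(program):
    """Count the stall cycles an interlock inserts, assuming full forwarding."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # A load's data is ready only after MEM, one cycle too late for the
        # EX stage of the instruction directly behind it.
        if prev_op in LOADS and prev_dest in curr_sources:
            stalls += 1
    return stalls


print(interlock_stalls(LOAD_SEQUENCE))  # 1
```

Only DSUB is held back for one cycle. By the time AND and OR reach EX, the loaded value can be forwarded or read from the register file.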

• Control hazards are caused by branches in the
code.
• During the IF stage, remember that the PC is
incremented by 4 in preparation for the next IF
cycle of the next instruction.
• What happens if a branch is performed and
we aren’t simply incrementing the PC by 4?
• The easiest way to deal with the occurrence of a
branch is to perform the IF stage again once the
branch outcome is known.

• The following solutions assume that we are
dealing with static branches, meaning that the
actions taken during a branch do not change.
• We already saw the first example: we stall the
pipeline until the branch is resolved (in our case we
repeated the IF stage until the branch resolved and
modified the PC).
• The next two examples always make a fixed
assumption about the outcome of the branch instruction.

• What if we treat every branch as “not taken”?
Remember that not only do we read the registers
during ID, but we also perform an equality test to
decide whether we need to branch or not.
• We can improve performance by assuming that the
branch will not be taken.
• In this case we can simply load the next
instruction (PC+4) and continue. The complexity
arises when the branch is evaluated and we end up
needing to actually take the branch.

• The “branch not taken” scheme is the same as performing
the IF stage a second time in our 5-stage pipeline if the
branch is taken.
• If the branch is not taken, there is no performance degradation.
• The “branch taken” scheme offers no benefit in our case because
we evaluate the branch target address in the ID stage.
• The fourth method for dealing with a control hazard is to
implement a “delayed” branch scheme.
• In this scheme an instruction that is useful and does not depend
on whether the branch is taken is inserted into the pipeline
after the branch. It is the job of the compiler to choose this
delayed branch instruction.
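The cost of stalling on every branch versus assuming “not taken” can be compared with a short Python sketch. The branch frequency and taken rate below are hypothetical numbers chosen for illustration, the penalty is assumed to be one cycle (the repeated IF stage), and the function names are made up.

```python
def cpi_stall_on_every_branch(branch_fraction, penalty=1, base_cpi=1.0):
    """Every branch repeats the IF stage, whether or not it is taken."""
    return base_cpi + branch_fraction * penalty


def cpi_predict_not_taken(branch_fraction, taken_fraction, penalty=1, base_cpi=1.0):
    """Only branches that turn out to be taken pay the penalty."""
    return base_cpi + branch_fraction * taken_fraction * penalty


# Hypothetical workload: 20% of instructions are branches, 60% of them taken.
stall_cpi = cpi_stall_on_every_branch(0.20)
not_taken_cpi = cpi_predict_not_taken(0.20, 0.60)
print(f"stall on every branch: CPI = {stall_cpi:.2f}")      # 1.20
print(f"predict not taken:     CPI = {not_taken_cpi:.2f}")  # 1.12
```

With these made-up numbers, assuming “not taken” removes the penalty for the 40% of branches that fall through. A delayed branch could do better still if the compiler can always fill the slot with useful work.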

• Sometimes operations require more than one clock
cycle to complete. Examples are:
– Floating Point Multiply
– Floating Point Divide
– Floating Point Add
• We can assume that there is hardware available on
the processor for performing the operations.
• Assume that the FP Mul and Add are fully
pipelined, and the divide is un-pipelined.

• The multiplier and the adder are fully pipelined.
The divider is not pipelined at all.
• Take a look at figure A.34 for a good example of
how pipelining will function in the case of longer
instruction execution. The author assumes a single
floating point register port.
• Structural hazards are avoided in the ID stage by
assigning a memory bit in a shift register. Incoming
instructions can then check to see if they should
stall.
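One possible reading of the shift-register scheme is sketched below in Python. It assumes that each bit reserves the single register write port for one future cycle, and that the register shifts by one position on every clock. The class name `WritePortReservation`, its methods, and the latencies in the usage example are made up for this illustration.

```python
class WritePortReservation:
    """Bit i of `bits` is set if the write port is reserved i cycles from now."""

    def __init__(self):
        self.bits = 0

    def try_issue(self, cycles_until_write):
        """Reserve the port for an instruction in ID; return False if it must stall."""
        mask = 1 << cycles_until_write
        if self.bits & mask:
            return False  # an earlier instruction already writes in that cycle
        self.bits |= mask
        return True

    def tick(self):
        """Advance one clock: every reservation moves one cycle closer."""
        self.bits >>= 1


port = WritePortReservation()
first = port.try_issue(7)   # long operation, writes 7 cycles from now
port.tick()
second = port.try_issue(6)  # would write in the same cycle, so it must stall
port.tick()
third = port.try_issue(6)   # one cycle later the slot is free
print(first, second, third)  # True False True
```

The check happens entirely in ID, so an instruction that would collide on the write port is held back before it enters the longer execution units.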

• Data Dependence:
– Instruction i produces a result that instruction j will
use, so instruction j is data dependent on instruction i.
• Name Dependence:
– Occurs when two instructions use the same register
or memory location, but there is no flow of data
between the instructions. Instruction order must be
preserved.

• Types of data hazards:
– RAW: read after write
– WAW: write after write
– WAR: write after read
• We have already seen a RAW hazard. WAW hazards
occur due to output dependence.
• WAR hazards do not usually occur because of the
amount of time between the read cycle and write
cycle in a pipeline.

Read After Write (RAW)
InstrJ tries to read operand before InstrI writes it
• Caused by a “Dependence” (in compiler nomenclature).
This hazard results from an actual need for
communication.
Execution Order is:
InstrI
InstrJ
I: add r1,r2,r3
J: sub r4,r1,r3

Write After Read (WAR)
InstrJ tries to write operand before InstrI reads it
– Gets wrong operand
– Called an “anti-dependence” by compiler writers.
This results from reuse of the name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5
Execution Order is:
InstrI
InstrJ
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

Write After Write (WAW)
InstrJ tries to write operand before InstrI writes it
– Leaves wrong result ( InstrI not InstrJ )
• Called an “output dependence” by compiler writers
This also results from the reuse of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• Will see WAR and WAW in later more complicated pipes
Execution Order is:
InstrI
InstrJ
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
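The three cases above can be told apart mechanically by comparing destination and source registers. The Python sketch below classifies the dependence between two instructions in program order. Whether the dependence becomes an actual hazard depends on the pipeline, as noted above for the MIPS 5-stage case. The tuple encoding and the name `classify_hazards` are made up for this illustration.

```python
def classify_hazards(first, second):
    """Return the potential hazard types between two instructions in program order.

    Each instruction is given as (destination register, list of source registers).
    """
    first_dest, first_srcs = first
    second_dest, second_srcs = second
    hazards = set()
    if first_dest in second_srcs:
        hazards.add("RAW")  # second reads what first writes
    if second_dest in first_srcs:
        hazards.add("WAR")  # second writes what first reads
    if first_dest == second_dest:
        hazards.add("WAW")  # both write the same register
    return hazards


# I and J from the three examples above.
raw = classify_hazards(("r1", ["r2", "r3"]), ("r4", ["r1", "r3"]))
war = classify_hazards(("r4", ["r1", "r3"]), ("r1", ["r2", "r3"]))
waw = classify_hazards(("r1", ["r4", "r3"]), ("r1", ["r2", "r3"]))
print(raw, war, waw)
```

Each pair yields exactly one type: RAW for the add/sub pair, WAR for the sub/add pair, and WAW for the two writes to r1.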
