UNIT-5
Code Optimization
• Code optimization is the phase that follows intermediate code
generation.
• Code optimization can be done at two levels: machine-independent
and machine-dependent code optimization.
• A graph representation of intermediate code is helpful for
discussing how to generate optimized code.
• Code generation benefits from this context:
• We can do a better job of register allocation if we know how
values are defined and used.
• We can do a better job of instruction selection by looking at
sequences of three-address statements.
• Transformations on flow graphs turn the original intermediate code
into "optimized" intermediate code from which better target code can
be generated.
• The "optimized" intermediate code is turned into machine code
using the code-generation techniques.
The representation is constructed as follows:
1. Partition the intermediate code into basic blocks, which are maximal sequences
of consecutive three-address instructions with the properties that
(a)The flow of control can only enter the basic block through the first instruction
in the block. That is, there are no jumps into the middle of the block.
(b) Control will leave the block without halting or branching, except
possibly at the last instruction in the block.
2. The basic blocks become the nodes of a flow graph, whose edges indicate
which blocks can follow which other blocks.
•We begin a new basic block with the first instruction and keep adding
instructions until we meet either a jump, a conditional jump, or a label on the
following instruction.
Basic blocks and flow graphs
Algorithm : Partitioning three-address instructions into basic
blocks.
INPUT: A sequence of three-address instructions.
OUTPUT: A list of the basic blocks for that sequence in which each
instruction is assigned to exactly one basic block.
METHOD: First, we determine those instructions in the
intermediate code that are leaders, that is, the first instructions in
some basic block.
The instruction just past the end of the intermediate program is not
included as a leader.
Rules for finding leaders
1. The first three-address instruction in the intermediate code is a
leader.
2. Any instruction that is the target of a conditional or unconditional
jump is a leader.
3. Any instruction that immediately follows a conditional or
unconditional jump is a leader.
 Then, for each leader, its basic block consists of itself and all
instructions up to but not including the next leader or the end of the
intermediate program.
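The leader-finding rules and the block-forming step above can be sketched in Python. The instruction encoding (dicts with an "op" field and, for jumps, a numeric "target" index) is an assumption for illustration, not part of the algorithm itself.

```python
# A minimal sketch of basic-block partitioning, assuming each
# instruction is a dict with an "op" and, for jumps, a "target"
# index into the instruction list.
def partition_into_blocks(instructions):
    jumps = {"goto", "if_goto"}          # unconditional and conditional jumps
    leaders = {0}                        # rule 1: first instruction is a leader
    for i, ins in enumerate(instructions):
        if ins["op"] in jumps:
            leaders.add(ins["target"])   # rule 2: jump targets are leaders
            if i + 1 < len(instructions):
                leaders.add(i + 1)       # rule 3: instruction after a jump
    starts = sorted(leaders)
    # each block runs from its leader up to (not including) the next leader
    return [instructions[s:e] for s, e in zip(starts, starts[1:] + [len(instructions)])]
```

Each instruction lands in exactly one block, matching the OUTPUT condition of the algorithm.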
Intermediate code to set a 10*10
matrix to an identity matrix
• In generating the intermediate
code, we have assumed that the
real-valued array elements take 8
bytes each, and that the matrix a is
stored in row-major form.
Flow Graph
•Once an intermediate-code program is partitioned into basic
blocks, we represent the flow of control between them by a flow
graph.
•The nodes of the flow graph are the basic blocks.
•There is an edge from block B to block C if and only if it is
possible for the first instruction in block C to immediately follow
the last instruction in block B.
•There are two ways that such an edge could be justified:
1.There is a conditional or unconditional jump from the end of B to
the beginning of C.
2. C immediately follows B in the original order of the three-
address instructions, and B does not end in an unconditional jump.
•We say that B is a predecessor of C, and C is a successor of B.
Flow graph based on
Basic Blocks
• The entry point is basic block B1, since B1 contains the first
instruction of the program.
• The only successor of B1 is B2, because B1 does not end in an
unconditional jump, and the leader of B2 immediately follows
the end of B1.
• Block B3 has two successors. One is itself, because the leader of
B3, instruction 3, is the target of the conditional jump at the end
of B3, instruction 9.
• The other successor is B4, because control can fall through the
conditional jump at the end of B3 and next enter the leader of
B4.
• Only B6 points to the exit of the flow graph, since the only way
to get to code that follows the program from which we
constructed the flow graph is to fall through the conditional
jump that ends B6.
Representation of Flow Graphs
•Flow graphs, being quite ordinary graphs, can be represented by
any of the data structures appropriate for graphs.
•The content of nodes (basic blocks) need their own
representation.
•We might represent the content of a node by a pointer to the
leader in the array of three-address instructions, together with a
count of the number of instructions or a second pointer to the last
instruction.
•Hence, a linked list of instructions is a likely representation for each basic block.
Next-use information
• Knowing when the value of a variable will be used next is
essential for generating good code.
• If the value of a variable that is currently in a register will never
be referenced subsequently, then that register can be assigned to
another variable.
• Suppose three-address statement i assigns a value to x.
• If statement j has x as an operand, and control can flow from
statement i to j along a path that has no intervening assignments to
x, then we say statement j uses the value of x computed at
statement i.
• We further say that x is live at statement i.
Liveness and next-use information
• We wish to determine, for each three-address statement i: x = y + z,
what the next uses of x, y, and z are.
Algorithm (applied to each statement i, scanning the block backwards
from its last statement):
1. Attach to statement i the information currently found in the symbol table
regarding the next use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use."
3. In the symbol table, set y and z to "live" and the next uses of y and z to i.
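The three steps are applied while scanning the block backwards from its last statement. A minimal Python sketch, assuming statements are (x, y, z) triples for "x = y op z" and that all variables are dead with no next use on exit from the block:

```python
# Backward scan attaching next-use/liveness information to each statement.
def next_use_info(block):
    table = {}   # symbol table: var -> (live?, next-use statement index or None)
    info = []
    for i in reversed(range(len(block))):
        x, y, z = block[i]
        # step 1: record what the table currently says for x, y, z
        info.append((i, {v: table.get(v, (False, None)) for v in (x, y, z)}))
        table[x] = (False, None)   # step 2: x is redefined here
        table[y] = (True, i)       # step 3: y and z are used here
        table[z] = (True, i)
    return list(reversed(info))
```

For a block t1 = a - b; t2 = t1 - c, the scan records that t1's next use after statement 0 is statement 1, while a and b have no further uses.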
Loops
•Since virtually every program spends most of its time in
executing its loops, it is especially important for a compiler to
generate good code for loops.
•Many code transformations depend upon the identification of
"loops" in a flow graph.
•We say that a set of nodes L in a flow graph is a loop if
1.There is a node in L called the loop entry with the property that
no other node in L has a predecessor outside L.
That is, every path from the entry of the entire flow
graph to any node in L goes through the loop entry.
2. Every node in L has a nonempty path, completely within L, to
the entry of L.
According to the above flow graph there are three loops
1. B3 by itself
2. B6 by itself
3. {B2,B3,B4}
Optimization of Basic Blocks
•We can often obtain a substantial improvement in the running
time of code merely by performing
 - local optimization within each basic block by itself, or
 - global optimization, which looks at how information flows
among the basic blocks of a program.
•Many important techniques for local optimization begin by
transforming a basic block into a DAG (directed acyclic graph).
DAG representation of basic blocks
We construct a DAG for a basic block as follows:
•There is a node in the DAG for each of the initial values of the variables
appearing in the basic block.
•There is a node N associated with each statement s within the block. The
children of N are those nodes corresponding to statements that are the
last definitions, prior to s, of the operands used by s.
•Node N is labeled by the operator applied at s, and also attached to N is
the list of variables for which it is the last definition within the block.
•Certain nodes are designated output nodes. These are the nodes whose
variables are live on exit from the block.
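The construction above can be sketched as follows, assuming statements are (dst, op, arg1, arg2) tuples. Sharing a node when the same operator is applied to the same children is exactly what exposes local common subexpressions:

```python
# A minimal DAG builder for a basic block.
def build_dag(block):
    nodes = {}    # (op, children...) -> node id
    current = {}  # variable -> node id of its last definition
    labels = {}   # node id -> variables for which it is the last definition
    counter = [0]

    def leaf(var):
        # node for the initial value of var, unless var was redefined
        if var not in current:
            key = ("leaf", var)
            if key not in nodes:
                nodes[key] = counter[0]; counter[0] += 1
            current[var] = nodes[key]
        return current[var]

    for dst, op, a, b in block:
        key = (op, leaf(a), leaf(b))
        if key not in nodes:            # reuse an existing node if possible
            nodes[key] = counter[0]; counter[0] += 1
        n = nodes[key]
        current[dst] = n                # n is now the last definition of dst
        labels.setdefault(n, []).append(dst)
    return nodes, labels
```

Two statements computing b + c end up labeling the same node, so the second computation can be eliminated.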
Code improving transformations
• We can eliminate local common subexpressions, that is, instructions
that compute a value that has already been computed.
• We can eliminate dead code, that is, instructions that compute a
value that is never used.
• We can reorder statements that do not depend on one another; such
reordering may reduce the time a temporary value needs to be
preserved in a register.
• We can apply algebraic laws to reorder operands of three-address
instructions, and sometimes thereby simplify the computation.
DAG for basic block
Since there are only three non-leaf nodes in the DAG, the basic
block can be rewritten with only three statements:
a = b + c
d = a - d
c = d + c
This rewriting assumes that b is not live on exit from the block,
so there is no need to compute a value for b,
i.e.
DAG for basic block
array accesses in a DAG
• An assignment from an array, like x = a [i], is represented by
creating a node with operator =[] and two children representing
the initial value of the array, a0 in this case, and the index i.
Variable x becomes a label of this new node.
• An assignment to an array, like a [j] = y, is represented by a new
node with operator []= and three children representing a0, j and y.
There is no variable labeling this node. What is different is that the
creation of this node kills all currently constructed nodes whose
value depends on a0. A node that has been killed cannot receive
any more labels; that is, it cannot become a common
subexpression.
DAG for a sequence of array assignments
Dead Code Elimination
We delete from a DAG any root (node with no ancestors) that has no
live variables attached.
In the previous figure a & b are live but c and e are not, we can
immediately remove the root labelled e . Then the node c becomes
a root and can be removed. The roots labelled a &b remain , since
they each have live variables attached
Use of Algebraic Identities
x + 0 = 0 + x = x        x - 0 = x
x * 1 = 1 * x = x        x / 1 = x
These identities can be applied to eliminate computations
from a basic block.
Local Reduction in Strength
Replacing a more expensive operator
by a cheaper one:
x^2  →  x * x
2 * x  →  x + x
x / 2  →  x * 0.5
Pointer Assignments & Procedure calls
x = *p
*q = y
The statement x = *p must be treated as a use of every variable, since p
could point to any of them. The indirect assignment *q = y kills all nodes
in the DAG, since q could point to any variable; a procedure call has the
same effect, because the called procedure may use or change any variable.
Rules for reconstructing the basic block from a
DAG
• The order of instructions must respect the order of nodes in the DAG.
That is, we cannot compute a node's value until we have computed a
value for each of its children.
• Assignments to an array must follow all previous assignments to, or
evaluations from, the same array, according to the order of these
instructions in the original basic block.
• Evaluations of array elements must follow any previous (according to
the original block) assignments to the same array. The only permutation
allowed is that two evaluations from the same array may be done in
either order, as long as neither crosses over an assignment to that array.
• Any use of a variable must follow all previous (according to the original
block) procedure calls or indirect assignments through a pointer.
• Any procedure call or indirect assignment through a pointer must follow
all previous (according to the original block) evaluations of any variable.
Reassembling basic blocks from DAGs
A Simple Code Generator
• Generates target code for a sequence of 3-address statements
• For each operator in a statement, there is a corresponding target
language operator.
Register & Address descriptors:
Register descriptor: It keeps track of what is currently in each register.
Initially, all registers are empty.
Address descriptor: It keeps track of the location where the current
value of the name can be found.
The location may be a register, a stack location, or a memory address.
Principal uses of registers
• In most machine architectures, some or all of the operands of an
operation must be in registers in order to perform the operation.
• Registers make good temporaries - places to hold the result of a
subexpression while a larger expression is being evaluated, or more
generally, a place to hold a variable that is used only within a single
basic block.
• Registers are often used to help with run-time storage management,
for example, to manage the run-time stack, including the
maintenance of stack pointers and possibly the top elements of the
stack itself.
A Code Generation Algorithm
For each three-address statement of the form x = y op z:
1. Invoke a function getreg to determine the location L where the result
of y op z should be stored.
2. Consult the address descriptor for y to determine y', the current
location of y. If y is not already in L, generate MOV y', L.
3. Generate the instruction OP z', L, where z' is the current location
of z. Update the address descriptor of x to indicate that x is in L.
If L is a register, update its descriptor to indicate that it contains
the value of x.
4. If y and z have no next uses and are not live on exit, update the
descriptors to remove y and z.
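A toy rendering of the four steps, assuming a two-address target where "OP src, dst" computes dst = dst OP src, and a deliberately naive getreg that reuses the register already holding y when there is one and otherwise hands out a fresh register (no spilling, no descriptor cleanup for step 4):

```python
# Simplified code generator for x = y op z statements.
def gen(statements):
    code, regs, addr = [], {}, {}   # register and address descriptors
    def getreg():
        return f"R{len(regs)}"      # hypothetical getreg: always a fresh register
    for x, op, y, z in statements:
        L = addr.get(y)
        if L is None or not L.startswith("R"):
            L = getreg()
            code.append(f"MOV {y}, {L}")            # step 2: bring y into L
            regs[L] = y
        code.append(f"{op.upper()} {addr.get(z, z)}, {L}")  # step 3
        regs[L] = x
        addr[x] = L                                 # x now lives in L
    return code
```

On the example that follows (t1 = a - b; t2 = a - c; t3 = t1 + t2; d = t3 + t2) this emits the same instruction sequence as the table, apart from the final MOV R0, d store, which a real generator would issue for values live on block exit.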
Example
d=(a-b)+(a-c)+ (a-c)
• Three address statements
t1=a-b
t2=a-c
t3=t1+t2
d=t3+t2
Example
Statements     Code Generated   Register Descriptor       Address Descriptor
t1 = a - b     MOV a, R0        Registers are empty       t1 in R0
               SUB b, R0        R0 contains t1
t2 = a - c     MOV a, R1        R0 contains t1            t1 in R0
               SUB c, R1        R1 contains t2            t2 in R1
t3 = t1 + t2   ADD R1, R0       R0 contains t3            t2 in R1
                                R1 contains t2            t3 in R0
d = t3 + t2    ADD R1, R0       R0 contains d             d in R0
               MOV R0, d                                  d in R0 and memory
Descriptors for data structure
• For each available register, a register descriptor keeps track of the
variable names whose current value is in that register. Since we
shall use only those registers that are available for local use within
a basic block, we assume that initially, all register descriptors are
empty. As the code generation progresses, each register will hold
the value of zero or more names.
• For each program variable, an address descriptor keeps track of
the location or locations where the current value of that variable
can be found. The location might be a register, a memory address,
a stack location, or some set of more than one of these. The
information can be stored in the symbol-table entry for that
variable name.
Machine Instructions for Operations
• Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry
and Rz.
• If y is not in Ry (according to the register descriptor for Ry), then issue
an instruction LD Ry, y', where y' is one of the memory locations for y
(according to the address descriptor for y).
• Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a
location for z.
• Issue the instruction ADD Rx , Ry, Rz.
Rules for updating the register and address descriptors
• For the instruction LD R, x
• Change the register descriptor for register R so it holds only x.
• Change the address descriptor for x by adding register R as an additional
location.
• For the instruction ST x, R, change the address descriptor for x to include
its own memory location.
• For an operation such as ADD Rx, Ry, Rz implementing a three-address
instruction x = y + z
• Change the register descriptor for Rx so that it holds only x.
• Change the address descriptor for x so that its only location is Rx. Note
that the memory location for x is not now in the address descriptor for x.
• Remove Rx from the address descriptor of any variable other than x.
• When we process a copy statement x = y, after generating the load for y
into register Ry, if needed, and after managing descriptors as for all load
statements (per rule 1):
• Add x to the register descriptor for Ry.
• Change the address descriptor for x so that its only location is Ry .
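The descriptor rules above can be sketched as small update functions, assuming descriptors are kept as sets: registers map to the variables they hold, variables map to their current locations, with the string "mem" standing for the variable's own memory cell.

```python
# Descriptor updates for LD, ST, and an arithmetic operation.
def on_load(reg_desc, addr_desc, R, x):
    reg_desc[R] = {x}                      # R now holds only x
    addr_desc.setdefault(x, set()).add(R)  # R is an additional location for x

def on_store(reg_desc, addr_desc, x, R):
    addr_desc.setdefault(x, set()).add("mem")  # x's memory copy is now current

def on_op(reg_desc, addr_desc, Rx, x):
    reg_desc[Rx] = {x}          # Rx holds only x
    addr_desc[x] = {Rx}         # x's only location is Rx (memory copy is stale)
    for v, locs in addr_desc.items():
        if v != x:
            locs.discard(Rx)    # Rx no longer holds any other variable
```

After an operation into Rx, any variable that previously claimed Rx as a location loses it, which is the "remove Rx from the address descriptor of any variable other than x" rule.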
Characteristic of peephole optimizations
• Redundant-instruction elimination(loads & Stores)
• Eliminating Unreachable code
• Flow-of-control optimizations
• Algebraic simplifications & Reduction in Strength
• Use of machine idioms
Redundant-instruction elimination
• LD a, R0
ST R0, a
The store is redundant: it stores back the value just loaded, so it can be
deleted (unless the store carries a label, in which case control might
reach it without passing through the load).
Eliminating Unreachable Code
• if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
Can be replaced by:
if debug != 1 goto L2
print debugging information
L2:
Flow-of-control optimizations
Simple intermediate code-generation algorithms frequently produce
jumps to jumps, jumps to conditional jumps, and conditional jumps to jumps.
goto L1
...
L1: goto L2
Can be replaced by:
goto L2
...
L1: goto L2
if a<b goto L1
...
L1: goto L2
Can be replaced by:
if a<b goto L2
...
L1: goto L2
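The jump-to-jump case can be sketched as a target-chasing routine, assuming the unconditional jumps have been collected into a map from each label to the label it immediately jumps to:

```python
# Follow chains of "Lk: goto Lm" until a label that is not itself a jump.
def final_target(label, jumps_to):
    seen = set()
    while label in jumps_to and label not in seen:
        seen.add(label)          # guard against goto cycles
        label = jumps_to[label]
    return label
```

Every jump in the program can then be retargeted to final_target of its label, collapsing chains like L1 → L2 → L3 into a direct jump to L3.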
Algebraic simplifications& Reduction in
strength
• Statements such as x = x + 0 or x = x * 1 can be eliminated.
Use of Machine Idioms
• The target machine may have hardware instructions to implement
certain specific operations efficiently.
• Ex:Some machines have auto increment and auto decrement
addressing modes.
• Hence these add or subtract one from an operand before or after
using its value.
• The use of these modes greatly improves the quality of code when
pushing or popping a stack
Machine –Independent Optimization
• Elimination of unnecessary instructions in object code, or the
replacement of one sequence of instructions by a faster sequence of
instructions that does the same thing, is called "code improvement"
or "code optimization".
• Local code optimization (code improvement within a basic block)
• Global code optimization, where the improvements are taken into
account across basic blocks.
• Most global code optimizations are based on data-flow analyses,
which are algorithms to gather information about a program. The
results of data-flow analyses all have the same form: for each
instruction in the program, they specify some property that must hold
every time that instruction is executed.
Principal sources of code Optimization
• A compiler must preserve the semantics of the original program.
• A compiler knows only how to apply relatively low-level
transformations, using general facts such as algebraic identities like
i = i + 0, that are guaranteed to leave the result of the program
unchanged.
Causes of Redundancy
• There are many redundant operations in a program.
• Sometimes redundancy is visible at the source level.
• A programmer may find it more direct and convenient to recalculate
some result, leaving it to the compiler to recognize that only one such
calculation is necessary.
• More often, redundancy is a side effect of having written the program
in a high-level language.
ex: In C/C++, where pointer arithmetic is allowed, the elements of an
array or the fields of a structure are referred to using a[i][j] or x->s1.
As a program is compiled, each of these expands into a number of low-
level arithmetic operations, such as the computation of the location of
the (i,j)-th element of a matrix. Programmers are not aware of these
low-level operations and so cannot eliminate the redundancies.
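As a concrete instance of those hidden low-level operations, the address arithmetic behind a[i][j] for a row-major array with n columns and w-byte elements is base + (i*n + j)*w; every reference to a[i][j] repeats this computation, and that repetition is the redundancy an optimizer can remove.

```python
# Byte offset of element (i, j) in a row-major array with
# n columns and w bytes per element.
def element_offset(i, j, n, w):
    return (i * n + j) * w
```

For the 10*10 matrix of 8-byte reals used earlier, a[2][3] sits at offset (2*10 + 3)*8 = 184 bytes from the base of a.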
Flow graph for quick sort
Semantics-Preserving Transformations
There are a number of ways in which a compiler can improve a program
without changing the function it computes. Common-subexpression
elimination, copy propagation, dead-code elimination, and constant folding
are common examples of such function-preserving (or semantics-preserving)
transformations; we shall consider each in turn.
Global Common-subexpression
elimination
For example, block B5 shown in Fig. 9.4(a) recalculates 4 * i and 4* j, although none of these
calculations were requested explicitly by the programmer.
Global Common-subexpression
elimination
After local common subexpressions are
eliminated, B5 still evaluates 4*i and 4* j, as
shown in Fig. 9.4(b). Both are common
subexpressions; in particular, the three
statements
Copy Propagation
In order to eliminate the common subexpression from
the statement c = d+e in Fig. 9.6(a), we must use a new
variable t to hold the value of d + e. The value of
variable t, instead of that of the expression d + e, is
assigned to c in Fig. 9.6(b). Since control may reach c =
d+e either after the assignment to a or after the
assignment to b, it would be incorrect to replace c =
d+e by either c = a or by c = b.
The idea behind the copy-propagation transformation
is to use v for u, wherever possible after the copy
statement u = v. For example, the assignment x = t3 in
block B5 of Fig. 9.5 is a copy. Copy propagation
applied to B5 yields the code in Fig. 9.7. This change
may not appear to be an improvement, but, as we shall
see in Section 9.1.6, it gives us the opportunity to
eliminate the assignment to x.
Dead-Code Elimination
• A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is
dead (or useless) code — statements that compute values that never
get used.
Suppose debug is set to TRUE or FALSE at various points in the program, and used in statements like
    if (debug) print ...
It may be possible for the compiler to deduce that each time the program reaches this statement, the value
of debug is FALSE. Usually, it is because there is one particular statement
    debug = FALSE
that must be the last assignment to debug prior to any tests of the value of debug, no matter what sequence
of branches the program actually takes. If copy propagation replaces debug by FALSE, then the print
statement is dead because it cannot be reached. We can eliminate both the test and the print operation from
the object code.
Constant Folding
• At compile time deducing the value of an expression is a constant and using
the constant instead is known as constant folding.
• One advantage of copy propagation is that it often turns the copy statement
into dead code. For example, copy propagation followed by dead-code
elimination removes the assignment to x and transforms the code in Fig 9.7
into
a[t2] = t5
a[t4] = t3
goto B2
• This code is a further improvement of block B5 in Fig. 9.5.
Code Motion
• Loops are a very important place for optimizations, especially the inner loops where
programs tend to spend the bulk of their time. The running time of a program may be
improved if we decrease the number of instructions in an inner loop, even if we increase
the amount of code outside that loop.
• An important modification that decreases the amount of code in a loop is code
motion. This transformation takes an expression that yields the same result independent of
the number of times a loop is executed (a loop-invariant computation) and evaluates the
expression before the loop. Note that the notion "before the loop" assumes the existence
of an entry for the loop, that is, one basic block to which all jumps from outside the loop go
while (i <= limit-2) /* statement does not change limit */
Code motion will result in the equivalent code
t = limit-2
while (i <= t) /* statement does not change limit or t */
Induction variables and Reduction in strength
• A variable x is said to be an "induction variable" if there is a positive or
negative constant c such that each time x is assigned, its value increases by
c.
• i and t1 are induction variables in the loop containing B2 of Fig. 9.5. Induction
variables can be computed with a single increment (addition or subtraction)
per loop iteration. The transformation of replacing an expensive operation,
such as multiplication, by a cheaper one, such as addition, is known
as strength reduction.
• But induction variables not only allow us sometimes to perform a strength
reduction; often it is possible to eliminate all but one of a group of induction
variables whose values remain in lock step as we go around the loop.
Strength Reduction
Register Allocation and Assignment
• Register Allocation-what values should reside in registers.
• Register Assignment- In which register each value should
reside
Register Allocation and Assignment
• Global Register Allocation
• Usage Counts
• Register Assignment for Outer Loops
• Register Allocation by Graph Coloring
Global register allocation
• The algorithm explained previously does local (block-based) register
allocation
• As a result, all live variables must be stored to memory at the end of
each block
• To save some of these stores and their corresponding loads, we
might arrange to assign registers to frequently used variables and
keep these registers consistent across block boundaries (globally)
• Some options are:
• Keep the values of variables used in loops inside registers
• Use a graph-coloring approach for more global allocation
Usage counts
• The value of x computed in a block will remain in a register if there are
subsequent uses of x in that block. Thus we count a savings of one for
each use of x in loop L not preceded by an assignment to x in the same
block.
• We save two units if we can avoid a store of x at the end of a block.
• Thus, if x is allocated a register, we count a savings of two for each
block in loop L for which x is live on exit and in which x is assigned a
value.
Usage counts
• For a loop L we can approximate the savings from allocating x a
register as follows:
• Sum over all blocks B in the loop L:
• For each use of x before any definition in the block, we add one unit of
saving.
• If x is live on exit from B and is assigned a value in B, then we add 2 units
of saving.
Σ over B in L of [ use(x,B) + 2 * live(x,B) ]
use(x,B) is the number of times x is used in B prior to any definition of x.
live(x,B) is 1 if x is live on exit from B and is assigned a value in B;
live(x,B) is 0 otherwise.
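The savings formula can be evaluated directly, assuming use(x,B) counts and the live(x,B) condition ("live on exit from B and assigned a value in B") have already been computed per block:

```python
# Savings from keeping x in a register across loop L:
# sum over blocks B in L of use(x, B) + 2 * live(x, B).
def savings(x, blocks):
    # each block is a dict with per-variable "use" counts and a "live" set
    return sum(b["use"].get(x, 0) + 2 * (1 if x in b["live"] else 0)
               for b in blocks)
```

Applied to variable a in the four-block example that follows, the terms (0+2) + (1+0) + (1+0) + (0+0) sum to 4, matching the table.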
Flow graph of an inner loop
B1 B2 B3 B4
a= (0+2*1) + (1+2*0) + (1+2*0) + (0+2*0) = 4
b= (1+2*0) + (0+2*0) + (0+2*1) + (0+2*1) = 5
c= (1+2*0) + (0+2*0) + (1+2*0) + (1+2*0) = 3
d= (1+2*1) + (1+2*0) + (1+2*0) + (1+2*0) = 6
e= (0+2*1) + (0+2*0) + (0+2*1) + (0+2*0) = 4
f= (1+2*0) + (0+2*1) + (1+2*0) + (0+2*0) = 4
Code sequence using global register
assignment
Register Assignment for Outer Loops
If an outer loop L1 contains an inner loop L2, the names
allocated registers in L2 need not be allocated registers
in L1 - L2. However, if we choose to allocate x a
register in L2 but not in L1, we must load x on entrance
to L2 and store x on exit from L2.
Register allocation by Graph coloring
• Two passes are used
• Target-machine instructions are selected as though there are an infinite
number of symbolic registers
• Assign physical registers to symbolic ones
• Create a register-interference graph
• Nodes are symbolic registers, and an edge connects two nodes if one is live
at a point where the other is defined.
• For example in the previous example an edge connects a and d in the graph
• Use a graph coloring algorithm to assign registers.
• A graph is said to be colored if each node has been assigned a color in such a way that
no two adjacent nodes have the same color.
• A color represents a register, and the color makes sure that no two symbolic registers
that can interfere with each other are assigned the same physical register.
• The problem of determining whether a graph is k-colorable is NP-complete.
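Because exact k-coloring is intractable, compilers use heuristics. A greedy coloring sketch (not Chaitin's full algorithm, which also chooses spill candidates by graph simplification): assuming the interference graph is given as an adjacency map, nodes that cannot receive one of the k colors are marked for spilling.

```python
# Greedy graph coloring: colors 0..k-1 stand for physical registers,
# None marks a symbolic register that must be spilled to memory.
def color(graph, k):
    assignment = {}
    for node in graph:
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [c for c in range(k) if c not in used]
        assignment[node] = free[0] if free else None   # spill if no color left
    return assignment
```

Interfering nodes (such as a and d in the earlier example) always receive distinct colors, so they end up in different physical registers.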

More Related Content

PPTX
Compiler Design theory and various phases of compiler.pptx
PDF
Code optimization lecture
PPTX
PPT
PRESENTATION ON DATA STRUCTURE AND THEIR TYPE
PPT
456589.-Compiler-Design-Code-Generation (1).ppt
PPT
Code Generations - 1 compiler design.ppt
PPT
456589.-Compiler-Design-Code-Generation (1).ppt
PDF
Wondershare UniConverter Crack Download Latest 2025
Compiler Design theory and various phases of compiler.pptx
Code optimization lecture
PRESENTATION ON DATA STRUCTURE AND THEIR TYPE
456589.-Compiler-Design-Code-Generation (1).ppt
Code Generations - 1 compiler design.ppt
456589.-Compiler-Design-Code-Generation (1).ppt
Wondershare UniConverter Crack Download Latest 2025

Similar to unit-5.pptvshvshshhshsjjsjshhshshshhshsj (20)

PDF
Enscape 3D 3.6.6 License Key Crack Full Version
PDF
Wondershare Filmora Crack 12.0.10 With Latest 2025
PPT
Code_generatio.lk,jhgfdcxzcvgfhjkmnjhgfcxvfghjmh
PDF
Skype 125.0.201 Crack key Free Download
PPTX
Basic blocks and control flow graphs
PPTX
UNIT V - Compiler Design notes power point presentation
PPTX
Principal Sources of Optimization in compiler design
PPTX
Compiler Design_Code generation techniques.pptx
PPTX
COMPILER_DESIGN_CLASS 1.pptx
PPT
COMPILER_DESIGN_CLASS 2.ppt
PPT
457418.-Compiler-Design-Code-optimization.ppt
PPT
lect23_optimization.ppt
DOC
Compiler notes--unit-iii
PDF
Code optimization in compiler design
PDF
Compiler unit 5
PDF
Module-4 Program Design and Anyalysis.pdf
PPT
ERTS UNIT 3.ppt
PPTX
Machine_Learning_JNTUH_R18_UNIT5_CONCEPTS.pptx
PPTX
Bp150513(compiler)
Enscape 3D 3.6.6 License Key Crack Full Version
Wondershare Filmora Crack 12.0.10 With Latest 2025
Code_generatio.lk,jhgfdcxzcvgfhjkmnjhgfcxvfghjmh
Skype 125.0.201 Crack key Free Download
Basic blocks and control flow graphs
UNIT V - Compiler Design notes power point presentation
Principal Sources of Optimization in compiler design
Compiler Design_Code generation techniques.pptx
COMPILER_DESIGN_CLASS 1.pptx
COMPILER_DESIGN_CLASS 2.ppt
457418.-Compiler-Design-Code-optimization.ppt
lect23_optimization.ppt
Compiler notes--unit-iii
Code optimization in compiler design
Compiler unit 5
Module-4 Program Design and Anyalysis.pdf
ERTS UNIT 3.ppt
Machine_Learning_JNTUH_R18_UNIT5_CONCEPTS.pptx
Bp150513(compiler)
Ad

Recently uploaded (20)

PDF
POCSO ACT in India and its implications.
PDF
The Ways The Abhay Bhutada Foundation Is Helping Indian STEM Education
PPTX
InnoTech Mahamba Presentation yearly.pptx
PDF
To dialogue with the “fringes”, from the “fringes”
PPTX
IMPLEMENTING GUIDELINES OF SUSTAINABLE LIVELIHOOD PROGRAM -SLP MC 22 ORIENTAT...
PPTX
Human_Population_Growth and demographic crisis.pptx
PPTX
c. b. 3 Basics of BDP geared towards public service.pptx
PPTX
Quiz Night Game Questions and Questions for interactive games
PPTX
A quiz and riddle collection for intellctual stimulation
PDF
Global Peace Index - 2025 - Ghana slips on 2025 Global Peace Index; drops out...
PPTX
一比一原版(MHL毕业证)德国吕贝克音乐学院毕业证文凭学历认证
PDF
A Comparative Analysis of Digital Transformation in Public Administration.pdf
PPTX
ISO 9001 awarness for government offices 2015
PDF
Europe's Political and Economic Clouds- August 2025.pdf
PPTX
PER Resp Dte Mar - Ops Wing 20 Mar 27.pptx
PDF
Item # 1b - August 12, 2025 Special Meeting Minutes
PDF
Firefighter Safety Skills training older version
PDF
Covid-19 Immigration Effects - Key Slides - June 2025
PDF
Oil Industry Ethics Evolution Report (1).pdf
PDF
rs_9fsfssdgdgdgdgdgdgdgsdgdgdgdconverted.pdf
POCSO ACT in India and its implications.
The Ways The Abhay Bhutada Foundation Is Helping Indian STEM Education
InnoTech Mahamba Presentation yearly.pptx
To dialogue with the “fringes”, from the “fringes”
IMPLEMENTING GUIDELINES OF SUSTAINABLE LIVELIHOOD PROGRAM -SLP MC 22 ORIENTAT...
Human_Population_Growth and demographic crisis.pptx
c. b. 3 Basics of BDP geared towards public service.pptx
Quiz Night Game Questions and Questions for interactive games
A quiz and riddle collection for intellctual stimulation
Global Peace Index - 2025 - Ghana slips on 2025 Global Peace Index; drops out...
一比一原版(MHL毕业证)德国吕贝克音乐学院毕业证文凭学历认证
A Comparative Analysis of Digital Transformation in Public Administration.pdf
ISO 9001 awarness for government offices 2015
Europe's Political and Economic Clouds- August 2025.pdf
PER Resp Dte Mar - Ops Wing 20 Mar 27.pptx
Item # 1b - August 12, 2025 Special Meeting Minutes
Firefighter Safety Skills training older version
Covid-19 Immigration Effects - Key Slides - June 2025
Oil Industry Ethics Evolution Report (1).pdf
rs_9fsfssdgdgdgdgdgdgdgsdgdgdgdconverted.pdf
Ad

unit-5.pptvshvshshhshsjjsjshhshshshhshsj

  • 2. • Code optimization is next phase after intermediate code generation. • Code optimization can be done at two levels. Machine independent and Machine dependent code optimization. • A graph representation of intermediate code is helpful for discussing how to generate optimized code. • Code generation benefits from this context. • We can do a better job of register allocation if we know how values are defined and used. • We can do a better job of instruction selection by looking at sequences of three-address statements transformations on flow graphs that turn the original intermediate code into "optimized" intermediate code from which better target code can be generated. • The "optimized" intermediate code is turned into machine code using the code-generation techniques
  • 3. The representation is constructed as follows: 1. Partition the intermediate code into basic blocks, which are maximal sequences of consecutive three-address instructions with the properties that (a)The flow of control can only enter the basic block through the first instruction in the block. That is, there are no jumps into the middle of the block. (b) Control will leave the block without halting or branching, except possibly at the last instruction in the block. 2. The basic blocks become the nodes of a flow graph, whose edges indicate which blocks can follow which other blocks. •We begin a new basic block with the first instruction and keep adding instructions until we meet either a jump, a conditional jump, or a label on the following instruction. Basic blocks and flow graphs
  • 4. Algorithm: Partitioning three-address instructions into basic blocks. INPUT: A sequence of three-address instructions. OUTPUT: A list of the basic blocks for that sequence in which each instruction is assigned to exactly one basic block. METHOD: First, we determine those instructions in the intermediate code that are leaders, that is, the first instructions in some basic block. The instruction just past the end of the intermediate program is not included as a leader.
  • 5. Rules for finding leaders 1. The first three-address instruction in the intermediate code is a leader. 2. Any instruction that is the target of a conditional or unconditional jump is a leader. 3. Any instruction that immediately follows a conditional or unconditional jump is a leader. • Then, for each leader, its basic block consists of itself and all instructions up to but not including the next leader or the end of the intermediate program.
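The three leader rules and the partitioning step above can be sketched in a few lines. This is a minimal illustration, not a production pass: instructions are modeled as dicts, and the `"target"` field carrying an instruction index for jumps is an assumed shape, not from the slides.

```python
def partition_into_blocks(instrs):
    """Split a list of three-address instructions into basic blocks."""
    leaders = {0}                                  # rule 1: first instruction
    for i, ins in enumerate(instrs):
        if ins.get("op") in ("goto", "if"):
            leaders.add(ins["target"])             # rule 2: jump target
            if i + 1 < len(instrs):
                leaders.add(i + 1)                 # rule 3: instruction after a jump
    leaders = sorted(leaders)
    # each block runs from a leader up to (not including) the next leader
    return [instrs[a:b] for a, b in zip(leaders, leaders[1:] + [len(instrs)])]

code = [
    {"op": "=",  "dst": "i", "src": "1"},          # 0: leader (rule 1)
    {"op": "+",  "dst": "t", "args": ("i", "1")},  # 1: leader (rule 2, target of 3)
    {"op": "=",  "dst": "i", "src": "t"},          # 2
    {"op": "if", "cond": "i<10", "target": 1},     # 3
    {"op": "=",  "dst": "x", "src": "i"},          # 4: leader (rule 3)
]
blocks = partition_into_blocks(code)
print(len(blocks))   # 3 basic blocks: [0], [1..3], [4]
```

Each instruction lands in exactly one block, as the algorithm's OUTPUT clause requires, because the blocks are consecutive half-open slices of the instruction sequence.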
  • 6. Intermediate code to set a 10*10 matrix to an identity matrix • In generating the intermediate code, we have assumed that the real-valued array elements take 8 bytes each, and that the matrix a is stored in row-major form.
  • 7. Flow Graph •Once an intermediate-code program is partitioned into basic blocks, we represent the flow of control between them by a flow graph. •The nodes of the flow graph are the basic blocks. •There is an edge from block B to block C if and only if it is possible for the first instruction in block C to immediately follow the last instruction in block B. •There are two ways that such an edge could be justified: 1.There is a conditional or unconditional jump from the end of B to the beginning of C. 2. C immediately follows B in the original order of the three- address instructions, and B does not end in an unconditional jump. •We say that B is a predecessor of C, and C is a successor of B.
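The two edge justifications above (a jump from the end of B to the leader of C, or fall-through when B does not end in an unconditional jump) can be sketched directly. The instruction shapes (`"goto"`, `"if"`, `"target"`) are assumptions for illustration only.

```python
def build_flow_graph(blocks, leaders):
    """Edges between basic blocks, per the two justifications above.
    blocks[i] starts at instruction index leaders[i]."""
    leader_of = {addr: idx for idx, addr in enumerate(leaders)}
    edges = set()
    for b, blk in enumerate(blocks):
        last = blk[-1]
        if last.get("op") in ("goto", "if"):
            edges.add((b, leader_of[last["target"]]))   # jump edge
        if last.get("op") != "goto" and b + 1 < len(blocks):
            edges.add((b, b + 1))                       # fall-through edge
    return edges

# blocks B0:[0], B1:[1..3], B2:[4] from the leader-finding example
blocks = [
    [{"op": "="}],
    [{"op": "="}, {"op": "="}, {"op": "if", "target": 1}],
    [{"op": "="}],
]
leaders = [0, 1, 4]
print(sorted(build_flow_graph(blocks, leaders)))   # [(0, 1), (1, 1), (1, 2)]
```

Note the self-edge (1, 1): a conditional jump whose target is its own block's leader makes the block its own successor, exactly as described for B3 on the next slide.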
  • 8. Flow graph based on Basic Blocks
  • 9. • The entry point is basic block B1, since B1 contains the first instruction of the program. • The only successor of B1 is B2, because B1 does not end in an unconditional jump, and the leader of B2 immediately follows the end of B1. • Block B3 has two successors. One is itself, because the leader of B3, instruction 3, is the target of the conditional jump at the end of B3, instruction 9. • The other successor is B4, because control can fall through the conditional jump at the end of B3 and next enter the leader of B4. • Only B6 points to the exit of the flow graph, since the only way to reach the code that follows the program from which we constructed the flow graph is to fall through the conditional jump that ends B6.
  • 10. Representation of Flow Graphs • Flow graphs, being quite ordinary graphs, can be represented by any of the data structures appropriate for graphs. • The content of the nodes (basic blocks) needs its own representation. • We might represent the content of a node by a pointer to the leader in the array of three-address instructions, together with a count of the number of instructions or a second pointer to the last instruction. • Hence a linked list is commonly used to represent each basic block.
  • 11. Next-use information • Knowing when the value of a variable will be used next is essential for generating good code. • If the value of a variable that is currently in a register will never be referenced subsequently, then that register can be assigned to another variable. • Suppose three-address statement i assigns a value to x. • If statement j has x as an operand, and control can flow from statement i to j along a path that has no intervening assignments to x, then we say statement j uses the value of x computed at statement i. • We further say that x is live at statement i.
  • 12. Liveness and next-use information • We wish to determine for each three-address statement i: x=y+z what the next uses of x, y and z are. Algorithm (applied to each statement in turn, scanning the block backwards from its last statement): 1. Attach to statement i the information currently found in the symbol table regarding the next use and liveness of x, y, and z. 2. In the symbol table, set x to "not live" and "no next use." 3. In the symbol table, set y and z to "live" and the next uses of y and z to i.
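The backward scan above is easy to mechanize. A hedged sketch, assuming statements are `(dst, src1, src2)` triples and that variables not mentioned after the block are dead on exit:

```python
def next_use_info(block):
    """block: list of (x, y, z) triples for statements x = y op z.
    Returns, per statement index, the (live, next_use) status of x, y, z
    as attached in step 1 of the algorithm."""
    table = {}   # var -> (live, next_use); absent names are dead, no next use
    info = {}
    for i in range(len(block) - 1, -1, -1):        # scan backwards
        x, y, z = block[i]
        # step 1: record the current table entries for x, y, z at statement i
        info[i] = {v: table.get(v, (False, None)) for v in (x, y, z)}
        table[x] = (False, None)                   # step 2: x not live, no next use
        table[y] = (True, i)                       # step 3: y and z live,
        table[z] = (True, i)                       #         next used at i
    return info

block = [("t", "a", "b"), ("u", "a", "t"), ("v", "u", "t")]
info = next_use_info(block)
print(info[0]["a"])   # (True, 1): a is live at statement 0, next used at 1
```

The order of steps 2 and 3 matters: performing step 2 before step 3 gives the right answer for statements like x = x + y, where x is both defined and used.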
  • 13. Loops •Since virtually every program spends most of its time in executing its loops, it is especially important for a compiler to generate good code for loops. •Many code transformations depend upon the identification of "loops" in a flow graph. •We say that a set of nodes L in a flow graph is a loop if 1.There is a node in L called the loop entry with the property that no other node in L has a predecessor outside L. That is, every path from the entry of the entire flow graph to any node in L goes through the loop entry. 2. Every node in L has a nonempty path, completely within L, to the entry of L. According to the above flow graph there are three loops 1. B3 by itself 2. B6 by itself 3. {B2,B3,B4}
  • 14. Optimization of Basic Blocks • We can often obtain a substantial improvement in the running time of code merely by performing local optimization within each basic block by itself. This is in contrast to global optimization, which looks at how information flows among the basic blocks of a program. • Many important techniques for local optimization begin by transforming a basic block into a DAG (directed acyclic graph).
  • 15. DAG representation of basic blocks We construct a DAG for a basic block as follows: •There is a node in the DAG for each of the initial values of the variables appearing in the basic block. •There is a node N associated with each statement s within the block. The children of N are those nodes corresponding to statements that are the last definitions, prior to s, of the operands used by s. •Node N is labeled by the operator applied at s, and also attached to N is the list of variables for which it is the last definition within the block. •Certain nodes are designated output nodes. These are the nodes whose variables are live on exit from the block.
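The construction rules above can be sketched as follows. Sharing a node when the same operator is applied to the same children is exactly how local common subexpressions surface. The four-tuple statement shape is an assumption for illustration.

```python
def build_dag(block):
    """block: list of (dst, op, arg1, arg2) four-tuples.
    Returns (nodes, current): current maps each variable to the node index
    holding its last definition within the block."""
    nodes = []       # each node: {'op', 'children', 'labels'}
    current = {}     # variable -> node index of its last definition

    def node_for(var):
        # node for the last definition of var, or a leaf for its initial value
        if var not in current:
            nodes.append({"op": None, "children": (), "labels": [var + "0"]})
            current[var] = len(nodes) - 1
        return current[var]

    for dst, op, a, b in block:
        kids = (node_for(a), node_for(b))
        # reuse an existing node with the same operator and children (CSE)
        for i, n in enumerate(nodes):
            if n["op"] == op and n["children"] == kids:
                node = i
                break
        else:
            nodes.append({"op": op, "children": kids, "labels": []})
            node = len(nodes) - 1
        nodes[node]["labels"].append(dst)   # dst's last definition is this node
        current[dst] = node
    return nodes, current

# b+c is computed twice, so a and d share one node
nodes, cur = build_dag([("a", "+", "b", "c"), ("d", "+", "b", "c")])
print(cur["a"] == cur["d"])   # True
```

Output nodes would be those of the variables live on exit; everything else is a candidate for dead-code elimination, as the later slides show.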
  • 16. Code improving transformations • We can eliminate local common subexpressions, that is, instructions that compute a value that has already been computed. • We can eliminate dead code, that is, instructions that compute a value that is never used. • We can reorder statements that do not depend on one another; such reordering may reduce the time a temporary value needs to be preserved in a register. • We can apply algebraic laws to reorder operands of three-address instructions, and sometimes thereby simplify the computation.
  • 17. DAG for basic block Since there are only three non-leaf nodes in the DAG, the basic block can be rewritten with only three statements: a=b+c d=a-d c=d+c If b is not live on exit from the block, then there is no need to compute that variable.
  • 18. DAG for basic block
  • 19. array accesses in a DAG • An assignment from an array, like x = a [i], is represented by creating a node with operator =[] and two children representing the initial value of the array, a0 in this case, and the index i. Variable x becomes a label of this new node. • An assignment to an array, like a [j] = y, is represented by a new node with operator []= and three children representing a0, j and y. There is no variable labeling this node. What is different is that the creation of this node kills all currently constructed nodes whose value depends on a0. A node that has been killed cannot receive any more labels; that is, it cannot become a common subexpression.
  • 20. DAG for a sequence of array assignments
  • 21. Dead Code Elimination We delete from a DAG any root (node with no ancestors) that has no live variables attached. In the previous figure, a and b are live but c and e are not, so we can immediately remove the root labelled e. The node labelled c then becomes a root and can be removed. The roots labelled a and b remain, since they each have live variables attached.
  • 22. Use of Algebraic Identities x+0 = 0+x = x, x-0 = x, x*1 = 1*x = x, x/1 = x. These identities can be used to eliminate computations from a basic block. Local Reduction in Strength: replacing a more expensive operator by a cheaper one, e.g. x^2 → x*x, 2*x → x+x, x/2 → x*0.5.
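The identities and strength reductions listed above act on single three-address statements, so a peephole over `(dst, op, arg1, arg2)` tuples suffices. A minimal sketch (the tuple shape and string-encoded constants are assumptions):

```python
def simplify(stmt):
    """Apply one algebraic identity or strength reduction, if any matches."""
    dst, op, a, b = stmt
    if op == "+" and b == "0": return (dst, "=", a, None)   # x + 0 = x
    if op == "+" and a == "0": return (dst, "=", b, None)   # 0 + x = x
    if op == "-" and b == "0": return (dst, "=", a, None)   # x - 0 = x
    if op == "*" and b == "1": return (dst, "=", a, None)   # x * 1 = x
    if op == "*" and a == "1": return (dst, "=", b, None)   # 1 * x = x
    if op == "/" and b == "1": return (dst, "=", a, None)   # x / 1 = x
    if op == "^" and b == "2": return (dst, "*", a, a)      # x^2  -> x * x
    if op == "*" and b == "2": return (dst, "+", a, a)      # 2*x  -> x + x
    if op == "*" and a == "2": return (dst, "+", b, b)
    if op == "/" and b == "2": return (dst, "*", a, "0.5")  # x/2  -> x * 0.5
    return stmt

print(simplify(("t", "*", "x", "2")))   # ('t', '+', 'x', 'x')
```

The resulting copy statements (dst, "=", a, None) can then be removed by copy propagation, which the later slides cover.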
  • 23. Pointer Assignments & Procedure calls x = *p *q = y Since p and q may point to any variable, x = *p must be treated as a use of every variable, and *q = y as a possible assignment to every variable; such statements kill all nodes in the DAG. Procedure calls are treated similarly, since a called procedure may use or change any variable accessible to the block.
  • 24. Rules for reconstructing the basic block from a DAG • The order of instructions must respect the order of nodes in the DAG. That is, we cannot compute a node's value until we have computed a value for each of its children. • Assignments to an array must follow all previous assignments to, or evaluations from, the same array, according to the order of these instructions in the original basic block. • Evaluations of array elements must follow any previous (according to the original block) assignments to the same array. The only permutation allowed is that two evaluations from the same array may be done in either order, as long as neither crosses over an assignment to that array. • Any use of a variable must follow all previous (according to the original block) procedure calls or indirect assignments through a pointer. • Any procedure call or indirect assignment through a pointer must follow all previous (according to the original block) evaluations of any variable. Reassembling basic blocks from DAGs
  • 25. A Simple Code Generator • Generates target code for a sequence of three-address statements • For each operator in a statement, there is a corresponding target-language operator. Register & Address descriptors: Register descriptor: keeps track of what is currently in each register. Initially all registers are empty. Address descriptor: keeps track of the location where the current value of a name can be found. The location may be a register, a stack location or a memory address.
  • 26. principal uses of registers • In most machine architectures, some or all of the operands of an operation must be in registers in order to perform the operation. • Registers make good temporaries - places to hold the result of a subexpression while a larger expression is being evaluated, or more generally, a place to hold a variable that is used only within a single basic block. • Registers are often used to help with run-time storage management, for example, to manage the run-time stack, including the maintenance of stack pointers and possibly the top elements of the stack itself.
  • 27. A Code Generation Algorithm For each three-address statement of the form x=y op z: 1. Invoke a function getreg to determine the location L where the result of y op z should be stored. 2. Consult the address descriptor for y to determine y', the current location of y. If y is not already in L, generate MOV y', L. 3. Generate the instruction OP z', L and update the address descriptor of x to indicate that x is in L. If L is a register, update its descriptor to indicate that it contains the value of x. 4. If y and z have no next uses and are not live on exit, update the descriptors to remove y and z.
  • 28. Example d=(a-b)+(a-c)+ (a-c) • Three address statements t1=a-b t2=a-c t3=t1+t2 d=t3+t2
  • 29. Example (registers initially empty)
Statement: t1=a-b    Code: MOV a, R0; SUB b, R0    Register descriptor: R0 contains t1                    Address descriptor: t1 in R0
Statement: t2=a-c    Code: MOV a, R1; SUB c, R1    Register descriptor: R0 contains t1, R1 contains t2    Address descriptor: t1 in R0, t2 in R1
Statement: t3=t1+t2  Code: ADD R1, R0              Register descriptor: R0 contains t3, R1 contains t2    Address descriptor: t3 in R0, t2 in R1
Statement: d=t3+t2   Code: ADD R1, R0; MOV R0, d   Register descriptor: R0 contains d                     Address descriptor: d in R0 and memory
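The instruction sequence in the trace above can be reproduced by a small simulator. This is a hedged sketch: getreg is replaced by a deliberately naive policy (reuse an operand's register when that operand has no later use, otherwise take a free register), the two-register limit and `last_use` table are assumptions, and the final store of d to memory is omitted.

```python
def codegen(stmts, last_use):
    """stmts: list of (dst, op, y, z); last_use[var] = index of its last use.
    Returns MOV/op target instructions for a two-register machine."""
    code, reg_of, free = [], {}, ["R0", "R1"]
    for i, (dst, op, y, z) in enumerate(stmts):
        if y in reg_of and last_use.get(y, -1) <= i:
            r = reg_of.pop(y)                       # reuse y's register
        else:
            r = free.pop(0)                         # naive: assumes one is free
            code.append(f"MOV {reg_of.get(y, y)}, {r}")
        code.append(f"{op.upper()} {reg_of.get(z, z)}, {r}")
        if z in reg_of and last_use.get(z, -1) <= i:
            free.append(reg_of.pop(z))              # z's register becomes free
        reg_of[dst] = r
    return code

stmts = [("t1", "sub", "a", "b"), ("t2", "sub", "a", "c"),
         ("t3", "add", "t1", "t2"), ("d", "add", "t3", "t2")]
last_use = {"a": 1, "b": 0, "c": 1, "t1": 2, "t2": 3, "t3": 3}
for ins in codegen(stmts, last_use):
    print(ins)
```

This emits the same six MOV/SUB/ADD instructions as the table; a real generator would also spill to memory when no register is free and store live values on block exit.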
  • 30. Descriptors for data structure • For each available register, a register descriptor keeps track of the variable names whose current value is in that register. Since we shall use only those registers that are available for local use within a basic block, we assume that initially, all register descriptors are empty. As the code generation progresses, each register will hold the value of zero or more names. • For each program variable, an address descriptor keeps track of the location or locations where the current value of that variable can be found. The location might be a register, a memory address, a stack location, or some set of more than one of these. The information can be stored in the symbol-table entry for that variable name.
  • 31. Machine Instructions for Operations • Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry and Rz. • If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y). • Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z. • Issue the instruction ADD Rx, Ry, Rz.
  • 32. Rules for updating the register and address descriptors • For the instruction LD R, x: • Change the register descriptor for register R so it holds only x. • Change the address descriptor for x by adding register R as an additional location. • For the instruction ST x, R, change the address descriptor for x to include its own memory location. • For an operation such as ADD Rx, Ry, Rz implementing a three-address instruction x = y + z: • Change the register descriptor for Rx so that it holds only x. • Change the address descriptor for x so that its only location is Rx. Note that the memory location for x is not now in the address descriptor for x. • Remove Rx from the address descriptor of any variable other than x. • When we process a copy statement x = y, after generating the load of y into register Ry, if needed, and after managing descriptors as for all load statements (per rule 1): • Add x to the register descriptor for Ry. • Change the address descriptor for x so that its only location is Ry.
  • 33. Characteristic of peephole optimizations • Redundant-instruction elimination(loads & Stores) • Eliminating Unreachable code • Flow-of-control optimizations • Algebraic simplifications & Reduction in Strength • Use of machine idioms
  • 34. Redundant-instruction elimination • LD a, R0 ST R0, a — the store is redundant, since a's value is already in R0; it can be deleted provided the ST carries no label. • if debug == 1 goto L1 goto L2 L1: print debugging information L2:
  • 35. Eliminating Unreachable Code • if debug == 1 goto L1 goto L2 L1: print debugging information L2: can be rewritten as: if debug != 1 goto L2 print debugging information L2: If debug is known to never equal 1, the print statement becomes unreachable and can be eliminated.
  • 36. Flow-of-control optimizations Simple intermediate code-generation algorithms frequently produce jumps to jumps, jumps to conditional jumps, and conditional jumps to jumps. goto L1 ... L1: goto L2 can be replaced by: goto L2 ... L1: goto L2 Likewise, if a<b goto L1 ... L1: goto L2 can be replaced by: if a<b goto L2 ... L1: goto L2
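The jump-to-jump collapse above amounts to following each jump chain to its final target. A hedged sketch, with the instruction tuples and label table as assumed encodings:

```python
def collapse_jumps(code, labels):
    """code: list of ('goto', L), ('if', cond, L), or other tuples;
    labels[L] = index of the instruction that label L marks."""
    def final_target(label, seen=()):
        ins = code[labels[label]]
        if ins[0] == "goto" and ins[1] != label and ins[1] not in seen:
            return final_target(ins[1], seen + (label,))   # follow the chain
        return label                                       # stop on non-jump or cycle
    out = []
    for ins in code:
        if ins[0] == "goto":
            out.append(("goto", final_target(ins[1])))
        elif ins[0] == "if":
            out.append(("if", ins[1], final_target(ins[2])))
        else:
            out.append(ins)
    return out

code = [("goto", "L1"), ("op",), ("goto", "L2"), ("op",)]
labels = {"L1": 2, "L2": 3}   # L1 marks the 'goto L2' at index 2
print(collapse_jumps(code, labels)[0])   # ('goto', 'L2')
```

The `seen` tuple guards against infinite loops of jumps; the now-bypassed `L1: goto L2` may itself become unreachable and be removed by the previous transformation.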
  • 37. Algebraic simplifications & Reduction in strength • x=x+0 • x=x*1
  • 38. Use of Machine Idioms • The target machine may have hardware instructions to implement certain specific operations efficiently. • Ex: Some machines have auto-increment and auto-decrement addressing modes. • These modes add or subtract one from an operand before or after using its value. • The use of these modes greatly improves the quality of code when pushing or popping a stack.
  • 39. Machine-Independent Optimization • Elimination of unnecessary instructions in object code, or the replacement of one sequence of instructions by a faster sequence that does the same thing, is called "code improvement" or "code optimization". • Local code optimization: code improvement within a basic block. • Global code optimization: improvements that take information across basic blocks into account. • Most global optimizations are based on data-flow analyses, which are algorithms that gather information about a program. The results of data-flow analyses all have the same form: for each instruction in the program, they specify some property that must hold every time that instruction is executed.
  • 40. Principal sources of code Optimization • A compiler must preserve the semantics of the original program. • A compiler knows only how to apply relatively low-level transformations, using general facts such as algebraic identities like i = i + 0, so that performing such transformations yields the same result.
  • 41. Causes of Redundancy • There are many redundant operations in a program. • Sometimes redundancy is available at the source level. • A programmer may find it more direct and convenient to recalculate some result, leaving it to the compiler to recognize that only one such calculation is necessary. • Redundancy can also be a side effect of having written the program in a high-level language. Ex: In C/C++, where pointer arithmetic is allowed, the elements of an array or the fields of a structure are referenced using a[i][j] or x->s1. As a program is compiled, each of these expands into a number of low-level arithmetic operations, such as the computation of the location of the (i,j)th element of a matrix. Programmers are not aware of these operations and cannot eliminate the redundancies themselves.
  • 43. Flow graph for quick sort
  • 44. Semantics-Preserving Transformations There are a number of ways in which a compiler can improve a program without changing the function it computes. Common-subexpression elimination, copy propagation, dead-code elimination, and constant folding are common examples of such function-preserving (or semantics-preserving) transformations; we shall consider each in turn.
  • 45. Global Common-subexpression elimination For example, block B5 shown in Fig. 9.4(a) recalculates 4 * i and 4* j, although none of these calculations were requested explicitly by the programmer.
  • 46. Global Common-subexpression elimination After local common subexpressions are eliminated, B5 still evaluates 4*i and 4* j, as shown in Fig. 9.4(b). Both are common subexpressions; in particular, the three statements
  • 47. Copy Propagation In order to eliminate the common subexpression from the statement c = d+e in Fig. 9.6(a), we must use a new variable t to hold the value of d + e. The value of variable t, instead of that of the expression d + e, is assigned to c in Fig. 9.6(b). Since control may reach c = d+e either after the assignment to a or after the assignment to b, it would be incorrect to replace c = d+e by either c = a or by c = b. The idea behind the copy-propagation transformation is to use v for u, wherever possible after the copy statement u = v. For example, the assignment x = t3 in block B5 of Fig. 9.5 is a copy. Copy propagation applied to B5 yields the code in Fig. 9.7. This change may not appear to be an improvement, but, as we shall see in Section 9.1.6, it gives us the opportunity to eliminate the assignment to x.
  • 48. Dead-Code Elimination • A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead (or useless) code — statements that compute values that never get used. Suppose debug is set to TRUE or FALSE at various points in the program, and used in statements like if (debug) print ... It may be possible for the compiler to deduce that each time the program reaches this statement, the value of debug is FALSE. Usually, it is because there is one particular statement debug = FALSE that must be the last assignment to debug prior to any tests of the value of debug, no matter what sequence of branches the program actually takes. If copy propagation replaces debug by FALSE, then the print statement is dead because it cannot be reached. We can eliminate both the test and the print operation from the object code.
  • 49. Constant Folding • At compile time deducing the value of an expression is a constant and using the constant instead is known as constant folding. • One advantage of copy propagation is that it often turns the copy statement into dead code. For example, copy propagation followed by dead-code elimination removes the assignment to x and transforms the code in Fig 9.7 into a[t2] = t5 a[t4] = t3 goto B2 • This code is a further improvement of block B5 in Fig. 9.5.
  • 50. Code Motion • Loops are a very important place for optimizations, especially the inner loops where programs tend to spend the bulk of their time. The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop. • An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and evaluates the expression before the loop. Note that the notion "before the loop" assumes the existence of an entry for the loop, that is, one basic block to which all jumps from outside the loop go while (i <= limit-2) /* statement does not change limit */ Code motion will result in the equivalent code t = limit-2 while (i <= t) /* statement does not change limit or t */
  • 51. Induction variables and Reduction in strength • A variable x is said to be an "induction variable" if there is a positive or negative constant c such that each time x is assigned, its value increases by c. • i and t1 are induction variables in the loop containing B2 of Fig. 9.5. Induction variables can be computed with a single increment (addition or subtraction) per loop iteration. The transformation of replacing an expensive operation, such as multiplication, by a cheaper one, such as addition, is known as strength reduction. • Induction variables not only sometimes allow us to perform a strength reduction; often it is possible to eliminate all but one of a group of induction variables whose values remain in lock step as we go around the loop.
  • 53. Register Allocation and Assignment • Register Allocation-what values should reside in registers. • Register Assignment- In which register each value should reside
  • 54. Register Allocation and Assignment • Global Register Allocation • Usage Counts • Register Assignment for Outer Loops • Register Allocation by Graph Coloring
  • 55. Global register allocation • The previously explained algorithm does local (block-based) register allocation • This requires that all live variables be stored at the end of each block • To save some of these stores and their corresponding loads, we might arrange to assign registers to frequently used variables and keep these registers consistent across block boundaries (globally) • Some options are: • Keep the values of variables used in loops inside registers • Use a graph-coloring approach for more global allocation
  • 56. Usage counts • A value of x computed in a block will remain in a register if there are subsequent uses of x in that block. Thus we count a savings of one for each use of x in loop L not preceded by an assignment to x in the same block. • We save two units if we can avoid a store of x at the end of a block. • Thus if x is allocated a register, we count a savings of two for each block in loop L for which x is live on exit and in which x is assigned a value.
  • 57. Usage counts • For loops we can approximate the savings from register allocation as: • Sum over all blocks (B) in a loop (L) • For each use of x before any definition in the block we add one unit of saving • If x is live on exit from B and is assigned a value in B, then we add 2 units of saving Σ use(x,B) + 2 * live(x,B) Use(x,B) is the number of times x is used in B prior to any definition of x. Live(x,B) is 1 if x is live on exit from B and is assigned a value in B. Live(x,B) is 0 otherwise.
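The estimate Σ use(x,B) + 2·live(x,B) is a one-line sum once use and live are known per block. A minimal sketch, with the per-block (use, live) pairs supplied as assumed input data rather than computed from a flow graph:

```python
def savings(blocks):
    """blocks: list of (use_count, live_and_assigned) pairs for one variable,
    one pair per block B of the loop L. Returns the estimated savings from
    keeping that variable in a register across the loop."""
    return sum(use + 2 * live for use, live in blocks)

# the (use, live) pairs for variables a and d over blocks B1..B4,
# matching two rows of the savings table on the slide after next
print(savings([(0, 1), (1, 0), (1, 0), (0, 0)]))   # a: 4
print(savings([(1, 1), (1, 0), (1, 0), (1, 0)]))   # d: 6
```

With a fixed register budget, the variables with the highest savings (d, then b, then a ties) would be the ones allocated registers across the loop.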
  • 58. Flow graph of an inner loop
  • 59. B1 B2 B3 B4 a= (0+2*1) + (1+2*0) + (1+2*0) + (0+2*0) = 4 b= (1+2*0) + (0+2*0) + (0+2*1) + (0+2*1) = 5 c= (1+2*0) + (0+2*0) + (1+2*0) + (1+2*0) = 3 d= (1+2*1) + (1+2*0) + (1+2*0) + (1+2*0) = 6 e= (0+2*1) + (0+2*0) + (0+2*1) + (0+2*0) = 4 f= (1+2*0) + (0+2*1) + (1+2*0) + (0+2*0) = 4
  • 60. Code sequence using global register assignment
  • 61. Register Assignment for Outer Loops If an outer loop L1 contains an inner loop L2, the names allocated registers in L2 need not be allocated registers in L1 - L2. However, if we choose to allocate x a register in L2 but not in L1, we must load x on entrance to L2 and store x on exit from L2.
  • 62. Register allocation by Graph coloring • Two passes are used • Target-machine instructions are selected as though there were an infinite number of symbolic registers • Physical registers are then assigned to the symbolic ones • Create a register-interference graph • Nodes are symbolic registers, and an edge connects two nodes if one is live at a point where the other is defined • For example, in the previous example an edge connects a and d in the graph • Use a graph-coloring algorithm to assign registers. • A graph is said to be colored if each node has been assigned a color in such a way that no two adjacent nodes have the same color. • A color represents a register, and the coloring ensures that no two symbolic registers that can interfere with each other are assigned the same physical register. • The problem of determining whether a graph is k-colorable is NP-complete
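The coloring pass above can be sketched with a simple greedy strategy. This is a hedged illustration, not the Chaitin/Briggs allocator: nodes are colored in increasing-degree order, and instead of spilling a node to memory and retrying (as a real allocator would), the sketch just reports failure.

```python
def color(interference, k):
    """interference: {node: set of neighbors}; k: number of physical registers.
    Returns {node: color} with adjacent nodes colored differently,
    or None if this greedy order would need more than k colors."""
    assignment = {}
    for node in sorted(interference, key=lambda n: len(interference[n])):
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [c for c in range(k) if c not in used]
        if not free:
            return None          # a real allocator would spill here and retry
        assignment[node] = free[0]
    return assignment

# a interferes with d, as in the slide's example; e interferes with nothing
graph = {"a": {"d"}, "d": {"a"}, "e": set()}
print(color(graph, 2))
```

Since k-colorability is NP-complete, production allocators rely on such heuristics rather than exact search; the colors in the result map directly to physical registers.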