LALR parsing, for engineering students

LALR stands for look-ahead LR, where LR means that the input is
read from left to right and a rightmost derivation is built in
reverse. It is a technique for deciding when reductions have to be
made in shift/reduce parsing.
Often, it can make the decisions without using a look ahead.
Sometimes, a look ahead of 1 is required.
Most parser generators (and in particular Bison and Yacc)
construct LALR parsers.
In LALR parsing, a deterministic finite automaton is used for
determining when reductions have to be made. This deterministic
finite automaton is usually called the prefix automaton. On the
following slides, I will explain how it is constructed.

Items
Let G = (Σ, R, S) be a grammar.
Definition: Let σ ∈ Σ and w1, w2 ∈ Σ∗. If σ → w1w2 ∈ R, then
σ → w1.w2 is called an item.
An item is a rule with a dot added somewhere in the right hand
side.
The intuitive meaning of an item σ → w1.w2 is that w1 has been
read, and if w2 is also found, then rule σ → w1w2 can be reduced.

Items
Let a → bBc be a rule. The following items can be constructed
from this rule:
a → .bBc, a → b.Bc, a → bB.c, a → bBc.
For a given grammar G, the set of possible items is finite.
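As an illustration, the items of a single rule can be enumerated by placing the dot at every position of the right hand side. The sketch below assumes that every grammar symbol is a single character; the names Rule and items_of are hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption of this sketch: every grammar symbol is one character.
struct Rule {
    char lhs;
    std::string rhs;
};

// Returns all items of the rule as strings, using '.' for the dot.
std::vector<std::string> items_of(const Rule& r) {
    std::vector<std::string> result;
    for (std::size_t dot = 0; dot <= r.rhs.size(); ++dot) {
        result.push_back(std::string(1, r.lhs) + " -> " +
                         r.rhs.substr(0, dot) + "." + r.rhs.substr(dot));
    }
    // A rule with n symbols on its right hand side has n + 1 items.
    assert(result.size() == r.rhs.size() + 1);
    return result;
}
```

Calling items_of on the rule a → bBc yields the four items listed above, in the same order.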

Operations on Itemsets (1)
Definition: An itemset is a set of items.
Because for a given grammar, there exists only a finite set of
possible items, the set of itemsets is also finite.
Let I be an itemset. The closure CLOS(I) of I is defined as the
smallest itemset J, s.t.
• I ⊆ J,
• If σ → w1.Aw2 ∈ J, and there exists a rule A → v ∈ R, then
A → .v ∈ J.
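One possible way to compute CLOS is a worklist loop that keeps adding A → .v for every symbol A that appears directly after a dot. This is only a sketch under assumptions: symbols are single characters, an item is stored as a pair (rule index, dot position), and the names Rule, Item, ItemSet and clos are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Rule { char lhs; std::string rhs; };
using Item = std::pair<std::size_t, std::size_t>;   // (rule index, dot position)
using ItemSet = std::set<Item>;

// Smallest superset of 'items' that contains A -> .v whenever it
// contains an item with A directly after the dot.
ItemSet clos(const std::vector<Rule>& rules, ItemSet items) {
    std::vector<Item> todo(items.begin(), items.end());
    while (!todo.empty()) {
        Item it = todo.back();
        todo.pop_back();
        assert(it.first < rules.size());
        const std::string& rhs = rules[it.first].rhs;
        if (it.second == rhs.size()) continue;       // dot at the end
        char next = rhs[it.second];                  // symbol after the dot
        for (std::size_t r = 0; r < rules.size(); ++r) {
            // insert(...).second is true only for items not seen before
            if (rules[r].lhs == next && items.insert(Item{r, 0}).second)
                todo.push_back(Item{r, 0});
        }
    }
    return items;
}
```

For example, with the rules Z → S#, S → Aa, A → B, A → Bb, B → C, B → Cc, C → d (Z playing the role of the new start symbol), the closure of {Z → .S#} contains, for every rule, the item with the dot at the start.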

Operations on Itemsets (2)
Let I be an itemset, let α ∈ Σ be a symbol. The set TRANS(I, α)
is defined as
{σ → w1α.w2 | σ → w1.αw2 ∈ I }.
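TRANS simply moves the dot over α in every item that has α directly after the dot. A minimal sketch, again assuming single-character symbols and items stored as (rule index, dot position); the names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Rule { char lhs; std::string rhs; };
using Item = std::pair<std::size_t, std::size_t>;   // (rule index, dot position)
using ItemSet = std::set<Item>;

// All items of 'items' with 'sym' after the dot, with the dot moved
// one position to the right.
ItemSet trans(const std::vector<Rule>& rules, const ItemSet& items, char sym) {
    ItemSet result;
    for (const Item& it : items) {
        assert(it.first < rules.size());
        const std::string& rhs = rules[it.first].rhs;
        if (it.second < rhs.size() && rhs[it.second] == sym)
            result.insert(Item{it.first, it.second + 1});
    }
    return result;
}
```

For I = {A → .B, A → .Bb, B → .C}, TRANS(I, B) is {A → B., A → B.b}, and TRANS(I, b) is empty.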

The Prefix Automaton
Let G = (Σ, R, S) be a grammar. The prefix automaton of G is the
deterministic finite automaton A = (Σ, Q, Qs, Qa, δ), that is the
result of the following algorithm:
• Start with A = (Σ, {CLOS(I)}, {CLOS(I)}, ∅, ∅), where
I = {Ŝ → .S #}, Ŝ ∉ Σ is a new start symbol, S is the
original start symbol of G, and # ∉ Σ is the EOF symbol.
• As long as there exists an I ∈ Q, and a σ ∈ Σ, s.t.
I′ = CLOS(TRANS(I, σ)) ∉ Q, put
Q := Q ∪ {I′}, δ := δ ∪ {(I, σ, I′)}.
• As long as there exist I, I′ ∈ Q, and a σ ∈ Σ, s.t.
I′ = CLOS(TRANS(I, σ)), and (I, σ, I′) ∉ δ, put
δ := δ ∪ {(I, σ, I′)}.
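The construction above can be sketched as a single loop over a growing list of states. The sketch makes several assumptions that are not in the slides: symbols are single characters, rules[0] is the added start rule (written Z → S# here), the EOF symbol # is included in the list of symbols, and empty itemsets are not added as states. All names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Rule { char lhs; std::string rhs; };
using Item = std::pair<std::size_t, std::size_t>;   // (rule index, dot position)
using ItemSet = std::set<Item>;

struct Automaton {
    std::vector<ItemSet> states;                               // states[0] is initial
    std::map<std::pair<std::size_t, char>, std::size_t> delta; // (state, symbol) -> state
};

ItemSet clos(const std::vector<Rule>& rules, ItemSet items) {
    std::vector<Item> todo(items.begin(), items.end());
    while (!todo.empty()) {
        Item it = todo.back();
        todo.pop_back();
        const std::string& rhs = rules[it.first].rhs;
        if (it.second == rhs.size()) continue;
        for (std::size_t r = 0; r < rules.size(); ++r)
            if (rules[r].lhs == rhs[it.second] && items.insert(Item{r, 0}).second)
                todo.push_back(Item{r, 0});
    }
    return items;
}

ItemSet trans(const std::vector<Rule>& rules, const ItemSet& items, char sym) {
    ItemSet result;
    for (const Item& it : items) {
        const std::string& rhs = rules[it.first].rhs;
        if (it.second < rhs.size() && rhs[it.second] == sym)
            result.insert(Item{it.first, it.second + 1});
    }
    return result;
}

Automaton build(const std::vector<Rule>& rules, const std::string& symbols) {
    assert(!rules.empty());                          // rules[0] is the start rule
    Automaton a;
    a.states.push_back(clos(rules, ItemSet{Item{0, 0}}));
    // a.states grows while we iterate; every new state is visited later.
    for (std::size_t i = 0; i < a.states.size(); ++i) {
        for (char sym : symbols) {
            ItemSet next = clos(rules, trans(rules, a.states[i], sym));
            if (next.empty()) continue;              // choice of this sketch
            std::size_t j = 0;                       // look for an equal state
            while (j < a.states.size() && a.states[j] != next) ++j;
            if (j == a.states.size()) a.states.push_back(next);
            a.delta[std::make_pair(i, sym)] = j;
        }
    }
    return a;
}
```

The linear search for an existing state keeps the sketch short; a map from itemsets to state numbers would be the obvious replacement for larger grammars.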

The Prefix Automaton (2)
The prefix automaton can be big, but it can be easily computed.
Every context-free grammar has a prefix automaton, but not every
grammar can be parsed by an LALR parser, because the look ahead
sets may fail to resolve the conflicts.

Parse Algorithm (1)
std::vector< state > states;
    // Stack of states of the prefix automaton.
std::vector< token > tokens;
    // We assume that a token has attributes, so
    // we don’t encode them separately.
std::deque< token > lookahead;
    // Will never be longer than one.
states. push_back( q0 ); // The initial state.
while( true )
{

Parse Algorithm (2)
    decision = unknown;
    state topstate = states. back( );
    if( topstate has only one reduction R and no shifts )
        decision = reduce(R);
    // We know for sure that we need lookahead:
    if( decision == unknown && lookahead. size( ) == 0 )
    {
        lookahead. push_back( inputstream. readtoken( ));
    }

Parse Algorithm (3)
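    // Only inspect the lookahead when one was needed and read.
    if( decision == unknown && lookahead. front( ) == EOF )
    {
        if( topstate is an accepting state )
            return tokens. back( );
        if( no reduction is possible with EOF )
            return error, unexpected end of input.
    }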

Parse Algorithm (4)
    if( decision == unknown &&
        topstate has only one reduction R with
            lookahead. front( ) &&
        no shift is possible with lookahead. front( ))
    {
        decision = reduce(R);
    }
    if( decision == unknown &&
        topstate has only a shift Q with
            lookahead. front( ) &&
        no reduction is possible with lookahead. front( ))
    {
        decision = shift(Q);
    }

Parse Algorithm (5)
    if( decision == unknown )
    {
        // Either we have a conflict, or the parser is
        // stuck.
        if( no reduction/no shift is possible )
            print error message, try to recover.

Parse Algorithm (6)
        // A conflict can be shift/reduce, or
        // reduce/reduce:
        Let R, from the set of possible reductions
        (taking into account lookahead. front( )),
        be the rule with the smallest number.
        decision = reduce(R);
    }

Parse Algorithm (7)
    if( decision == shift(Q) )
    {
        states. push_back( Q );
        tokens. push_back( lookahead. front( ));
        lookahead. pop_front( );
    }
    else
    {
        // decision has form reduce(R)
        unsigned int n =
            the length of the rhs of R.

Parse Algorithm (8)
        token lhs =
            compute_lhs( R,
                tokens. begin( ) + tokens. size( ) - n,
                tokens. begin( ) + tokens. size( ));
            // this also computes the attribute.
        for( unsigned int i = 0; i < n; ++ i )
        {
            states. pop_back( );
            tokens. pop_back( );
        }

Parse Algorithm (9)
        // The shift of the lhs after a reduction is
        // also called ’goto’
        topstate = states. back( );
        state newstate =
            the state reachable from topstate under lhs.
        states. push_back( newstate );
        tokens. push_back( lhs );
    }
}
// Unreachable.

Lookahead Sets
We have already seen lookahead sets in action.
If a state has more than one reduction, or a reduction and a shift,
the parser looks at the lookahead symbol in order to decide what
to do next.
LA(I, σ → w) ⊆ Σ is a set of tokens. If the parser is in
state I, and the lookahead ∈ LA(I, σ → w), then the parser can
reduce σ → w.
When should a token s be in LA(I, σ → w)?

Lookahead Sets (2)
Definition:
s ∈ LA(I, σ → w) if
1. σ → w. ∈ I (obvious)
2. There exists a correct input word w1 · s · w2 · #, such that
3. The parser reaches a state with state stack (..., I) and token
stack (..., w), the lookahead (of the parser) is s, and
4. the parser can reduce the rule σ → w, after which
5. it can read the rest of the input w2 and reach an accepting
state.

Computing Look Ahead Sets
For every rule A → w of the grammar G, such that there exist
states I1, I2, I3, s.t. A → .w ∈ I1, A → w. ∈ I2, there exists a path
from I1 to I2 in the prefix automaton using w, and there is a
transition from I1 to I3 based on A, the following must hold:
• For every symbol σ ∈ Σ, for which a transition from I3 to some
other state is possible in the prefix automaton,
σ ∈ LA(I2, A → w.).
• For every item of form B → v. ∈ I3,
LA(I3, B → v.) ⊆ LA(I2, A → w.)
Compute the LA as the smallest such sets.

Computing Look Ahead Sets (2)
Example
S → Aa,
A → B,
A → Bb,
B → C,
B → Cc,
C → d.
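For this example grammar, the smallest sets satisfying the two constraints can be found by repeating the constraints until nothing changes. The code below is only a sketch of that idea, not a definitive implementation. It assumes single-character symbols, writes the added start rule as Z → S#, stores items as (rule index, dot position), and uses hypothetical names throughout.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Rule { char lhs; std::string rhs; };
using Item = std::pair<std::size_t, std::size_t>;    // (rule index, dot position)
using ItemSet = std::set<Item>;
using Key = std::pair<std::size_t, std::size_t>;     // (state index, rule index)

struct Automaton {
    std::vector<ItemSet> states;                               // states[0] is initial
    std::map<std::pair<std::size_t, char>, std::size_t> delta; // (state, symbol) -> state
};

ItemSet clos(const std::vector<Rule>& rules, ItemSet items) {
    std::vector<Item> todo(items.begin(), items.end());
    while (!todo.empty()) {
        Item it = todo.back();
        todo.pop_back();
        const std::string& rhs = rules[it.first].rhs;
        if (it.second == rhs.size()) continue;
        for (std::size_t r = 0; r < rules.size(); ++r)
            if (rules[r].lhs == rhs[it.second] && items.insert(Item{r, 0}).second)
                todo.push_back(Item{r, 0});
    }
    return items;
}

Automaton build(const std::vector<Rule>& rules, const std::string& symbols) {
    assert(!rules.empty());                                    // rules[0] is the start rule
    Automaton a;
    a.states.push_back(clos(rules, ItemSet{Item{0, 0}}));
    for (std::size_t i = 0; i < a.states.size(); ++i)
        for (char sym : symbols) {
            ItemSet moved;                                     // TRANS(states[i], sym)
            for (const Item& it : a.states[i])
                if (it.second < rules[it.first].rhs.size() &&
                    rules[it.first].rhs[it.second] == sym)
                    moved.insert(Item{it.first, it.second + 1});
            if (moved.empty()) continue;
            ItemSet next = clos(rules, moved);
            std::size_t j = 0;
            while (j < a.states.size() && a.states[j] != next) ++j;
            if (j == a.states.size()) a.states.push_back(next);
            a.delta[std::make_pair(i, sym)] = j;
        }
    return a;
}

// Repeats the two constraints until no look ahead set grows any more.
std::map<Key, std::set<char>> lookaheads(const std::vector<Rule>& rules,
                                         const Automaton& a) {
    std::map<Key, std::set<char>> la;
    for (bool changed = true; changed; ) {
        changed = false;
        for (std::size_t i1 = 0; i1 < a.states.size(); ++i1)
            for (const Item& it : a.states[i1]) {
                if (it.second != 0) continue;                  // need A -> .w in I1
                const Rule& rule = rules[it.first];
                auto go = a.delta.find(std::make_pair(i1, rule.lhs));
                if (go == a.delta.end()) continue;             // no transition on A
                std::size_t i2 = i1;                           // follow w from I1
                bool ok = true;
                for (char c : rule.rhs) {
                    auto step = a.delta.find(std::make_pair(i2, c));
                    if (step == a.delta.end()) { ok = false; break; }
                    i2 = step->second;
                }
                if (!ok) continue;
                std::size_t i3 = go->second;
                std::set<char> add;
                for (const auto& edge : a.delta)               // symbols leaving I3
                    if (edge.first.first == i3) add.insert(edge.first.second);
                for (const Item& c : a.states[i3])             // items B -> v. in I3
                    if (c.second == rules[c.first].rhs.size()) {
                        const std::set<char>& src = la[Key{i3, c.first}];
                        add.insert(src.begin(), src.end());
                    }
                std::set<char>& target = la[Key{i2, it.first}];
                std::size_t before = target.size();
                target.insert(add.begin(), add.end());
                if (target.size() != before) changed = true;
            }
    }
    return la;
}
```

Run on the example grammar with the symbols given in the order "SABCabcd#", this sketch yields {a, b, c} for C → d., {a, b} for B → C. and B → Cc., {a} for A → B. and A → Bb., and {#} for S → Aa.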

The algorithm on the previous slides can sometimes compute look
ahead sets that are too big. You will see this in the exercises.

Computing the Correct Sets
I don’t want to say much about this, because it is complicated.
Definition: An LR(1)-item has form σ → w1.w2/s, where
σ → w1w2 is a rule of the grammar, and s ∈ Σ ∪ {#} is a
lookahead symbol.
TRANS remains the same.
CLOS has to be modified.
