SlideShare a Scribd company logo
Introducing of Bison天官2011-09-151
Flex and Bisonstatement:   NAME '=' expressionexpression: NUMBER '+' NUMBER	   | NUMBER '-' NUMBERFlex:recognizes regular expressions.divides the input stream into pieces(token)terminal symbol:	Symbols produced by the lexer are called terminal symbols or tokensnonterminal symbol:Those that are defined on the left-hand side of rules are called nonterminal symbols or nonterminals.VSBisonfor building programs that handle structure input.
takes these pieces and groups them together logically.Shift/Reduce ParsingShiftAs the parser reads tokens, each time it reads a token that doesn't complete a rule, it pushes the token on an internal stack and switchs to a new state reflecting the token it just read. This action is called a shift.ReduceWhen it has found all the symbols that constitute the right-hand side of a rule, it pops the right-hand side symbols off the stack, pushes the left-hand side symbol onto the stack, and switches to a new state reflecting the new symbol on the stack. This action is called a reduction.
Parsing methodsBison parsers can use either of two parsing methods, known as LALR(1) and GLRLALR(1) (Look Ahead Left to Right with a one-token lookahead), which is less powerful but considerably faster and easier to use than GLR.GLR (Generalized Left to Right).The most common kind of language that computer parsers handle is a context-free grammar(CFG)The standard form to write down a CFG is Baskus-Naur Form (BNF)
LR parserLR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR(k) parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions.Usually k is 1 and the term LR parser is often intended to refer to this case. (LALR(1))
look aheadLALR(1) cannot deal with grammars that need more than one token of lookahead to tell whether it has matched a rule.phrase:	     cart_animal  AND CART	  |  work_animal  AND PLOWcart_animal:     HORSE | GOATwork_animal:   HORSE | OXphrase:	     cart_animalCART	  |  work_animalPLOWcart_animal:     HORSE | GOATwork_animal:   HORSE | OXNot support!ORphrase:	     cart_animal  AND CART	  |  work_animal  AND PLOWcart_animal:     HORSE | GOATwork_animal:   OX
Rightmost DerivationRule 1expr  expr – digitexprexpr – digitexprexpr + digitexpr digitdigit 0|1|2|…|9Example input:   3 + 8 - 2The rightmost non-terminal is replaced in each stepRule 4expr – digit  expr – 2Rule 2expr – 2 expr + digit - 2Rule 4expr + digit - 2  expr + 8-2Rule 3expr + 8-2 digit + 8-2Rule 4digit + 8-23+8 -2
Leftmost DerivationRule 1expr  expr – digitThe leftmost non-terminal is replaced in each stepexpr11Rule 2expr – digit  expr + digit – digit223expr-digitRule 3expr + digit – digit  digit + digit – digit354exprdigit+Rule 44digit + digit – digit3 + digit – digit2Rule 43 + digit – digit 3 + 8 – digit56digit8Rule 43 + 8 – digit 3 + 8 – 263
Leftmost DerivationRule 1expr  expr – digitexpr  expr – digitexpr  expr + digitexpr  digitdigit 0|1|2|…|9Example input:   3 + 8 - 2The leftmost non-terminal is replaced in each stepRule 2expr – digit  expr + digit – digitRule 3expr + digit – digit  digit + digit – digitRule 4digit + digit – digit3 + digit – digitRule 43 + digit – digit 3 + 8 – digitRule 43 + 8 – digit 3 + 8 – 2
Leftmost DerivationRule 1expr  expr – digitThe leftmost non-terminal is replaced in each stepexpr11Rule 2expr – digit  expr + digit – digit622expr-digitRule 3expr + digit – digit  digit + digit – digit335exprdigit+Rule 44digit + digit – digit3 + digit – digit2Rule 43 + digit – digit 3 + 8 – digit54digit8Rule 43 + 8 – digit 3 + 8 – 263
Context-Free GrammarsA context-free grammar G is defined by the 4-tuple:G = (V, ∑, R, S) whereV is a finite set; each element  v ϵ V is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by .∑ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.R is a finite relation from V to (V U ∑)*. The members of R are called the (rewrite) rules or productions of the grammar.S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.The asterisk represents the Kleene star operation.
Context-free languageThe language of grammar G = (V, ∑, R, S) is the set	L(G) = { ωϵ ∑* : S ωω }	A language L is said to be context-free languange(CFL), if there exists a CFG G, such that L = L(G).
Context-Free GrammarsComprised ofA set of tokens or terminal symbolsA set of non-terminal symbolsA set of rules or productions which express the legal relationships between symbolsA start or goal symbolExample:exprexpr – digitexprexpr + digitexpr digitdigit 0|1|2|…|9Tokens: -,+,0,1,2,…,9
Non-terminals: expr, digit
Start symbol: exprA Bison ParserA bison specification has the same three-part structure as a flex specification.... definition section ...%%... rules section ...%%                                           a bison example... user subroutines ...The first section, the definition section, handles control information for the parser and generally sets up the execution environment in which the parser will operate.The second section contains the rules for the parser.The third section is C code copied verbatim into the generated C program.
TermsSymbols are strings of letters, digits, periods, and underscores that do not start with a digit.error is reserved for error recovery.Do not use C reserved words or bison's own symbols such as yyparse.Symbols produced by the lexer are called terminal symbols or tokensThose that are defined on the left-hand side of rules are called nonterminal symbols or nonterminals.
Structure of a Bison Specification... definition section ...	%%	... rules section ...	%%	... user subroutines ...
Literal Block	%{	... C code and declarations ...	%}The contents of the literal block are copied verbatim to the generated C source file near the beginning, before the beginning of yypare().Usually contains declarations of variables and functions, as well as #include.Bison also provides an experimental %code POS { ... } where POS is a keyword to suggest where in the generated parser the code should go.
Delaration%parse-param%require "2.4“ declare the minimum version of bison needed to compile it%startidentifies the top-level rule (Named the first rule.)%union%token%type%left%right%nonassoc%expect
TokenDefine the ternimators.Bison treats a character in single quotes as a tokenBison also allows you to decalre strings as aliases for tokensThis defines the token NE and lets you use NE and != interchangeably in the parser. The lexer must still return the internal token values for NE when the token is read, not a string.expr: '(' expr ')';%token NE "!="%%...expr:	expr "!=" exp;
Parse-paramNormally, you call yyparse() with no arguments, if you need, youcan add parameters to its definition:%parse-param { char *modulename }		%parse-param { int intensity }This allows you to call yyparse("mymodule", 42)
TypeThe %union declaration specifies the entire list of possible types%token is used for declaring token types%type is used for declaring nonterminal symbols%{#include "calc.h“      /* Contains definition of `symrec' */ %} %union { 	double val;              /* For returning numbers. */ symrec *tptr;           /* For returning symbol-table pointers */ } %token <tptr> VAR FNCT   /* Variable and Function */ %type <val> exp %%
Structure of a Bison Specification	... definition section ...	%%... rules section ...	%%	... user subroutines ...
ActionsAn action is C code executed when bison matches a rule in the grammar.The action can refer to the values associated with the symbols in the rule by using a dollar sign followed by a number.The name $$ refers to the value for the left-hand side (LHS) symbol.For rules with no action, bison uses a default of the followingdate: month '/' day '/' year	{ printf("date %d-%d-%d found", $1, $3, $5); } ;{ $$ = $1; }
RulesRecursive RulesThe action can refer to the values associated with the symbols in the rule by using a dollar sign followed by a number.In most cases, Bison handles left recursion much more efficiently than right recursion.numberlist  :	/* empty */                   |	numberlist NUMBER	     ;exprlist: exprlist ',' expr;	/* left recursion */orexprlist: expr ',' exprlist;	/* right recursion */
Special Characters%	All of the declarations in the definition section start with %.$	In actions, a dollar sign introduces a value reference.@	In actions, an @ sign introduces a location reference, such as @2 for the location of the second symbol in the RHS.'	Literal tokens are enclosed in single quotes."	Bison lets you declare quoted string as parser alias for tokens.<>	In a value reference in an action, you can override the value's default type by enclosing the type name in angle brackets.{}	The C code in actions is enclosed in curly braces.;	Each rule in the rules section should end with a semicolon.|	or syntax for multi-rules with same LHS.:	separate left-hand side and right-hand side-	Symbols may include underscores along with letters, digits, and periods..	Symbols may include periods along with letters, digits, and underscores.
ReservedYYABORTIn an action makes the parser routine yyparse() return immediately with a nonzero value, indicating failure.YYACCEPTIn an action makes the parser routine yyparse() return immediately with a value 0, indicating success.YYBACKUPThe macro YYBACKUP lets you unshift the current token and replace it with something else.sym:	TOKEN	{ YYBACKUP(newtok, newval); }It is extremely difficult to use YYBACKUP() correctly, so you're best off not using it.
ReservedyyclearinThe macro yyclearin in an action discards a lookahead token if one has been read. It is most oftern useful in error recovery in an interactive parser to put the paarser into a known state after an error:YYDEBUGTo include the trace code, either use the -t flag on the bison command line or else define the C preprocessor symbol YYDEBUG to be nonzero either on the C compiler command line or by inlcuding something like this in the definition section:stmtlist   :	stmt | stmtlist stmt;stmt	:	error { reset_input(); yyclearin; };%{#define YYDEBUG 1%}
Ambiguity and ConflictsThe grammar is truly ambiguousShift/Reduce ConflictsReduce/Reduce ConflictsThe grammar is unambiguous, but the standard parsing technique that bison uses is not powerful enough to parse the grammar. (need to look more than one token ahead)We have already told about it of LALR(1).
Reduce/Reduce ConflictsA reduce/reduce conflict occurs when the same token could complete two different rules.	%%	prog:	proga | progb;	proga:	'X';	progb:	'X';
Shift/Reduce Conflicts%type <a> exp...%%...expr: expr '+' exp         { $$ = newast('+', $1, $3); }       | expr '-' exp          { $$ = newast('-', $1, $3); }       | expr '*' exp          { $$ = newast('*', $1, $3); }       | expr '/' exp          { $$ = newast('/', $1, $3); }       | '|' exp          { $$ = newast('|', $2, NULL); }       | '(' exp ')'          { $$ = $2); }       | '-' exp          { $$ = newast('M', $2, NULL); }       | NUMBER { $$ = newnum($1); }       ;%%Example   2+3*4
ProblemAt this point, the parser looks at the * and could either reduce 2+3 using;to an expression or shift the *, expecting to be able to reduce:	later on.2		shift NUMBERE		reduce E->NUMBERE +		shift +E + 3		shift NUMBERE + E		reduce E->NUMBERExample   2+3 * 4expr:    expr '+' expexpr:    expr ‘*' exp
AnalysisThe problem is that we haven't told bison about the precedence and associativity of the operators.Precedence controls which operators execute first in an expression.In and expression grammar, operators are grouped into levels of precedence from lowest to highest.The total number of levels depends on the language. The C language is notorious for having too many precedence levels, a total of 15 levels.Associativity controls the grouping of operators at the same precedence level.
Implicitly Solution%type <a> exp exp1 exp2...%%...expr : expr1 '+' exp1 { $$ = newast('+', $1, $3); }        | expr1 '-' exp1 { $$ = newast('-', $1, $3); }        | expr1 { $$ = $1; }expr1: expr2 '*' exp2 { $$ = newast('*', $1, $3); }         | expr2 '/' exp2 { $$ = newast('/', $1, $3); }         | expr2 { $$ = $1; }expr2: '|' exp { $$ = newast('|', $2, NULL); }         | '(' exp ')' { $$ = $2); }         | '-' exp { $$ = newast('M', $2, NULL); }         | NUMBER { $$ = newnum($1); }         ;%%
Explicitly Solution%left '+' '-’%left '*' '/’%nonassoc '|' NMINUS%type <a> exp exp1 exp2...%%...expr: expr '+' exp { $$ = newast('+', $1, $3); }       | expr '-' exp { $$ = newast('-', $1, $3); }       | expr '*' exp { $$ = newast('*', $1, $3); }       | expr '/' exp { $$ = newast('/', $1, $3); }       | '|' exp { $$ = newast('|', $2, NULL); }       | '(' exp ')' { $$ = $2); }       | '-' exp %prec UMINUS { $$ = newast('M', $2, NULL); }       | NUMBER { $$ = newnum($1); }       ;%%
Explicitly Solution%left, %right, and %nonassoc declarations defining the order of precedence from lowest to highest.%left, left associative%right, right associative%nonaccoc, no associativityUMINUS, pseudo token standing fro unary minus%prec UMINUS, %prec tells bison to use the precedence of UMINUS for this rule.
IF/THEN/ELSE conflictWhen Not to Use Precedence RulesIn expression grammars and to resolve the "dangling else" conflict in grammars for if/then/else language constructs, it is easy to understand.But in other situations, it can be extremely difficult to understand.stmt:	IF '(' cond ')' stmt       |	IF '(' cond ')' stmt ELSE stmt       |	TERMINALcond:	TERMINALAmbiguous!!!IF ( cond ) IF ( cond ) stmt ELSE stmtWhich one?IF ( cond ) { IF ( cond ) stmt  } ELSE stmtIF ( cond ) { IF ( cond ) stmt ELSE stmt }
Implicitly Solutionstmt :matched       |	unmatched        ;matched  :other_stmt   |	IF expr THEN matched ELSE matched   ;unmatched  :	IF expr THEN stmt       |	IF expr THEN matched ELSE unmatched			;other_stmt:	/* rules for other kinds of statement */...IF ( cond ) { IF ( cond ) stmt ELSE stmt }
Explicitly Solution%nonassoc	THEN%nonassoc	ELSE%%stmt  :	IF expr THEN stmt         |	IF expr stmt ELSE stmt         ;Equal to:%nonassoc LOWER_THAN_ELSE%nonassoc ELSE%%stmt  :	IF expr stmt %prec LOWER_THAN_ELSE         |	IF expr stmt ELSE stmt        ;IF ( cond ) { IF ( cond ) stmt ELSE stmt }
expectOccasionally you may have a grammar that has a few conflicts, you are confident that bison will resolve them the way you want, and it's too much hassle to rewrite the grammar to get rid of them.%expect N tells bison that your parser should have N shift/reduce conflicts.%expect-rr N to tell it how many reduce/reduce conflicts to expect.
Common Bugs In Bison ProgramsInfinite Recursion%%xlist:	xlist  ‘X’ ;should be ==>%%xlist  :	'X'|	xlist  'X’       ;
Common Bugs In Bison ProgramsInterchanging Precedence%token NUMBER%left PLUS%left MUL%%expr	:	expr PLUS expr %prec MUL	|	expr MUL expr %prec PLUS	|	NUMBER	;
Lexical FeedbackParsers can sometimes feed information back to the lexer to handle otherwise difficult situations. E.g. syntax like this:message ( any characters )/* parser */%{	init parenstring = 0;}%...%%statement: MESSAGE { parenstring = 1; } '(' STRING ')';
Lexical Feedback/* lexer */%{	extern int parenstring;%}%s PSTRING%%"message"	 return MESSAGE;"(" {    if(parenstring)  BEGIN PSTRING;    return '('; }<PSTRING>[^)]* {    yylval.svalue = strdup(yytext);     BEGIN INITIAL;    return STRING;                            }
Structure of a Bison Specification	... definition section ...	%%	... rules section ...	%%... user subroutines ...
User subroutines SectionThis section typically includes routines called from the actions.Nothing special.

More Related Content

What's hot (20)

PPTX
Lexical analysis - Compiler Design
Muhammed Afsal Villan
 
PPTX
Switch case in C++
Barani Govindarajan
 
PPTX
Yacc
BBDITM LUCKNOW
 
PDF
Chapter 3 REGULAR EXPRESSION.pdf
dawod yimer
 
PPT
Context free grammar
Radhakrishnan Chinnusamy
 
PDF
Syntax analysis
Akshaya Arunan
 
PDF
Operator Overloading in C++
Mohammed Sikander
 
PPT
Searching in c language
CHANDAN KUMAR
 
PPTX
Predictive parser
Jothi Lakshmi
 
PPTX
NFA Converted to DFA , Minimization of DFA , Transition Diagram
Abdullah Jan
 
PPTX
Function overloading and overriding
Rajab Ali
 
PPTX
Lecture 6 disjoint set
Abirami A
 
PPT
binary tree
Shankar Bishnoi
 
PDF
5 - Advanced SVE.pdf
JunZhao68
 
PPT
Decision Making and Branching in C
RAJ KUMAR
 
PPTX
Intermediate code generation1
Shashwat Shriparv
 
PPT
Xpath
Manav Prasad
 
PDF
nodeMCU IOT教學03 - NodeMCU導論
吳錫修 (ShyiShiou Wu)
 
PPTX
Parts of speech tagger
sadakpramodh
 
Lexical analysis - Compiler Design
Muhammed Afsal Villan
 
Switch case in C++
Barani Govindarajan
 
Chapter 3 REGULAR EXPRESSION.pdf
dawod yimer
 
Context free grammar
Radhakrishnan Chinnusamy
 
Syntax analysis
Akshaya Arunan
 
Operator Overloading in C++
Mohammed Sikander
 
Searching in c language
CHANDAN KUMAR
 
Predictive parser
Jothi Lakshmi
 
NFA Converted to DFA , Minimization of DFA , Transition Diagram
Abdullah Jan
 
Function overloading and overriding
Rajab Ali
 
Lecture 6 disjoint set
Abirami A
 
binary tree
Shankar Bishnoi
 
5 - Advanced SVE.pdf
JunZhao68
 
Decision Making and Branching in C
RAJ KUMAR
 
Intermediate code generation1
Shashwat Shriparv
 
nodeMCU IOT教學03 - NodeMCU導論
吳錫修 (ShyiShiou Wu)
 
Parts of speech tagger
sadakpramodh
 

Similar to Introduction of bison (20)

PPTX
compiler design syntax analysis top down parsing
babar532588
 
PPT
Module 11
bittudavis
 
PPT
Syntax analysis and Run time Environment
cscprabh
 
PDF
Compiler Design File
Archita Misra
 
PPT
Chapter Three(1)
bolovv
 
PPT
atc 3rd module compiler and automata.ppt
ranjan317165
 
PPT
Les origines de Javascript
Bernard Loire
 
PPT
Javascript
guest03a6e6
 
PPT
The JavaScript Programming Language
Raghavan Mohan
 
PPT
Javascript by Yahoo
birbal
 
PPT
The Java Script Programming Language
zone
 
PPT
Javascript
vikram singh
 
PPT
Lecturer-05 lex anylser (1).pptrjyghsgst
engrsheikhmuhammadha
 
DOCX
8-Practice problems on operator precedence parser-24-05-2023.docx
venkatapranaykumarGa
 
PPT
SYNTAX ANALYSIS, PARSING, BACKTRACKING IN COMPILER DESIGN
nithya g
 
PDF
compiler design course focus of chapter 3 and 4 part
abel185080
 
PPTX
LISP: Input And Output
LISP Content
 
PPTX
LISP: Input And Output
DataminingTools Inc
 
PDF
Assignment5
Sunita Milind Dol
 
compiler design syntax analysis top down parsing
babar532588
 
Module 11
bittudavis
 
Syntax analysis and Run time Environment
cscprabh
 
Compiler Design File
Archita Misra
 
Chapter Three(1)
bolovv
 
atc 3rd module compiler and automata.ppt
ranjan317165
 
Les origines de Javascript
Bernard Loire
 
Javascript
guest03a6e6
 
The JavaScript Programming Language
Raghavan Mohan
 
Javascript by Yahoo
birbal
 
The Java Script Programming Language
zone
 
Javascript
vikram singh
 
Lecturer-05 lex anylser (1).pptrjyghsgst
engrsheikhmuhammadha
 
8-Practice problems on operator precedence parser-24-05-2023.docx
venkatapranaykumarGa
 
SYNTAX ANALYSIS, PARSING, BACKTRACKING IN COMPILER DESIGN
nithya g
 
compiler design course focus of chapter 3 and 4 part
abel185080
 
LISP: Input And Output
LISP Content
 
LISP: Input And Output
DataminingTools Inc
 
Assignment5
Sunita Milind Dol
 
Ad

Recently uploaded (20)

PDF
Women's Health: Essential Tips for Every Stage.pdf
Iftikhar Ahmed
 
PDF
CONCURSO DE POESIA “POETUFAS – PASSOS SUAVES PELO VERSO.pdf
Colégio Santa Teresinha
 
PDF
0725.WHITEPAPER-UNIQUEWAYSOFPROTOTYPINGANDUXNOW.pdf
Thomas GIRARD, MA, CDP
 
PDF
The Different Types of Non-Experimental Research
Thelma Villaflores
 
PDF
LAW OF CONTRACT ( 5 YEAR LLB & UNITARY LLB)- MODULE-3 - LEARN THROUGH PICTURE
APARNA T SHAIL KUMAR
 
PPTX
SPINA BIFIDA: NURSING MANAGEMENT .pptx
PRADEEP ABOTHU
 
PDF
The History of Phone Numbers in Stoke Newington by Billy Thomas
History of Stoke Newington
 
PDF
The-Ever-Evolving-World-of-Science (1).pdf/7TH CLASS CURIOSITY /1ST CHAPTER/B...
Sandeep Swamy
 
PPTX
How to Set Maximum Difference Odoo 18 POS
Celine George
 
PDF
CEREBRAL PALSY: NURSING MANAGEMENT .pdf
PRADEEP ABOTHU
 
PPTX
grade 5 lesson matatag ENGLISH 5_Q1_PPT_WEEK4.pptx
SireQuinn
 
PDF
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
PDF
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
PPTX
Soil and agriculture microbiology .pptx
Keerthana Ramesh
 
PPTX
PATIENT ASSIGNMENTS AND NURSING CARE RESPONSIBILITIES.pptx
PRADEEP ABOTHU
 
PPTX
MENINGITIS: NURSING MANAGEMENT, BACTERIAL MENINGITIS, VIRAL MENINGITIS.pptx
PRADEEP ABOTHU
 
PPTX
A PPT on Alfred Lord Tennyson's Ulysses.
Beena E S
 
PPTX
Unit 2 COMMERCIAL BANKING, Corporate banking.pptx
AnubalaSuresh1
 
PPTX
Universal immunization Programme (UIP).pptx
Vishal Chanalia
 
PPSX
HEALTH ASSESSMENT (Community Health Nursing) - GNM 1st Year
Priyanshu Anand
 
Women's Health: Essential Tips for Every Stage.pdf
Iftikhar Ahmed
 
CONCURSO DE POESIA “POETUFAS – PASSOS SUAVES PELO VERSO.pdf
Colégio Santa Teresinha
 
0725.WHITEPAPER-UNIQUEWAYSOFPROTOTYPINGANDUXNOW.pdf
Thomas GIRARD, MA, CDP
 
The Different Types of Non-Experimental Research
Thelma Villaflores
 
LAW OF CONTRACT ( 5 YEAR LLB & UNITARY LLB)- MODULE-3 - LEARN THROUGH PICTURE
APARNA T SHAIL KUMAR
 
SPINA BIFIDA: NURSING MANAGEMENT .pptx
PRADEEP ABOTHU
 
The History of Phone Numbers in Stoke Newington by Billy Thomas
History of Stoke Newington
 
The-Ever-Evolving-World-of-Science (1).pdf/7TH CLASS CURIOSITY /1ST CHAPTER/B...
Sandeep Swamy
 
How to Set Maximum Difference Odoo 18 POS
Celine George
 
CEREBRAL PALSY: NURSING MANAGEMENT .pdf
PRADEEP ABOTHU
 
grade 5 lesson matatag ENGLISH 5_Q1_PPT_WEEK4.pptx
SireQuinn
 
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
Soil and agriculture microbiology .pptx
Keerthana Ramesh
 
PATIENT ASSIGNMENTS AND NURSING CARE RESPONSIBILITIES.pptx
PRADEEP ABOTHU
 
MENINGITIS: NURSING MANAGEMENT, BACTERIAL MENINGITIS, VIRAL MENINGITIS.pptx
PRADEEP ABOTHU
 
A PPT on Alfred Lord Tennyson's Ulysses.
Beena E S
 
Unit 2 COMMERCIAL BANKING, Corporate banking.pptx
AnubalaSuresh1
 
Universal immunization Programme (UIP).pptx
Vishal Chanalia
 
HEALTH ASSESSMENT (Community Health Nursing) - GNM 1st Year
Priyanshu Anand
 
Ad

Introduction of bison

  • 2. Flex and Bisonstatement: NAME '=' expressionexpression: NUMBER '+' NUMBER | NUMBER '-' NUMBERFlex:recognizes regular expressions.divides the input stream into pieces(token)terminal symbol: Symbols produced by the lexer are called terminal symbols or tokensnonterminal symbol:Those that are defined on the left-hand side of rules are called nonterminal symbols or nonterminals.VSBisonfor building programs that handle structure input.
  • 3. takes these pieces and groups them together logically.Shift/Reduce ParsingShiftAs the parser reads tokens, each time it reads a token that doesn't complete a rule, it pushes the token on an internal stack and switchs to a new state reflecting the token it just read. This action is called a shift.ReduceWhen it has found all the symbols that constitute the right-hand side of a rule, it pops the right-hand side symbols off the stack, pushes the left-hand side symbol onto the stack, and switches to a new state reflecting the new symbol on the stack. This action is called a reduction.
  • 4. Parsing methodsBison parsers can use either of two parsing methods, known as LALR(1) and GLRLALR(1) (Look Ahead Left to Right with a one-token lookahead), which is less powerful but considerably faster and easier to use than GLR.GLR (Generalized Left to Right).The most common kind of language that computer parsers handle is a context-free grammar(CFG)The standard form to write down a CFG is Baskus-Naur Form (BNF)
  • 5. LR parserLR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR(k) parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions.Usually k is 1 and the term LR parser is often intended to refer to this case. (LALR(1))
  • 6. look aheadLALR(1) cannot deal with grammars that need more than one token of lookahead to tell whether it has matched a rule.phrase: cart_animal AND CART | work_animal AND PLOWcart_animal: HORSE | GOATwork_animal: HORSE | OXphrase: cart_animalCART | work_animalPLOWcart_animal: HORSE | GOATwork_animal: HORSE | OXNot support!ORphrase: cart_animal AND CART | work_animal AND PLOWcart_animal: HORSE | GOATwork_animal: OX
  • 7. Rightmost DerivationRule 1expr  expr – digitexprexpr – digitexprexpr + digitexpr digitdigit 0|1|2|…|9Example input: 3 + 8 - 2The rightmost non-terminal is replaced in each stepRule 4expr – digit  expr – 2Rule 2expr – 2 expr + digit - 2Rule 4expr + digit - 2  expr + 8-2Rule 3expr + 8-2 digit + 8-2Rule 4digit + 8-23+8 -2
  • 8. Leftmost DerivationRule 1expr  expr – digitThe leftmost non-terminal is replaced in each stepexpr11Rule 2expr – digit  expr + digit – digit223expr-digitRule 3expr + digit – digit  digit + digit – digit354exprdigit+Rule 44digit + digit – digit3 + digit – digit2Rule 43 + digit – digit 3 + 8 – digit56digit8Rule 43 + 8 – digit 3 + 8 – 263
  • 9. Leftmost DerivationRule 1expr  expr – digitexpr  expr – digitexpr  expr + digitexpr  digitdigit 0|1|2|…|9Example input: 3 + 8 - 2The leftmost non-terminal is replaced in each stepRule 2expr – digit  expr + digit – digitRule 3expr + digit – digit  digit + digit – digitRule 4digit + digit – digit3 + digit – digitRule 43 + digit – digit 3 + 8 – digitRule 43 + 8 – digit 3 + 8 – 2
  • 10. Leftmost DerivationRule 1expr  expr – digitThe leftmost non-terminal is replaced in each stepexpr11Rule 2expr – digit  expr + digit – digit622expr-digitRule 3expr + digit – digit  digit + digit – digit335exprdigit+Rule 44digit + digit – digit3 + digit – digit2Rule 43 + digit – digit 3 + 8 – digit54digit8Rule 43 + 8 – digit 3 + 8 – 263
  • 11. Context-Free GrammarsA context-free grammar G is defined by the 4-tuple:G = (V, ∑, R, S) whereV is a finite set; each element v ϵ V is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by .∑ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.R is a finite relation from V to (V U ∑)*. The members of R are called the (rewrite) rules or productions of the grammar.S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.The asterisk represents the Kleene star operation.
  • 12. Context-free languageThe language of grammar G = (V, ∑, R, S) is the set L(G) = { ωϵ ∑* : S ωω } A language L is said to be context-free languange(CFL), if there exists a CFG G, such that L = L(G).
  • 13. Context-Free GrammarsComprised ofA set of tokens or terminal symbolsA set of non-terminal symbolsA set of rules or productions which express the legal relationships between symbolsA start or goal symbolExample:exprexpr – digitexprexpr + digitexpr digitdigit 0|1|2|…|9Tokens: -,+,0,1,2,…,9
  • 15. Start symbol: exprA Bison ParserA bison specification has the same three-part structure as a flex specification.... definition section ...%%... rules section ...%% a bison example... user subroutines ...The first section, the definition section, handles control information for the parser and generally sets up the execution environment in which the parser will operate.The second section contains the rules for the parser.The third section is C code copied verbatim into the generated C program.
  • 16. TermsSymbols are strings of letters, digits, periods, and underscores that do not start with a digit.error is reserved for error recovery.Do not use C reserved words or bison's own symbols such as yyparse.Symbols produced by the lexer are called terminal symbols or tokensThose that are defined on the left-hand side of rules are called nonterminal symbols or nonterminals.
  • 17. Structure of a Bison Specification... definition section ... %% ... rules section ... %% ... user subroutines ...
  • 18. Literal Block %{ ... C code and declarations ... %}The contents of the literal block are copied verbatim to the generated C source file near the beginning, before the beginning of yypare().Usually contains declarations of variables and functions, as well as #include.Bison also provides an experimental %code POS { ... } where POS is a keyword to suggest where in the generated parser the code should go.
  • 19. Delaration%parse-param%require "2.4“ declare the minimum version of bison needed to compile it%startidentifies the top-level rule (Named the first rule.)%union%token%type%left%right%nonassoc%expect
  • 20. TokenDefine the ternimators.Bison treats a character in single quotes as a tokenBison also allows you to decalre strings as aliases for tokensThis defines the token NE and lets you use NE and != interchangeably in the parser. The lexer must still return the internal token values for NE when the token is read, not a string.expr: '(' expr ')';%token NE "!="%%...expr: expr "!=" exp;
  • 21. Parse-paramNormally, you call yyparse() with no arguments, if you need, youcan add parameters to its definition:%parse-param { char *modulename } %parse-param { int intensity }This allows you to call yyparse("mymodule", 42)
  • 22. TypeThe %union declaration specifies the entire list of possible types%token is used for declaring token types%type is used for declaring nonterminal symbols%{#include "calc.h“ /* Contains definition of `symrec' */ %} %union { double val; /* For returning numbers. */ symrec *tptr; /* For returning symbol-table pointers */ } %token <tptr> VAR FNCT /* Variable and Function */ %type <val> exp %%
  • 23. Structure of a Bison Specification ... definition section ... %%... rules section ... %% ... user subroutines ...
  • 24. ActionsAn action is C code executed when bison matches a rule in the grammar.The action can refer to the values associated with the symbols in the rule by using a dollar sign followed by a number.The name $$ refers to the value for the left-hand side (LHS) symbol.For rules with no action, bison uses a default of the followingdate: month '/' day '/' year { printf("date %d-%d-%d found", $1, $3, $5); } ;{ $$ = $1; }
  • 25. RulesRecursive RulesThe action can refer to the values associated with the symbols in the rule by using a dollar sign followed by a number.In most cases, Bison handles left recursion much more efficiently than right recursion.numberlist : /* empty */ | numberlist NUMBER ;exprlist: exprlist ',' expr; /* left recursion */orexprlist: expr ',' exprlist; /* right recursion */
  • 26. Special Characters% All of the declarations in the definition section start with %.$ In actions, a dollar sign introduces a value reference.@ In actions, an @ sign introduces a location reference, such as @2 for the location of the second symbol in the RHS.' Literal tokens are enclosed in single quotes." Bison lets you declare quoted string as parser alias for tokens.<> In a value reference in an action, you can override the value's default type by enclosing the type name in angle brackets.{} The C code in actions is enclosed in curly braces.; Each rule in the rules section should end with a semicolon.| or syntax for multi-rules with same LHS.: separate left-hand side and right-hand side- Symbols may include underscores along with letters, digits, and periods.. Symbols may include periods along with letters, digits, and underscores.
  • 27. ReservedYYABORTIn an action makes the parser routine yyparse() return immediately with a nonzero value, indicating failure.YYACCEPTIn an action makes the parser routine yyparse() return immediately with a value 0, indicating success.YYBACKUPThe macro YYBACKUP lets you unshift the current token and replace it with something else.sym: TOKEN { YYBACKUP(newtok, newval); }It is extremely difficult to use YYBACKUP() correctly, so you're best off not using it.
  • 28. ReservedyyclearinThe macro yyclearin in an action discards a lookahead token if one has been read. It is most oftern useful in error recovery in an interactive parser to put the paarser into a known state after an error:YYDEBUGTo include the trace code, either use the -t flag on the bison command line or else define the C preprocessor symbol YYDEBUG to be nonzero either on the C compiler command line or by inlcuding something like this in the definition section:stmtlist : stmt | stmtlist stmt;stmt : error { reset_input(); yyclearin; };%{#define YYDEBUG 1%}
  • 29. Ambiguity and ConflictsThe grammar is truly ambiguousShift/Reduce ConflictsReduce/Reduce ConflictsThe grammar is unambiguous, but the standard parsing technique that bison uses is not powerful enough to parse the grammar. (need to look more than one token ahead)We have already told about it of LALR(1).
  • 30. Reduce/Reduce ConflictsA reduce/reduce conflict occurs when the same token could complete two different rules. %% prog: proga | progb; proga: 'X'; progb: 'X';
  • 31. Shift/Reduce Conflicts%type <a> exp...%%...expr: expr '+' exp { $$ = newast('+', $1, $3); } | expr '-' exp { $$ = newast('-', $1, $3); } | expr '*' exp { $$ = newast('*', $1, $3); } | expr '/' exp { $$ = newast('/', $1, $3); } | '|' exp { $$ = newast('|', $2, NULL); } | '(' exp ')' { $$ = $2); } | '-' exp { $$ = newast('M', $2, NULL); } | NUMBER { $$ = newnum($1); } ;%%Example 2+3*4
  • 32. ProblemAt this point, the parser looks at the * and could either reduce 2+3 using;to an expression or shift the *, expecting to be able to reduce: later on.2 shift NUMBERE reduce E->NUMBERE + shift +E + 3 shift NUMBERE + E reduce E->NUMBERExample 2+3 * 4expr: expr '+' expexpr: expr ‘*' exp
  • 33. AnalysisThe problem is that we haven't told bison about the precedence and associativity of the operators.Precedence controls which operators execute first in an expression.In and expression grammar, operators are grouped into levels of precedence from lowest to highest.The total number of levels depends on the language. The C language is notorious for having too many precedence levels, a total of 15 levels.Associativity controls the grouping of operators at the same precedence level.
  • 34. Implicitly Solution%type <a> exp exp1 exp2...%%...expr : expr1 '+' exp1 { $$ = newast('+', $1, $3); } | expr1 '-' exp1 { $$ = newast('-', $1, $3); } | expr1 { $$ = $1; }expr1: expr2 '*' exp2 { $$ = newast('*', $1, $3); } | expr2 '/' exp2 { $$ = newast('/', $1, $3); } | expr2 { $$ = $1; }expr2: '|' exp { $$ = newast('|', $2, NULL); } | '(' exp ')' { $$ = $2); } | '-' exp { $$ = newast('M', $2, NULL); } | NUMBER { $$ = newnum($1); } ;%%
  • 35. Explicitly Solution%left '+' '-’%left '*' '/’%nonassoc '|' NMINUS%type <a> exp exp1 exp2...%%...expr: expr '+' exp { $$ = newast('+', $1, $3); } | expr '-' exp { $$ = newast('-', $1, $3); } | expr '*' exp { $$ = newast('*', $1, $3); } | expr '/' exp { $$ = newast('/', $1, $3); } | '|' exp { $$ = newast('|', $2, NULL); } | '(' exp ')' { $$ = $2); } | '-' exp %prec UMINUS { $$ = newast('M', $2, NULL); } | NUMBER { $$ = newnum($1); } ;%%
  • 36. Explicitly Solution%left, %right, and %nonassoc declarations defining the order of precedence from lowest to highest.%left, left associative%right, right associative%nonaccoc, no associativityUMINUS, pseudo token standing fro unary minus%prec UMINUS, %prec tells bison to use the precedence of UMINUS for this rule.
  • 37. IF/THEN/ELSE conflictWhen Not to Use Precedence RulesIn expression grammars and to resolve the "dangling else" conflict in grammars for if/then/else language constructs, it is easy to understand.But in other situations, it can be extremely difficult to understand.stmt: IF '(' cond ')' stmt | IF '(' cond ')' stmt ELSE stmt | TERMINALcond: TERMINALAmbiguous!!!IF ( cond ) IF ( cond ) stmt ELSE stmtWhich one?IF ( cond ) { IF ( cond ) stmt } ELSE stmtIF ( cond ) { IF ( cond ) stmt ELSE stmt }
  • 38. Implicitly Solutionstmt :matched | unmatched ;matched :other_stmt | IF expr THEN matched ELSE matched ;unmatched : IF expr THEN stmt | IF expr THEN matched ELSE unmatched ;other_stmt: /* rules for other kinds of statement */...IF ( cond ) { IF ( cond ) stmt ELSE stmt }
  • 39. Explicitly Solution%nonassoc THEN%nonassoc ELSE%%stmt : IF expr THEN stmt | IF expr stmt ELSE stmt ;Equal to:%nonassoc LOWER_THAN_ELSE%nonassoc ELSE%%stmt : IF expr stmt %prec LOWER_THAN_ELSE | IF expr stmt ELSE stmt ;IF ( cond ) { IF ( cond ) stmt ELSE stmt }
  • 40. expectOccasionally you may have a grammar that has a few conflicts, you are confident that bison will resolve them the way you want, and it's too much hassle to rewrite the grammar to get rid of them.%expect N tells bison that your parser should have N shift/reduce conflicts.%expect-rr N to tell it how many reduce/reduce conflicts to expect.
  • 41. Common Bugs In Bison ProgramsInfinite Recursion%%xlist: xlist ‘X’ ;should be ==>%%xlist : 'X'| xlist 'X’ ;
  • 42. Common Bugs In Bison ProgramsInterchanging Precedence%token NUMBER%left PLUS%left MUL%%expr : expr PLUS expr %prec MUL | expr MUL expr %prec PLUS | NUMBER ;
  • 43. Lexical FeedbackParsers can sometimes feed information back to the lexer to handle otherwise difficult situations. E.g. syntax like this:message ( any characters )/* parser */%{ init parenstring = 0;}%...%%statement: MESSAGE { parenstring = 1; } '(' STRING ')';
  • 44. Lexical Feedback/* lexer */%{ extern int parenstring;%}%s PSTRING%%"message" return MESSAGE;"(" { if(parenstring) BEGIN PSTRING; return '('; }<PSTRING>[^)]* { yylval.svalue = strdup(yytext); BEGIN INITIAL; return STRING; }
  • 45. Structure of a Bison Specification ... definition section ... %% ... rules section ... %%... user subroutines ...
  • 46. User subroutines SectionThis section typically includes routines called from the actions.Nothing special.