Lexical Analysis
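Structure of a Compiler
[Figure: Source Language → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator → Intermediate Code → Code Optimizer → Target Code Generator → Target Language. The analysis phases form the Front End; the Code Optimizer and Target Code Generator form the Back End.]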
Today!
[Figure: the compiler structure diagram again; the topic of this lecture is the Lexical Analyzer.]
The Role of Lexical Analyzer
The lexical analyzer is the first phase of a compiler. The main task of the
lexical analyzer (scanner) is to read a stream of characters as input and
produce a sequence of tokens that the parser (syntax analyzer) uses for
syntax analysis.
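[Figure: the parser asks the lexical analyzer for the next token ("Get Next Token") and receives a token in return; the lexical analyzer reads characters from the input and may push a character back; a symbol table is shown alongside the lexical analyzer and the parser.]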
For example, a lexical analyzer for Pascal must read ahead after it sees the
character >. If the next character is =, then the character sequence >= is
the lexeme forming the token for the “greater than or equal to” operator.
Otherwise > is the lexeme forming the “greater than” operator, and the
lexical analyzer has read one character too many.
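The read-ahead decision described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular Pascal compiler: the function name `scan_relop` and the token names `GE` and `GT` are assumptions introduced here.

```python
def scan_relop(text, pos):
    """Scan '>' or '>=' starting at text[pos].

    Returns (token, lexeme, next_pos), where next_pos is the index at which
    the next lexeme begins. All names here are hypothetical.
    """
    if pos >= len(text) or text[pos] != ">":
        raise ValueError("expected '>' at position %d" % pos)
    lookahead = pos + 1  # read ahead one character
    if lookahead < len(text) and text[lookahead] == "=":
        return ("GE", ">=", lookahead + 1)
    # The character that was read ahead is not part of this lexeme,
    # so the next scan starts on it (the effect of pushing it back).
    return ("GT", ">", pos + 1)


print(scan_relop("a>=b", 1))
print(scan_relop("a>b", 1))
```

In the second call the scanner looks at `b`, finds that it is not `=`, and leaves it for the next lexeme.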
The extra character has to be pushed back onto the input, because it can be
the beginning of the next lexeme in the input. The lexical analyzer and
parser form a producer-consumer pair. The lexical analyzer produces tokens
and the parser consumes them. Produced tokens can be held in a token buffer
until they are consumed.
The interaction between the two is constrained only by the size of the
buffer, because the lexical analyzer cannot proceed when the buffer is full
and the parser cannot proceed when the buffer is empty. Commonly, the buffer
holds just one token. In this case, the interaction can be implemented simply
by making the lexical analyzer a procedure called by the parser, returning
tokens on demand.
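A minimal sketch of the one-token arrangement, in which the parser simply calls the lexical analyzer whenever it needs a token. The class `Lexer`, the method `get_next_token`, the stand-in `parse` function, and the assumption that lexemes are separated by blanks are all hypothetical simplifications made for this example.

```python
class Lexer:
    """Hypothetical lexer: produces one token per call, only when asked."""

    def __init__(self, text):
        self._lexemes = text.split()  # assumption: lexemes are blank-separated
        self._next = 0

    def get_next_token(self):
        if self._next == len(self._lexemes):
            return ("EOF", "")
        lexeme = self._lexemes[self._next]
        self._next += 1
        if lexeme.isdigit():
            return ("NUM", lexeme)
        if lexeme.isidentifier():
            return ("ID", lexeme)
        return ("OP", lexeme)


def parse(lexer):
    """Stand-in for a parser: it only pulls tokens and collects them."""
    consumed = []
    token = lexer.get_next_token()
    while token[0] != "EOF":
        consumed.append(token)
        token = lexer.get_next_token()
    return consumed


print(parse(Lexer("x = y + 42")))
```

No token is produced until the parser asks for it, so no separate token buffer is needed beyond the single token being handed over.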
Reading and pushing back characters is usually implemented by setting up an
input buffer. A block of characters is read into the buffer at a time; a
pointer keeps track of the portion of the input that has been analyzed.
Pushing back a character is implemented by moving the pointer back.
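The buffer-and-pointer scheme above could look roughly like this in Python. It is a sketch under stated assumptions: the class `InputBuffer`, its methods, and the default block size are invented for illustration, and only one character of push-back is supported.

```python
import io


class InputBuffer:
    """Hypothetical block-at-a-time input buffer with one character of push-back."""

    def __init__(self, stream, block_size=4096):
        self._stream = stream
        self._block_size = block_size
        self._block = ""
        self._pos = 0  # pointer: everything before it has been analyzed

    def read_char(self):
        """Return the next character, or '' at end of input."""
        if self._pos == len(self._block):
            # The current block is used up: read the next block of characters.
            self._block = self._stream.read(self._block_size)
            self._pos = 0
            if not self._block:
                return ""
        ch = self._block[self._pos]
        self._pos += 1
        return ch

    def push_back(self):
        """Push back the character just read by moving the pointer back."""
        if self._pos == 0:
            raise RuntimeError("nothing to push back")
        self._pos -= 1


buf = InputBuffer(io.StringIO("a>b"), block_size=2)
first, second, third = buf.read_char(), buf.read_char(), buf.read_char()
buf.push_back()  # 'b' was read one character too many
print(first, second, third, buf.read_char())
```

Because the pointer is only ever moved back by one position after a successful read, the pushed-back character is always still inside the current block.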
Sometimes lexical analyzers are divided into two phases, the first called
scanning and the second called lexical analysis. The scanner is responsible
for doing simple tasks, while the lexical analyzer proper does the more
complex operations. For example, a Fortran compiler might use a scanner to
eliminate blanks from the input.
Issues in Lexical Analysis
There are several reasons for separating the analysis phase of compiling into
lexical analysis and parsing.
1) Simpler design is perhaps the most important consideration.
The separation of lexical analysis from syntax analysis often allows us to
simplify one or the other of these phases. For example, a parser embodying
the conventions for comments and white space is significantly more complex
than one that can assume comments and white space have already been removed
by a lexical analyzer. If we are designing a new language, separating the
lexical and syntactic conventions can lead to a cleaner overall language
design.
2) Compiler efficiency is improved.
A separate lexical analyzer allows us to construct a specialized and
potentially more efficient processor for the task. A large amount of time is
spent reading the source program and partitioning it into tokens. Specialized
buffering techniques for reading input characters and processing tokens can
significantly speed up the performance of a compiler.
3) Compiler portability is enhanced.
Input alphabet peculiarities and other device-specific anomalies can be
restricted to the lexical analyzer. The representation of special or
non-standard symbols, such as ↑ in Pascal, can be isolated in the lexical
analyzer.
Specialized tools have been designed to help automate the construction of
lexical analyzers and parsers when they are separated.
What exactly is lexing?
Consider the code:
if (i==j);
    z=1;
else;
    z=0;
endif;
This is really nothing more than a string of characters:
if (i==j);\n\tz=1;\nelse;\n\tz=0;\nendif;
During our lexical analysis phase we must divide this string into meaningful
substrings.
Tokens
• The output of our lexical analysis phase is a stream of tokens.
• A token is a syntactic category.
• In English this would be a type of word or punctuation, such as a “noun”, “verb”, “adjective” or “end-mark”.
• In a program, this could be an “identifier”, a “floating-point number”, a “math symbol”, a “keyword”, etc.
Identifying Tokens
• A substring that represents an instance of a token is called a lexeme.
• The class of all possible lexemes of a token is described by a pattern.
• For example, the pattern describing an identifier (such as a variable name) is a string of letters, digits, or underscores that begins with a non-digit.
• Patterns are typically described using regular expressions.
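As an illustration of a pattern written as a regular expression, the identifier rule above can be expressed with Python's `re` module. The names `IDENTIFIER` and `is_identifier` are hypothetical, and restricting letters to ASCII is an assumption made for this sketch.

```python
import re

# A letter or underscore, followed by any number of letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(lexeme):
    """Return True if the whole lexeme matches the identifier pattern."""
    return IDENTIFIER.fullmatch(lexeme) is not None


for candidate in ["index", "_tmp1", "2fast", "a-b"]:
    print(candidate, is_identifier(candidate))
```

`fullmatch` is used so that a string such as `a-b`, which merely starts with an identifier, is rejected.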
Implementation
A lexical analyzer must be able to do three things:
1. Remove all whitespace and comments.
2. Identify tokens within a string.
3. Return the lexeme of a found token, as well as the
line number it was found on.
Example
if (i==j);\n\tz=1;\nelse;\n\tz=0;\nendif;

Line  Token          Lexeme
1     BLOCK_COMMAND  if
1     OPEN_PAREN     (
1     ID             i
1     OP_RELATION    ==
1     ID             j
1     CLOSE_PAREN    )
1     ENDLINE        ;
2     ID             z
2     ASSIGN         =
2     NUMBER         1
2     ENDLINE        ;
3     BLOCK_COMMAND  else
Etc.
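The table above can be reproduced with a small regular-expression-driven lexer that does the three jobs listed under Implementation: it skips white space and comments, identifies tokens, and returns each lexeme with its line number. This is a sketch, not the lecture's own implementation. The token names follow the table, while the `#` comment syntax, the set of relational operators, and the names `TOKEN_SPEC`, `MASTER`, `BLOCK_COMMANDS`, and `tokenize` are assumptions.

```python
import re

TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),  # assumption: comments run from '#' to end of line
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("NUMBER", r"\d+"),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP_RELATION", r"==|!=|<=|>=|<|>"),  # listed before ASSIGN so '==' wins over '='
    ("ASSIGN", r"="),
    ("OPEN_PAREN", r"\("),
    ("CLOSE_PAREN", r"\)"),
    ("ENDLINE", r";"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))
BLOCK_COMMANDS = {"if", "else", "endif"}


def tokenize(source):
    """Return a list of (line, token, lexeme) triples for the source string."""
    line, pos, tokens = 1, 0, []
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected character %r on line %d" % (source[pos], line))
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "NEWLINE":
            line += 1  # track the line number reported with each token
        elif kind in ("SKIP", "COMMENT"):
            pass  # white space and comments produce no token
        elif kind == "WORD":
            name = "BLOCK_COMMAND" if lexeme in BLOCK_COMMANDS else "ID"
            tokens.append((line, name, lexeme))
        else:
            tokens.append((line, kind, lexeme))
    return tokens


SOURCE = "if (i==j);\n\tz=1;\nelse;\n\tz=0;\nendif;"
for row in tokenize(SOURCE):
    print("%-5d %-14s %s" % row)
```

Keywords are recognized by first scanning a whole word and then looking it up in `BLOCK_COMMANDS`, which keeps the regular expressions small.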
Lookahead
• Lookahead will typically be important to a lexical analyzer.
• Tokens are typically read from left to right and recognized one at a time from the input string.
• It is not always possible to decide immediately whether a token is finished without looking ahead at the next character. For example:
• Is “i” a variable, or the first character of “if”?
• Is “=” an assignment or the beginning of “==”?
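Both questions above can be answered by taking the longest lexeme that matches, a strategy often called maximal munch. The hand-written helpers below are a sketch of that idea; the function names and the small keyword set are assumptions, with the token names borrowed from the earlier example table.

```python
BLOCK_COMMANDS = {"if", "else", "endif"}  # keywords of the slides' example language


def scan_word(text, pos):
    """Consume the longest run of letters, digits, and underscores from pos."""
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] == "_"):
        end += 1  # keep looking ahead until the word is finished
    lexeme = text[pos:end]
    kind = "BLOCK_COMMAND" if lexeme in BLOCK_COMMANDS else "ID"
    return kind, lexeme, end


def scan_equals(text, pos):
    """Assumes text[pos] == '='; one character of lookahead picks '==' or '='."""
    if text.startswith("==", pos):
        return "OP_RELATION", "==", pos + 2
    return "ASSIGN", "=", pos + 1


print(scan_word("i==j", 0))    # "i" alone is an identifier
print(scan_word("if (i", 0))   # "if" is a keyword
print(scan_word("ifx=1", 0))   # "ifx" is an identifier, not "if" followed by "x"
print(scan_equals("i==j", 1))
print(scan_equals("z=1", 1))
```

The decision between identifier and keyword is made only after the whole word has been consumed, which is why `ifx` is not mistaken for `if`.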
TOKENS
The output of our lexical analysis phase is a stream of tokens. A token is a
syntactic category: “a name for a set of input strings with related
structure”.
Example: “ID”, “NUM”, “RELATION”, “IF”
As an example, consider the following line of code, which could be part of a
C program.
a[index] = 4 + 2
Lexeme  Token
a       ID
[       Left bracket
index   ID
]       Right bracket
=       Assign
4       Num
+       plus sign
2       Num
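A short sketch that splits the line `a[index] = 4 + 2` into lexemes and pairs each with the token names used above. The names `SYMBOLS`, `LEXEME`, `classify`, and `tokens_of` are hypothetical, and only the four symbols that occur in this line are handled.

```python
import re

SYMBOLS = {"[": "Left bracket", "]": "Right bracket", "=": "Assign", "+": "plus sign"}
# An identifier, or a run of digits, or any other single non-blank character.
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def classify(lexeme):
    """Map a lexeme to its token name."""
    if lexeme in SYMBOLS:
        return SYMBOLS[lexeme]
    if lexeme.isdigit():
        return "Num"
    if lexeme[0].isalpha() or lexeme[0] == "_":
        return "ID"
    raise ValueError("no token for lexeme %r" % lexeme)


def tokens_of(line):
    return [(lexeme, classify(lexeme)) for lexeme in LEXEME.findall(line)]


for lexeme, token in tokens_of("a [index] = 4 + 2"):
    print("%-6s %s" % (lexeme, token))
```

Blanks never reach `classify`, because the regular expression only matches non-blank lexemes.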
Attributes for Tokens
<ID, pointer to symbol-table entry for E>
<assign_op, >
<ID, pointer to symbol-table entry for M>
<add_op, >
<ID, pointer to symbol-table entry for C>
<mult_op, >
<num, integer value 2>
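The token and attribute pairs above can be sketched with a list standing in for the symbol table, where an index into the list plays the role of the pointer. The slides do not show the source statement, so `E = M + C * 2` is an assumption inferred from the token sequence; `OPERATORS` and `tokenize_with_attributes` are hypothetical names.

```python
import re

OPERATORS = {"=": "assign_op", "+": "add_op", "*": "mult_op"}


def tokenize_with_attributes(source):
    """Return (tokens, symbol_table); each token is a (name, attribute) pair."""
    symbol_table = []  # list of identifier names; the index stands in for a pointer
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("ID", symbol_table.index(lexeme)))
        elif lexeme.isdigit():
            tokens.append(("num", int(lexeme)))  # attribute is the integer value
        else:
            # Operators carry no attribute; an unknown symbol raises KeyError here.
            tokens.append((OPERATORS[lexeme], None))
    return tokens, symbol_table


tokens, table = tokenize_with_attributes("E = M + C * 2")
for name, attribute in tokens:
    print(name, attribute)
print(table)
```

An identifier that appears twice gets the same symbol-table index both times, which is the point of storing a pointer rather than the name itself.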
THANKS

Data design and analysis of computing tools
