BY
SANA MATEEN
INTRODUCTION TO REGULAR EXPRESSIONS
• A regular expression is a way of defining patterns.
• It is a notation for describing the set of strings that the expression matches.
• Among the first applications of regular expressions in computer systems were the text editors ed and sed on UNIX.
• Perl provides very powerful and dynamic string manipulation based on regular expressions.
• Pattern match: searching for a specified pattern within a string.
• For example:
    • finding a sequence motif,
    • finding the accession number of a sequence,
    • parsing HTML,
    • validating user input.
• Regular expression (regex): the way to describe a pattern match.
HOW REGEX WORKS
[Diagram: regex code → Perl compiler → regex engine; input data (e.g. sequence file) → regex engine → output]
Simple Patterns
• Place the regex between a pair of forward slashes ( / / ).
• Try:
    #!/usr/bin/perl
    while (<STDIN>) {
        # $_ holds the line just read, including its trailing newline
        if (/abc/) {
            print ">> found 'abc' in $_";
        }
    }
• Save and run the program. Type something on the terminal, then press Return. Press Ctrl+C to exit the script.
• If you type anything containing 'abc', the message is printed.
STAGES
1. The characters
    \ | ( ) [ { ^ $ * + ? .
are metacharacters with special meanings in a regular expression. To use a metacharacter without its special meaning, escape it with a backslash. ] and } are also metacharacters in some circumstances.
2. Apart from the metacharacters, any single character in a regular expression matches itself, so /cat/ matches the string cat.
3. The metacharacters ^ and $ act as anchors:
    ^ -- matches the start of the line
    $ -- matches the end of the line
So the regex /^cat/ matches the string cat only if it appears at the start of the line, and /cat$/ matches only at the end of the line. /^cat$/ matches a line that contains only the string cat, and /^$/ matches an empty line.
4. The metacharacter dot (.) matches any single character except newline, so /c.t/ matches cat, cot, cut, etc.
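The anchor, dot and escaping rules above can be tried out with a short script. This is only a minimal sketch: the sample strings and the helper name check are made up for illustration and do not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# check() is a hypothetical helper: it reports whether a string
# matches a precompiled pattern.
sub check {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? "match" : "no match";
}

my @tests = (
    [ 'cat',    qr/^cat$/ ],   # the whole line is exactly "cat"
    [ 'concat', qr/^cat/  ],   # "cat" is not at the start
    [ 'concat', qr/cat$/  ],   # "cat" is at the end
    [ 'cut',    qr/c.t/   ],   # dot matches any single character
    [ 'c.t',    qr/c\.t/  ],   # an escaped dot matches a literal "."
    [ 'cat',    qr/c\.t/  ],   # ... and nothing else
    [ '',       qr/^$/    ],   # empty line
);

for my $t (@tests) {
    my ($string, $pattern) = @$t;
    printf "%-10s %-14s %s\n", "'$string'", $pattern, check($string, $pattern);
}
```

Precompiling the patterns with qr// keeps each test case on one line; an ordinary /.../ match would behave the same way.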
STAGES
5. A character class is a set of characters enclosed in square brackets. It matches any single character from those listed. So
    /[aeiou]/ -- matches any vowel
    /[0123456789]/ -- matches any digit, and can be written more compactly as /[0-9]/
6. A character class of the form /[^....]/ matches any character except those listed, so /[^0-9]/ matches any non-digit.
7. Inside a character class a minus sign denotes a range. To remove this special meaning, escape it with a backslash. For example, a regular expression to match the arithmetic operators (the slash is escaped as well because it is the pattern delimiter):
    /[+\-*\/]/
8. Repetition of characters in a regular expression can be specified by the quantifiers
    * -- zero or more occurrences
    + -- one or more occurrences
    ? -- zero or one occurrence
9. Thus /[0-9]+/ matches an unsigned decimal number and /a.*b/ matches a substring starting with 'a' and ending with 'b', with any number of other characters in between.
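Character classes and quantifiers are easiest to see on a concrete string. The following is a small sketch; the sample text and variable names are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'order 66: 3 apples + 12 pears - 4 plums';

# [0-9]+ : every run of one or more digits
my @numbers = $text =~ /[0-9]+/g;

# [+\-*\/] : one arithmetic operator; the minus and the slash are escaped
my @operators = $text =~ /[+\-*\/]/g;

# [^0-9 ] : any character that is neither a digit nor a space
my @others = $text =~ /[^0-9 ]/g;

# colou?r : the "u" is optional (zero or one occurrence)
my @spellings = grep { /^colou?r$/ } qw(color colour colouur);

print "numbers:   @numbers\n";
print "operators: @operators\n";
print "spellings: @spellings\n";
printf "%d characters are neither digits nor spaces\n", scalar @others;
```

In list context a match with the g modifier returns every matched substring, which is why each result can be assigned directly to an array.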
FACILITIES
1. Alternation: |
If RE1, RE2 and RE3 are regular expressions, RE1|RE2|RE3 will match any one of the components.
2. Grouping: ( )
Round brackets can be used to group items.
    /pitt the (elder|younger)/
3. Repetition counts
Explicit repetition counts can be added to a component of a regular expression.
    /(wet[ ]){2}wet/ matches 'wet wet wet'
The full list of possible count modifiers is
    {n} -- must occur exactly n times
    {n,} -- must occur at least n times
    {n,m} -- must occur at least n times but no more than m times
4. A worked example
• A simple regex to check for the shape of an IP address (the dot is escaped so that it matches only a literal dot; the numeric range of each part is not checked):
    ^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$
FACILITIES
5. Non-greedy matching
A pattern including .* matches the longest string it can find. The pattern .*? can be used when the shortest match is required.
    ? -- appended to a quantifier, requests the shortest match
6. Shorthand
This notation is provided for frequently occurring character classes.
    \d -- matches a digit
    \w -- matches a word character (letter, digit or underscore)
    \s -- matches a whitespace character
    \D -- matches any non-digit character
Capitalizing the letter reverses the sense, so \W matches a non-word character and \S a non-whitespace character.
7. Anchors
    \b -- word boundary
    \B -- not a word boundary
/\bJohn/ matches both the target strings John and Johnathan.
8. Back references
Round brackets define a series of partial matches that are remembered for use in subsequent processing or in the regex itself.
9. The match operator
The match operator, m//, is used to match a string or statement against a regular expression. For example, to match the character sequence "foo" against the scalar $bar, you might use a statement like this:
    if ($bar =~ /foo/)
Note that the entire match expression, that is, the expression on the left of =~ or !~ together with the match operator, returns true (in a scalar context) if the pattern matches for =~, or if it does not match for !~. Therefore the statement
    $true = ($foo =~ m/foo/);
sets $true to 1 if $foo matches the regex, and to a false value if the match fails.
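The difference between greedy and non-greedy matching, the shorthand classes, the word-boundary anchor and back references can all be observed in a few lines. The sketch below uses invented sample strings and variable names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* runs on to the last ">" in the string.
my ($greedy) = $html =~ /(<.*>)/;
# Non-greedy: .*? stops at the first ">".
my ($lazy)   = $html =~ /(<.*?>)/;

# Shorthand classes: \w word character, \s whitespace, \d digit.
my ($word, $number) = 'sample   42' =~ /^(\w+)\s+(\d+)$/;

# \b anchors the match at a word boundary.
my @names = grep { /\bJohn/ } ('John', 'Johnathan', 'StJohn');

# Back reference: \1 repeats whatever the first group matched.
my ($doubled) = 'this is is a test' =~ /\b(\w+) \1\b/;

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "word:    $word, number: $number\n";
print "names:   @names\n";
print "doubled: $doubled\n";
```

Assigning a match to a parenthesised list puts the captured groups straight into named variables, which avoids reading $1 and $2 after the fact.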
BINDING OPERATOR
• The earlier /abc/ example matched against the default variable $_.
• To match against a scalar variable, use the binding operator.
• The binding operator =~ matches the pattern on its right against the string on its left.
• The m operator is usually added for clarity of code:
    $string =~ m/pattern/
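A brief sketch of the binding operators in use; the record line and the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'ACCESSION   NM_000546';   # made-up sample record line

# =~ is true when the pattern matches the string on its left.
my $is_accession = $line =~ m/^ACCESSION/ ? 1 : 0;

# !~ is the negated form: true when the pattern does NOT match.
my $not_locus = $line !~ m/^LOCUS/ ? 1 : 0;

# Without a binding operator the match is applied to $_.
my $default_hit = 0;
for ($line) {
    $default_hit = 1 if /NM_\d+/;
}

print "is_accession=$is_accession not_locus=$not_locus default_hit=$default_hit\n";
```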
MATCHING ONLY ONCE
• There is also a simpler version of the match operator: the ?PATTERN? operator. Since Perl 5.22 it must be written with the leading m, as m?PATTERN?.
• This is basically identical to the m// operator except that it only matches once between calls to reset.
• To remember which portion of the string matched we use $1, $2, $3, etc.
• For example, you can use this to get the first and last matching elements within a list:
    #!/usr/bin/perl
    @list = qw/food foosball subeo footnote terfoot canic footbridge/;
    foreach (@list) {
        $first = $1 if m?(foo.*)?;   # succeeds only once, on the first match
        $last  = $1 if /(foo.*)/;    # overwritten on every match
    }
    print "First: $first, Last: $last\n";
• This will produce the following result:
    First: food, Last: footbridge
THE SUBSTITUTION OPERATOR
The substitution operator, s///, is really just an extension of the match operator that allows you to replace the text matched with some new text. The basic form of the operator is:
    s/PATTERN/REPLACEMENT/;
The PATTERN is the regular expression for the text that we are looking for. The REPLACEMENT is a specification for the text or expression that we want to use to replace the found text.
For example, we can replace the first occurrence of "dog" with "cat" using
    $string =~ s/dog/cat/;
Adding the g modifier (s/dog/cat/g) replaces all occurrences.
Another example:
    #!/usr/bin/perl
    $string = 'The cat sat on the mat';
    $string =~ s/cat/dog/;
    print "Final Result is $string\n";
This will produce the following result:
    Final Result is The dog sat on the mat
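A further sketch of the substitution operator, showing the difference the g modifier makes and how captured groups can be reused in the replacement. The sample strings and variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'The dog chased another dog';

# Without a modifier only the first match is replaced.
(my $first_only = $string) =~ s/dog/cat/;

# With g every match is replaced; s/// returns the number of replacements.
my $all   = $string;
my $count = ($all =~ s/dog/cat/g);

# Captured groups are available in the replacement as $1, $2, ...
(my $swapped = 'mat, cat') =~ s/(\w+), (\w+)/$2, $1/;

print "$first_only\n";
print "$all ($count replacements)\n";
print "$swapped\n";
```

The form (my $copy = $original) =~ s/.../.../ copies the string first and then edits the copy, so the original value stays available for comparison.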
PATTERN MATCHING MODIFIERS
• m//i: ignore case when pattern matching.
• m//g: match globally, i.e. find every occurrence; for example, to count all occurrences of a substring:
    $count = 0;
    while ($target =~ m/$substring/g) {
        $count++;
    }
• m//m: treat a target string containing newline characters as multiple lines, so that ^ and $ match at the start and end of each line.
• m//s: treat a target string containing newline characters as a single string, i.e. dot matches any character including newline.
• m//x: ignore whitespace characters in the regular expression unless they are escaped or occur in a character class.
• m//o: compile the regular expression only once.
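The effect of the i, g, m, s and x modifiers can be compared on one multi-line string. This is a minimal sketch; the target string and variable names are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $target = "Start here\nstart again\nand start once more";

# i : case-insensitive; g in list context returns every match.
my @starts = $target =~ /start/ig;

# m : ^ and $ also match at the beginning and end of each line.
my @line_starts = $target =~ /^(\w+)/mg;

# s : the dot also matches a newline, so .* can span lines.
my $spans_lines = $target =~ /here.*again/s ? 1 : 0;
my $same_line   = $target =~ /here.*again/  ? 1 : 0;

# x : whitespace and comments inside the pattern are ignored.
my ($first_word) = $target =~ /
    ^ (\w+)     # first word of the string
    \s+ here    # followed by "here"
/x;

print scalar(@starts), " occurrences of 'start'\n";
print "line starts: @line_starts\n";
print "spans lines: $spans_lines, same line: $same_line\n";
print "first word:  $first_word\n";
```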
THE TRANSLATION OPERATOR
• Translation is similar, but not identical, to substitution: unlike substitution, translation (or transliteration) does not use regular expressions for its search or replacement values. The translation operators are:
    tr/SEARCHLIST/REPLACEMENTLIST/cds
    y/SEARCHLIST/REPLACEMENTLIST/cds
• The translation replaces all occurrences of the characters in SEARCHLIST with the corresponding characters in REPLACEMENTLIST.
• For example, using the "The cat sat on the mat" string:
    #!/usr/bin/perl
    $string = 'The cat sat on the mat';
    $string =~ tr/a/o/;
    print "$string\n";
• When the above program is executed, it produces the following result:
    The cot sot on the mot
TRANSLATION OPERATOR MODIFIERS
• Standard Perl ranges can also be used, allowing you to specify ranges of characters either by letter or numerical value.
• To change the case of the string, you might use the following syntax in place of the uc function:
    $string =~ tr/a-z/A-Z/;
• Following is the list of modifiers related to translation.
    Modifier   Description
    c          Complements SEARCHLIST
    d          Deletes found but unreplaced characters
    s          Squashes duplicate replaced characters
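Each translation modifier can be tried on a small string. The sketch below uses a made-up sequence string, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'AACCGGTT--NN  ACGT';   # made-up sample sequence

# Counting: tr returns the number of matched characters; with an empty
# REPLACEMENTLIST and no modifiers the string is left unchanged.
my $gc = ($dna =~ tr/GC//);

# d : delete matched characters that have no replacement.
(my $no_gaps = $dna) =~ tr/-//d;

# s : squash runs of the same translated character into one.
(my $squashed = $dna) =~ tr/ //s;

# c with d : complement the search list, i.e. delete everything
# that is NOT an upper-case letter.
(my $letters = $dna) =~ tr/A-Z//cd;

# Ranges: change case without calling uc.
(my $upper = 'perl') =~ tr/a-z/A-Z/;

print "G+C count: $gc\n";
print "no gaps:   $no_gaps\n";
print "squashed:  $squashed\n";
print "letters:   $letters\n";
print "upper:     $upper\n";
```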
SPLIT
• Syntax of split
• split REGEX, STRING splits the STRING at every match of the REGEX.
• split REGEX, STRING, LIMIT, where LIMIT is a positive number, splits the STRING at every match of the REGEX but stops after it has found LIMIT-1 matches, so the number of elements it returns is LIMIT or less.
• split REGEX: if STRING is not given, split works on the content of $_, the default variable of Perl, at every match of the REGEX.
• split without any parameter splits the content of $_ on whitespace; this behaves like /\s+/ except that leading whitespace is skipped.
• Simple cases
• split returns a list of strings:
    use Data::Dumper qw(Dumper);   # dumps the contents of any variable while the program runs
    my $str = "ab cd ef gh ij";
    my @words = split / /, $str;
    print Dumper \@words;          # pass a reference so the array is dumped as one structure
• The output is (whitespace condensed):
    $VAR1 = [ 'ab', 'cd', 'ef', 'gh', 'ij' ];
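The LIMIT argument and the whitespace special case are easier to follow with a concrete record. This is a sketch only; the colon-separated record and the variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = 'gene1:chr2:1500:+';   # made-up colon-separated record

# Split at every match of the pattern.
my @fields = split /:/, $record;

# LIMIT 2: stop after the first separator; the rest stays in one piece.
my ($name, $rest) = split /:/, $record, 2;

# A single-space string as the pattern splits on runs of whitespace and
# skips leading whitespace; /\s+/ keeps an empty leading field instead.
my $padded    = '  ab   cd ef';
my @awk_style = split ' ',   $padded;
my @regex     = split /\s+/, $padded;

print scalar(@fields), " fields: @fields\n";
print "name=$name rest=$rest\n";
print scalar(@awk_style), " vs ", scalar(@regex), " elements\n";
```

The last line prints different counts for the two calls because the /\s+/ form returns an empty first element when the string begins with whitespace.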
Strings, patterns and regular expressions in Perl

