Regular Expressions in PHP
/(?:dave@davidstockton\.com)/
Front Range PHP User Group
David Stockton
What is a regular expression?
- A pattern used to describe a part of some text
- “Regular” has some implications to how it can be built, but that’s not really part of this presentation
- Extremely powerful and useful
- (And often abused)
Regex Joke
- A programmer says, “I have a problem that I can solve with regular expressions.”
- Now, he has two problems…
How to use regex in PHP
- The preg_* functions
  - Perl compatible regular expressions.
  - Probably the most common regex syntax
- The ereg_* functions
  - POSIX style regular expressions
  - I am not covering these functions.
  - Don’t use the ereg ones. They are deprecated as of PHP 5.3 and were removed in PHP 7.0.
How can we use regex in PHP?
- preg_match( ) - Searches a subject for a match
- preg_match_all( ) - Searches a subject for all matches
- preg_replace( ) - Searches a subject for a pattern and replaces it with something else
- preg_split( ) - Splits a string into an array based on a regex delimiter
- preg_filter( ) - Identical to preg_replace except that it returns only the subjects in which there was a match
- preg_replace_callback( ) - Like preg_replace, but the replacement is defined in a callback
- preg_grep( ) - Returns the elements of an array that match a pattern
How can we use regex in PHP?
- preg_quote( ) - Quotes regular expression characters
- preg_last_error( ) - Returns the error code of the last PCRE (Perl Compatible Regular Expression) function execution
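As a quick preview of the functions listed above, here is a minimal sketch. The sample string and variable names are made up for illustration; only the preg_* calls themselves come from the slides.

```php
<?php
// Hypothetical sample data, chosen only for illustration.
$subject = 'apples: 3, pears: 12, plums: 7';

// preg_match() stops at the first match and returns 1, 0, or false on error.
$found = preg_match('/\d+/', $subject, $first);   // $first[0] is '3'

// preg_match_all() collects every match and returns how many it found.
$count = preg_match_all('/\d+/', $subject, $all); // $all[0] is ['3', '12', '7']

// preg_split() splits on a regex delimiter (a comma plus optional whitespace).
$parts = preg_split('/,\s*/', $subject);

// preg_grep() keeps the array elements that match; original keys are preserved.
$startsWithP = preg_grep('/^p/', ['apples', 'pears', 'plums']);

// preg_quote() escapes regex metacharacters so text can be matched literally.
// The second argument names the delimiter so that it gets escaped as well.
$literal = preg_quote('1+1=2?', '/');
```

Note that preg_grep( ) keeps the original array keys, so wrap the result in array_values( ) if a re-indexed list is needed.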
How can we use regex in PHP?
- Those are the function calls, and we’ll play with them later.
- First, we need to learn how to create regex patterns since we need those for any function call.
Starting Pattern
/[A-Z0-9\._+=-]+@[A-Z0-9\.-]+\.[A-Z]{2,4}/i
- This matches a series of letters, numbers, pluses, dashes, dots, underscores and equals signs, followed by an “AT” (@) sign, followed by a series of letters, numbers, dots and dashes, followed by a dot, followed by 2 to 4 letters.
- In other words… It matches an email address… Or rather some email addresses.
Matching Email Addresses
- What about james@smithsonian.museum?
- What about freddie@wherecanI.travel?
- Both of those are valid email addresses, but they fail because our pattern only allows 2-4 character TLD parts for the email address.
- How can we match all valid email addresses and only valid email addresses?
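The failure described above can be reproduced with a short sketch. The pattern below is an assumed form of the starting pattern (one or more domain characters before the final dot), with ^ and $ anchors added here so that the whole string has to match; the first sample address is taken from the title slide.

```php
<?php
// Assumed form of the starting pattern: one or more local-part characters,
// an @, one or more domain characters, a dot, then a 2 to 4 letter TLD.
// The ^ and $ anchors are an addition so the entire string must match.
$pattern = '/^[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';

$addresses = [
    'dave@davidstockton.com',
    'james@smithsonian.museum',
    'freddie@wherecanI.travel',
];

$results = [];
foreach ($addresses as $address) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$address] = preg_match($pattern, $address) === 1;
}
// Only the .com address passes; the six-letter TLDs are rejected.
```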
The “real” email address regex(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-
The “real” email address regex cont.\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". 
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". 
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)
So… How do we write this?
- Don’t. Much simpler patterns have been written that will match 99.9% of valid email addresses.
- Use something like Zend_Validate_EmailAddress
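If Zend Framework is not available, PHP's built-in filter extension offers a comparable ready-made check. This is a swapped-in alternative, not the validator named on the slide, and the helper name looksLikeEmail is hypothetical.

```php
<?php
// Hypothetical helper: defers to PHP's built-in email filter instead of a
// hand-written pattern. filter_var() returns the input string when it
// validates and false when it does not.
function looksLikeEmail(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}

$longTld = looksLikeEmail('james@smithsonian.museum');
$garbage = looksLikeEmail('not an address');
```

Either way, the point of the slide stands: lean on a maintained validator rather than writing the full pattern by hand.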
So now the real learnin’…
- Letters and numbers match… letters and numbers
- /a/ - Matches a string that contains an “a”
- /7/ - Matches a string that contains a 7.
More learnin’
- Match a word
  - /regex/ - Matches a string with the word “regex” in it
- You can use a pipe character to give a choice
  - /pizza|steak|cheeseburger/ - Matches a string with any of these foods
Delimiters
- The examples so far have started with / and ended with /.
- These are delimiters and let the regex engine know where the pattern starts and ends.
- You can choose another delimiter if you’d like or if it’s more convenient
- Match namespace:
  - #/My/PHP/Namespace#
- If I used “/” in that example, I’d need to escape each of the forward slashes to differentiate them from the delimiter
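A small sketch of the same namespace match written with both delimiters; the path string is a made-up example.

```php
<?php
$path = 'src/My/PHP/Namespace/Thing.php'; // hypothetical path

// With / as the delimiter, every literal slash must be escaped.
$withSlashes = preg_match('/\/My\/PHP\/Namespace/', $path);

// With # as the delimiter, the slashes need no escaping.
$withHashes = preg_match('#/My/PHP/Namespace#', $path);
```

Both calls find the same text; the second pattern is simply easier to read.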
Character Matching Continued
- You can match a selection of characters
  - /[Pp][Hh][Pp]/ - Matches PHP in any mixture of upper and lowercase
- Ranges can be defined
  - /[abcdefghijklmnopqrstuvwxyz]/ - Matches any lowercase alpha character
  - /[a-z]/ - Matches any lowercase alpha character
Character Selection Ranges
- Ranges can be combined
  - /[A-Za-z0-9]/ - Matches an alphanumeric character
  - /[A-Fa-f0-9]/ - Matches any hex character
- Character selection can be inverted
  - /[^0-9]/ - Matches any non-digit character
  - /[^ ]/ - Matches any non space character
- /[.!@#$%^*]/ - Matches some punctuation
Special Characters
- Dot (.) matches any character (except a newline, by default)
  - /./ - Matches any single character
  - /../ - Matches any two characters
- To match an actual dot character, you must escape it
  - /\./ - Matches a single dot character
- Unless it’s in a character selection
  - /[.]/ - Matches a single dot character
Character classes
- \d means [0-9]
- \D means non-digits - [^0-9]
- \w means word characters - [A-Za-z0-9_]
- \W means non word characters - [^A-Za-z0-9_]
- \s means a whitespace character, such as [ \t\n\r]
- \S means non white space characters
Repeating Character Classes
- Match two digits in a row
  - /\d\d/
  - /[0-9][0-9]/
  - /\d{2}/
  - /[0-9]{2}/
- Match at least one digit (but as many as it can)
  - /\d+/
- Match 0 to infinite digits
  - /\d*/
Repeating Character Classes cont.
- * means match 0 or more
- + means match 1 or more
- {x} where x is a number means match exactly x of the preceding selection
- {x,} means match at least x
- {x,y} means match between x and y
- {0,y} means match up to y (write the 0 explicitly: older PCRE versions treat {,y} as literal text rather than a quantifier)
More special characters
- ? means the preceding selection is optional
Putting it together
- Telephone Number
  - /\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
  - Matches 720-675-7471 or (720)675-7471 or (720) 675-7471 or 7206757471 or 720 675 7471
- Find a misspelled word (and get great deals on EBay)
  - /la[bp]topcomputer[s]?/
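To check the telephone pattern against the five formats listed above, a minimal sketch follows. The variable names are illustrative.

```php
<?php
$pattern = '/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/';

$samples = [
    '720-675-7471',
    '(720)675-7471',
    '(720) 675-7471',
    '7206757471',
    '720 675 7471',
];

$parsed = [];
foreach ($samples as $sample) {
    // $m[1], $m[2] and $m[3] hold the three captured digit groups.
    if (preg_match($pattern, $sample, $m)) {
        $parsed[$sample] = [$m[1], $m[2], $m[3]];
    }
}
// Every sample yields the same three groups: 720, 675 and 7471.
```

The pattern is unanchored, so it would also find a phone number embedded in longer text.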
Regex Anchors
- Anchors allow you to specify a position, like before, after or in between characters
- /^ab/ matches abcdefg but not cab
  - Notice that it’s the caret character… It means start of the string in this context, but it negates a character class when it comes first inside the square brackets
- /ab$/ matches cab but not abcdefg
- /^[a-z]+$/ will match a string that consists only of lowercase letters
Word Boundaries
- \b means word boundaries
  - Before first character if first character is a word character
  - After last character if last character is a word character
  - Between two characters if one is a word character and the other is not
- /\bfish\b/ matches fish, but not fisherman or catfish.
- /fish\b/ matches fish and catfish
Alternation
- /cow|boy/ - Matches cow, or boy or cowboy or coward, etc
- /\b(cow|boy)\b/ - Matches cow or boy but not cowboy or coward
- The above example also captures the matching word due to the parens. More on this later.
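The word boundary and alternation examples above can be tried out with a short sketch. The word list and sentence are assembled from the words on the slides; the variable names are illustrative.

```php
<?php
$words = ['fish', 'catfish', 'fisherman'];

// \b on both sides: only the standalone word survives.
$wholeWord = preg_grep('/\bfish\b/', $words);

// \b at the end only: anything ending in "fish" survives.
$endsWith = preg_grep('/fish\b/', $words);

// Alternation inside a group, bounded on both sides. $found[1] holds the
// text captured by the parentheses for each match.
preg_match_all('/\b(cow|boy)\b/', 'a cow, a boy, a cowboy and a coward', $found);
```

Here $found[1] contains only cow and boy, because cowboy and coward have no word boundary in the right places.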
Greedy vs Lazy
- By default, regular expressions are greedy… That is, they will match as much as they can
- Grab a starting html tag: /<.+>/
  - Against <h1>Welcome to FRPUG</h1> it matches the entire string, from the first < to the last >
  - Not what we want
- Make it lazy: /<.+?>/
  - Now it matches only <h1>
Another tag matching solution
- /<[^>]+>/
- Literally match a less than character followed by one or more non-greater than characters followed by a greater than character
- This approach eliminates the need for the engine to backtrack (almost certainly faster than the last example).
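The three tag patterns discussed above can be compared side by side in a minimal sketch; the variable names are illustrative.

```php
<?php
$html = '<h1>Welcome to FRPUG</h1>';

// Greedy: .+ runs to the end and backtracks only as far as the last >.
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the first > it can.
preg_match('/<.+?>/', $html, $lazy);

// Negated class: [^>]+ can never run past a >, so the first tag is all it takes.
preg_match('/<[^>]+>/', $html, $negated);
```

The greedy version returns the whole string, while the lazy and negated-class versions both return just the opening tag.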
Capturing part of regex (backreference)
- /__(construct|destruct)/
  - Backreference will contain either construct or destruct so you can use it later
- /([a-z]+)\1/
  - Matches a run of lowercase letters that is immediately followed by the same run again
  - Matches aa but not a. In aaaaa it matches the first four characters (aaaa)
- /([a-z]{3})\1/
  - Matches words like booboo or bambam
Backreference Continued…
- Very useful when performing regex search and replace
- preg_replace('/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/', '(\1) \2-\3', $phone)
- The above example will take any phone number from the previous example and return it formatted in (xxx) xxx-xxxx format
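A runnable version of the replacement above, applied to a few of the phone formats from the earlier telephone slide. The $phones list and variable names are illustrative.

```php
<?php
$pattern = '/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/';

$phones = ['720 675 7471', '7206757471', '(720)675-7471'];

$formatted = [];
foreach ($phones as $phone) {
    // \1, \2 and \3 in the replacement refer to the three captured groups.
    $formatted[] = preg_replace($pattern, '(\1) \2-\3', $phone);
}
// Each entry comes back as (720) 675-7471.
```

The replacement string is single-quoted so that PHP passes the backslashes through to preg_replace( ) unchanged.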
More backreferences…
- Replace duplicated words that that have been inadvertently left in
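The code for this slide did not survive in the transcript, so the following is only a sketch of one way to do it. The pattern is an assumption, not necessarily the one the talk used.

```php
<?php
// Assumed pattern: a word, some whitespace, then the same word again.
// \1 refers back to whatever the first group captured; the i modifier lets
// "The the" count as a duplicate too.
$pattern = '/\b(\w+)\s+\1\b/i';

$text = 'Replace duplicated words that that have been inadvertently left in';

// Keep a single copy of the captured word.
$fixed = preg_replace($pattern, '$1', $text);
```

The trailing \b keeps the pattern from treating a pair like "the theory" as a duplicate.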
Non-capturing groups
- Match an IPv4 address
- /((?:\d{1,3}\.){3}\d{1,3})/
- We’re matching 1 to 3 digits followed by a dot 3 times. We don’t care (right now) about the octets, we just want to repeat the match, so ?: says to not capture the group.
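A short sketch showing what ends up in the match array when the repeated group is non-capturing. The log line is a made-up example.

```php
<?php
$pattern = '/((?:\d{1,3}\.){3}\d{1,3})/';

// Hypothetical log line used only for illustration.
$line = 'Connection from 192.168.0.1 refused';

preg_match($pattern, $line, $m);

// $m[0] is the whole match and $m[1] is the outer capturing group.
// The inner (?: ... ) group repeats three times but adds no entry of its own.
$entries = count($m);
```

This pattern only checks the shape of the address, so something like 999.999.999.999 would also match.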
Pattern Modifiers
- Modifiers go after the last delimiter (now you know why there are delimiters) and affect how the regex engine works
- i - case insensitive matching (matches are case-sensitive by default)
- m - multiline matching (^ and $ also match at the start and end of each line)
- s - dot matches all characters, including \n
- x - ignore all whitespace characters except if escaped or in a character class
Pattern Modifiers Continued…
- D - Makes $ match only at the very end of the string; otherwise $ also matches just before a trailing \n
  - Allow username to be alphabetic only
  - /^[A-Za-z]+$/ - This will match "dave\n" (a name with a trailing newline)
  - However, /^[A-Za-z]+$/D will not match
- U - Invert the meaning of the greediness. With this on, matches are lazy by default and ? makes them greedy.
- There are lots of other modifiers and you can see them at http://us2.php.net/manual/en/reference.pcre.pattern.modifiers.php
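The modifiers described above can be observed in a minimal sketch. The username check here assumes a + quantifier after the character class so that names longer than one letter can match; the sample strings are made up.

```php
<?php
$input = "dave\n"; // double quotes, so \n is a real newline

// Without D, $ is allowed to match just before a trailing newline.
$withoutD = preg_match('/^[A-Za-z]+$/', $input);

// With D, $ must match at the very end of the string.
$withD = preg_match('/^[A-Za-z]+$/D', $input);

// i: case-insensitive matching.
$insensitive = preg_match('/php/i', 'I like PHP');

// s: the dot also matches a newline.
$dotWithoutS = preg_match('/a.b/', "a\nb");
$dotWithS = preg_match('/a.b/s', "a\nb");

// m: ^ and $ match at line boundaries, so each line is matched separately.
$lineCount = preg_match_all('/^\w+$/m', "one\ntwo");

// x: whitespace in the pattern is ignored, which allows spacing for readability.
$extended = preg_match('/ \d{3} - \d{4} /x', '675-7471');
```

The D example is the one to remember for validation: without it, a value with a trailing newline passes the check.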
Named Capture Groups
- Rather than get back a numbered array of matches, get back an associative array.
- If you add a new capture group, you don’t have to renumber where you use the capture group
Named Capture Groups cont…
- Use (?P<named_group>pattern)
Named Capture Groups cont…
- Combined numbered and associative array
- Capture group 0 is the whole pattern that is matched.
- If our string to match against was abcde720-675 7471foobar, $matches[0] will contain 720-675 7471
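The code from these slides is missing from the transcript, so here is a sketch of how named groups might be applied to the phone number string mentioned above. The group names area, prefix and line are hypothetical.

```php
<?php
// Hypothetical group names; the (?P<name>...) syntax is the one from the slide.
$pattern = '/(?P<area>\d{3})[\s-]?(?P<prefix>\d{3})[\s-]?(?P<line>\d{4})/';

preg_match($pattern, 'abcde720-675 7471foobar', $matches);

// $matches holds both views of each group: a named key followed by its
// numeric key, after index 0 for the whole match.
$keys = array_keys($matches);
```

Code that reads $matches['area'] keeps working if another group is later added in front of it, which is the renumbering benefit the slide describes.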
Positive Look Ahead Matches
- Look for a pattern followed by another pattern
- /p(?=h)/ - Match a “p” followed by an “h” but don’t include the “h”
Negative Look Ahead
- Look for a pattern which is not followed by some other pattern
- /p(?!h)/ - p not followed by h.
Look Aheads
- Positive and negative look aheads do not capture anything. They just determine if the pattern match is possible
- They are zero-width
- /p[^h]/ is not the same as /p(?!h)/
- /ph/ is not the same as /p(?=h)/
Look behinds
- Positive look behind
  - /(?<=oo)d/ - d which is preceded by oo
  - Matches “food”, “mood”, match only contains the “d”
- Negative look behind
  - /(?<!oo)d/ - d which is not preceded by oo
  - Matches “dude”, “crude”, and “d”
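A small sketch of the look ahead and look behind behaviour described above, using the string php and the words from the slide. PREG_OFFSET_CAPTURE is used here only to show where each match was found.

```php
<?php
// Positive look ahead: the h is required but is not part of the match.
preg_match('/p(?=h)/', 'php', $ahead);

// Negative look ahead: only the final p qualifies, and it sits at offset 2.
preg_match('/p(?!h)/', 'php', $notAhead, PREG_OFFSET_CAPTURE);

// /p[^h]/ needs a real character after the p, so the final p cannot match.
$classVersion = preg_match('/p[^h]/', 'php');

// Positive look behind: the match is just the d, at offset 3 in "food".
preg_match('/(?<=oo)d/', 'food', $behind, PREG_OFFSET_CAPTURE);

// Negative look behind: both d's in "dude" qualify, the d in "food" does not.
$dudeCount = preg_match_all('/(?<!oo)d/', 'dude');
$foodCount = preg_match_all('/(?<!oo)d/', 'food');
```

The contrast between /p(?!h)/ and /p[^h]/ on the final p is the zero-width point in practice: a look ahead checks a position, while a character class has to consume a character.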
With great power…
- Test your regular expressions before they go to production
- It’s much easier to get them wrong than to get them right if you don’t test
When to not use regex
- Whenever they aren’t needed.
- If you can use strstr or strpos or str_replace to do the job, do that. They are much faster, much simpler and easier to do correctly.
- However, if you cannot use those functions, regex may be your best bet.
- Don’t use regex when you really need a parser
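For a fixed piece of text, the plain string functions named above do the same job as a regex. A minimal sketch with a made-up sample string:

```php
<?php
$haystack = 'Front Range PHP User Group';

// Regex version: works, but the pattern engine is unnecessary here.
$viaRegex = preg_match('/PHP/', $haystack) === 1;

// strpos() returns the offset of the first occurrence or false, so compare
// with !== false (an offset of 0 is a valid hit).
$viaStrpos = strpos($haystack, 'PHP') !== false;

// str_replace() swaps literal text with no pattern syntax to get wrong.
$renamed = str_replace('User Group', 'Meetup', $haystack);
```

The strict !== false comparison matters because a match at the very start of the string returns offset 0.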
Resources
- http://regular-expressions.info
- http://us2.php.net/manual/en/ref.pcre.php
- Spider Man from http://www.onlineseats.com/
Questions?
dave@frontrangephp.org

  • 12. So… How do we write this? Don’t. Much simpler patterns have been written and will match 99.9% of valid email addresses. Use something like Zend_Validate_EmailAddress.
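As a rough sketch of "don't write it yourself": the slide names Zend_Validate_EmailAddress, but the example below swaps in PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL, which is an assumption on my part and not something the deck shows. The sample addresses are for illustration only.

```php
<?php
// filter_var() returns the address on success and false on failure.
$good   = filter_var('dave@davidstockton.com', FILTER_VALIDATE_EMAIL);
$museum = filter_var('james@smithsonian.museum', FILTER_VALIDATE_EMAIL);
$bad    = filter_var('not an email address', FILTER_VALIDATE_EMAIL);

// Compare against false explicitly, since the success value is a string.
$goodIsValid   = ($good !== false);
$museumIsValid = ($museum !== false);
$badIsValid    = ($bad !== false);
```

A long TLD such as .museum is no problem here, which is exactly where the 2 to 4 letter starting pattern fell over.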
  • 13. So now the real learnin’… Letters and numbers match… letters and numbers. /a/ - Matches a string that contains an “a”. /7/ - Matches a string that contains a 7.
  • 14. More learnin’. Match a word: /regex/ - Matches a string with the word “regex” in it. You can use a pipe character to give a choice: /pizza|steak|cheeseburger/ - Matches a string with any of these foods.
  • 15. Delimiters. The examples so far have started with / and ended with /. These are delimiters and let the regex engine know where the pattern starts and ends. You can choose another delimiter if you’d like or if it’s more convenient. Match a namespace: #/My/PHP/Namespace#. If I used “/” as the delimiter in that example, I’d need to escape each of the forward slashes in the pattern to differentiate them from the delimiter.
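To make the delimiter trade-off concrete, here is a small sketch that runs the same match twice, once with # and once with / as the delimiter. The $path string is made up for the example.

```php
<?php
$path = 'use /My/PHP/Namespace here';

// With # as the delimiter, the slashes in the pattern need no escaping.
$withHash = preg_match('#/My/PHP/Namespace#', $path);

// With / as the delimiter, every slash in the pattern must be escaped.
$withSlash = preg_match('/\/My\/PHP\/Namespace/', $path);
```

Both calls find the same text; the first is simply easier to read.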
  • 16. Character Matching Continued. You can match a selection of characters: /[Pp][Hh][Pp]/ - Matches PHP in any mixture of upper and lowercase. Ranges can be defined: /[abcdefghijklmnopqrstuvwxyz]/ - Matches any lowercase alpha character; /[a-z]/ - Matches any lowercase alpha character.
  • 17. Character Selection Ranges. Ranges can be combined: /[A-Za-z0-9]/ - Matches an alphanumeric character; /[A-Fa-f0-9]/ - Matches any hex character. Character selection can be inverted: /[^0-9]/ - Matches any non-digit character; /[^ ]/ - Matches any non-space character. /[.!@#$%^*]/ - Matches some punctuation.
  • 18. Special Characters. Dot (.) matches any character (except a newline, by default): /./ - Matches any one character; /../ - Matches any two characters. To match an actual dot character, you must escape it: /\./ - Matches a single dot character. Unless it’s in a character selection: /[.]/ - Matches a single dot character.
  • 19. Character classes. \d means [0-9]. \D means non-digits - [^0-9]. \w means word characters - [A-Za-z0-9_]. \W means non-word characters – [^A-Za-z0-9_]. \s means a whitespace character, such as [ \t\n\r] (form feed counts too). \S means non-whitespace characters.
  • 20. Repeating Character Classes. Match two digits in a row: /\d\d/, /[0-9][0-9]/, /\d{2}/ or /[0-9]{2}/. Match at least one digit (but as many as it can): /\d+/. Match 0 to infinite digits: /\d*/.
  • 21. Repeating Character Classes cont. * means match 0 or more. + means match 1 or more. {x} where x is a number means match exactly x of the preceding selection. {x,} means match at least x. {x,y} means match between x and y. {0,y} means match up to y (write the 0 explicitly; PCRE has traditionally treated {,y} as literal text rather than a quantifier).
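The quantifiers from the last two slides can be tried out in a few lines. This is only a sketch; the ^ and $ anchors (covered a little later) are added so each pattern has to describe the whole string.

```php
<?php
// {2}: exactly two digits.
$exactlyTwo = preg_match('/^\d{2}$/', '42');

// +: one or more digits, so an empty string does not match.
$plusOnEmpty = preg_match('/^\d+$/', '');

// *: zero or more digits, so an empty string does match.
$starOnEmpty = preg_match('/^\d*$/', '');

// {2,4}: between two and four digits; five digits is too many.
$rangeOk      = preg_match('/^\d{2,4}$/', '123');
$rangeTooLong = preg_match('/^\d{2,4}$/', '12345');
```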
  • 22. More special characters. ? means the preceding selection is optional. Putting it together: Telephone Number /\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/ matches 720-675-7471 or (720)675-7471 or (720) 675-7471 or 7206757471 or 720 675 7471. Find a misspelled word (and get great deals on eBay): /la[bp]topcomputer[s]?/
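Here is a sketch that feeds each of the phone number formats listed on the slide through the telephone pattern with preg_match() and glues the three captured groups back together. The variable names are mine.

```php
<?php
$pattern = '/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/';
$inputs  = [
    '720-675-7471',
    '(720)675-7471',
    '(720) 675-7471',
    '7206757471',
    '720 675 7471',
];

$digitsOnly = [];
foreach ($inputs as $input) {
    // $m[1], $m[2] and $m[3] hold the three parenthesized groups.
    if (preg_match($pattern, $input, $m)) {
        $digitsOnly[] = $m[1] . $m[2] . $m[3];
    }
}
```

Every input ends up as the same ten digits, which shows the optional pieces doing their job.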
  • 23. Regex Anchors. Anchors allow you to specify a position, like before, after or in between characters. /^ab/ matches abcdefg but not cab. Notice that it’s the caret character… It means start of the string in this context, but it negates a character class when it comes first inside the square brackets. /ab$/ matches cab but not abcdefg. /^[a-z]+$/ will match a string that consists only of lowercase letters.
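The anchor examples on this slide translate directly into preg_match() calls. A minimal sketch, with test strings taken from the slide plus a couple of my own:

```php
<?php
// ^ anchors the pattern to the start of the string.
$startAbcdefg = preg_match('/^ab/', 'abcdefg');
$startCab     = preg_match('/^ab/', 'cab');

// $ anchors the pattern to the end of the string.
$endCab     = preg_match('/ab$/', 'cab');
$endAbcdefg = preg_match('/ab$/', 'abcdefg');

// Both anchors together force the whole string to fit the pattern.
$allLower = preg_match('/^[a-z]+$/', 'regex');
$notLower = preg_match('/^[a-z]+$/', 'Regex');
```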
  • 24. Word Boundaries. \b means a word boundary: before the first character if the first character is a word character; after the last character if the last character is a word character; between two characters if one is a word character and the other is not. /\bfish\b/ matches fish, but not fisherman or catfish. /fish\b/ matches fish and catfish, but not fisherman.
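One way to see word boundaries at work is to filter a word list with preg_grep(), which returns the array elements that match a pattern. This is just a sketch using the three words from the slide.

```php
<?php
$words = ['fish', 'fisherman', 'catfish'];

// preg_grep() keeps the original keys, so array_values() re-indexes.
$wholeWord = array_values(preg_grep('/\bfish\b/', $words));
$endsWith  = array_values(preg_grep('/fish\b/', $words));
```

$wholeWord holds only fish, while $endsWith holds fish and catfish.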
  • 25. Alternation. /cow|boy/ - Matches cow, or boy or cowboy or coward, etc. /\b(cow|boy)\b/ - Matches cow or boy but not cowboy or coward. The above example also captures the matching word due to the parens. More on this later.
  • 26. Greedy vs Lazy. By default, regular expressions are greedy… That is, they will match as much as they can. Grab a starting html tag: /<.+>/ matches the whole of <h1>Welcome to FRPUG</h1>. Not what we want. Make it lazy: /<.+?>/. Now it matches only the <h1> in <h1>Welcome to FRPUG</h1>.
  • 27. Another tag matching solution: /<[^>]+>/. Literally match a less-than character followed by one or more non-greater-than characters followed by a greater-than character. This way eliminates the need for the engine to backtrack (almost certainly faster than the last example).
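The greedy, lazy and negated-class tag patterns can be compared side by side on the heading used in the slides. A small sketch:

```php
<?php
$html = '<h1>Welcome to FRPUG</h1>';

// Greedy: .+ runs to the last > in the string.
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the first > it can.
preg_match('/<.+?>/', $html, $lazy);

// Negated class: [^>]+ can never run past a >.
preg_match('/<[^>]+>/', $html, $negated);
```

$greedy[0] is the entire string, while $lazy[0] and $negated[0] are both just the opening tag.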
  • 28. Capturing part of regex (backreference). /__(construct|destruct)/ - The backreference will contain either construct or destruct so you can use it later. /([a-z]+)\1/ - Matches a run of lowercase letters that is immediately repeated. Matches aa but not a. In aaaaa it matches the first four characters (aa followed by aa). /([a-z]{3})\1/ - Matches words like booboo or bambam.
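A short sketch of capturing and backreferences in action. The subject strings are invented for the example; the patterns are the ones from the slide.

```php
<?php
// The parenthesized part lands in $method[1].
preg_match('/__(construct|destruct)/', 'public function __construct()', $method);

// \1 must repeat exactly what group 1 captured.
$bambam = preg_match('/([a-z]{3})\1/', 'bambam');
$bamboo = preg_match('/([a-z]{3})\1/', 'bamboo');

// On five a's the engine settles on "aa" followed by "aa".
preg_match('/([a-z]+)\1/', 'aaaaa', $repeat);
```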
  • 29. Backreference Continued… Very useful when performing regex search and replace: preg_replace('/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/', '(\1) \2-\3', $phone). The above example will take any phone number from the previous example and return it formatted in (xxx) xxx-xxxx format.
  • 30. More backreferences… Replace duplicated words that that have been inadvertently left in.
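The code for this slide did not survive in the transcript, so the following is only a guess at the idea, not the presenter's actual example: capture a word, require the same word again after some whitespace, and replace the pair with a single copy.

```php
<?php
$text = 'Replace duplicated words that that have been left in';

// \b(\w+) captures a whole word, \s+\1\b demands the same word again.
// The i modifier also catches pairs such as "The the".
$fixed = preg_replace('/\b(\w+)\s+\1\b/i', '$1', $text);
```

The word boundaries matter: without the trailing \b, "is issue" would be treated as a duplicate of "is".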
  • 31. Non-capturing groups. Match an IPv4 address: /((?:\d{1,3}\.){3}\d{1,3})/. We’re matching 1 to 3 digits followed by a dot, 3 times, and then 1 to 3 more digits. We don’t care (right now) about the individual octets, we just want to repeat the match, so ?: says to not capture the group.
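To show what the non-capturing group changes, this sketch runs the IPv4 pattern against a made-up log line and looks at what ends up in the matches array.

```php
<?php
$log = 'Server at 192.168.0.1 is up';

preg_match('/((?:\d{1,3}\.){3}\d{1,3})/', $log, $m);

// $m[0] is the whole match and $m[1] is the outer capturing group.
// The inner (?:...) group repeats three times but adds no entry.
$entries = count($m);

// Note: \d{1,3} checks digit counts only, so 999.999.999.999 would match too.
```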
  • 32. Pattern Modifiers. Modifiers go after the last delimiter (now you know why there are delimiters) and affect how the regex engine works. i – case-insensitive matching (matches are case-sensitive by default). m – multiline matching (^ and $ also match at the start and end of each line). s – dot matches all characters, including \n. x – ignore all whitespace characters except if escaped or in a character class.
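Each of the i, m, s and x modifiers can be seen with a one-line comparison. The subject strings below are invented for the sketch.

```php
<?php
// i: case-insensitive matching.
$withI = preg_match('/php/i', 'PHP');

// m: ^ and $ also match at line breaks inside the string.
$withM    = preg_match('/^b$/m', "a\nb");
$withoutM = preg_match('/^b$/', "a\nb");

// s: the dot also matches a newline.
$withS    = preg_match('/a.b/s', "a\nb");
$withoutS = preg_match('/a.b/', "a\nb");

// x: whitespace in the pattern is ignored, so it can be spaced out.
$withX = preg_match('/ \d{3} - \d{4} /x', '675-7471');
```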
  • 33. Pattern Modifiers Continued… D – $ anchors at the very end of the string only; otherwise $ also matches just before a trailing \n. Allow username to be alphabetic only: /^[A-Za-z]+$/ - This will match “dave\n” (a name with a trailing newline). However, /^[A-Za-z]+$/D will not match it. U – Invert the meaning of the greediness. With this on, matches are lazy by default and ? makes them greedy. There are lots of other modifiers and you can see them at http://us2.php.net/manual/en/reference.pcre.pattern.modifiers.php
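A quick sketch of what D and U change. It assumes the username pattern is meant to carry a + quantifier (one or more letters) and that the problem input is a name with a trailing newline; both are my reading of the slide.

```php
<?php
$username = "dave\n";

// Without D, $ is happy to match just before the final newline.
$withoutD = preg_match('/^[A-Za-z]+$/', $username);

// With D, $ matches only at the very end, so the newline is rejected.
$withD = preg_match('/^[A-Za-z]+$/D', $username);

// U flips greediness: .+ is now lazy without needing a ?.
preg_match('/<.+>/U', '<h1>Welcome to FRPUG</h1>', $ungreedy);
```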
  • 34. Named Capture Groups. Rather than get back a numbered array of matches, get back an associative array. If you add a new capture group, you don’t have to renumber where you use the capture group.
  • 35. Named Capture Groups cont… Use (?P<named_group>pattern).
  • 36. Named Capture Groups cont… The result is a combined numbered and associative array. Capture group 0 is the whole pattern that is matched. If our string to match against was abcde720-675 7471foobar, $matches[0] will contain 720-675 7471.
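The code on these slides was lost in the transcript, so here is a sketch of how named groups might look on the phone pattern. The group names area, prefix and line are hypothetical; only the (?P<name>...) syntax and the sample string come from the slides.

```php
<?php
$pattern = '/(?P<area>\d{3})[\s-]?(?P<prefix>\d{3})[\s-]?(?P<line>\d{4})/';
$subject = 'abcde720-675 7471foobar';

preg_match($pattern, $subject, $matches);

// Each named group appears twice: once by name and once by number.
$byName   = $matches['area'];
$byNumber = $matches[1];
```

Adding another group in front later would shift the numbers, but $matches['area'] would keep working.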
  • 37. Positive Look Ahead Matches. Look for a pattern followed by another pattern. /p(?=h)/ - Match a “p” followed by an “h” but don’t include the “h”.
  • 38. Negative Look Ahead. Look for a pattern which is not followed by some other pattern. /p(?!h)/ - p not followed by h.
  • 39. Look Aheads. Positive and negative look aheads do not capture anything. They just determine if the pattern match is possible. They are zero-width. /p[^h]/ is not the same as /p(?!h)/. /ph/ is not the same as /p(?=h)/.
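The "not the same as" claims are easiest to believe when run against a real string. A sketch using php as the subject (my choice of test string):

```php
<?php
// Positive lookahead: the h must follow, but it is not part of the match.
preg_match('/p(?=h)/', 'php', $lookahead);
preg_match('/ph/', 'php', $plain);

// Negative lookahead is zero-width, so it succeeds on the final p,
// where nothing follows at all.
$negativeLookahead = preg_match('/p(?!h)/', 'php');

// A negated class has to consume a character, so the final p fails.
$negatedClass = preg_match('/p[^h]/', 'php');
```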
  • 40. Look behinds. Positive look behind: /(?<=oo)d/ - d which is preceded by oo. Matches “food”, “mood”; the match only contains the “d”. Negative look behind: /(?<!oo)d/ - d which is not preceded by oo. Matches “dude”, “crude”, and “d”.
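The look behind examples can be checked the same way. A small sketch with the words from the slide:

```php
<?php
// Positive look behind: only the d is in the match, not the oo before it.
preg_match('/(?<=oo)d/', 'food', $positive);
$positiveOnDude = preg_match('/(?<=oo)d/', 'dude');

// Negative look behind: count the d's that are not preceded by oo.
$negativeOnDude = preg_match_all('/(?<!oo)d/', 'dude', $all);
$negativeOnFood = preg_match_all('/(?<!oo)d/', 'food', $none);
```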
  • 41. With great power… Test your regular expressions before they go to production. It’s much easier to get them wrong than to get them right if you don’t test.
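Testing does not need a framework to get started. One possible approach, sketched here with made-up cases, is a table of inputs and expected results that gets looped over whenever the pattern changes.

```php
<?php
$pattern = '/^[a-z]+$/D';

// Input string => expected preg_match() result.
$cases = [
    'regex'   => 1,
    'Regex'   => 0,
    ''        => 0,
    "regex\n" => 0,
];

$failures = [];
foreach ($cases as $input => $expected) {
    // preg_match() returns 1, 0, or false if the pattern itself is broken.
    if (preg_match($pattern, $input) !== $expected) {
        $failures[] = $input;
    }
}
```

An empty $failures array means every case behaved as expected.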
  • 42. When to not use regex. Whenever they aren’t needed. If you can use strstr or strpos or str_replace to do the job, do that. They are much faster, much simpler and easier to do correctly. However, if you cannot use those functions, regex may be your best bet. Don’t use regex when you really need a parser.
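For fixed strings, the plain string functions named on this slide cover a lot of ground. A sketch with a made-up haystack:

```php
<?php
$haystack = 'Regular Expressions in PHP';

// strpos() returns a position or false; position 0 is a valid hit,
// so compare against false with !==.
$containsPhp = strpos($haystack, 'PHP') !== false;

// str_replace() swaps a literal string, no pattern syntax involved.
$replaced = str_replace('PHP', 'Perl', $haystack);

// strstr() returns the rest of the string starting at the first hit.
$tail = strstr($haystack, 'Expressions');
```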