PYTHON REGULAR EXPRESSIONS
John Zhang
Tuesday, December 11, 2012
Regular Expressions
• Regular expressions are a powerful string
manipulation tool
• All modern languages have similar library
packages for regular expressions
• Use regular expressions to:
– Search a string (search and match)
– Replace parts of a string (sub)
– Break strings into smaller pieces (split)
Regular Expression Python Syntax
• Literal match:
Example: the regular expression “test” only
matches the string ‘test’
• [x] matches any one of a list of characters
Example: “[abc]” matches ‘a’, ‘b’, or ‘c’
• [^x] matches any one character that is not
included in x
Example: “[^abc]” matches any single character except
‘a’, ‘b’, or ‘c’
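The two character-class forms can be checked with a short sketch. It assumes Python 3 and uses re.findall (covered later in the deck) only to list every single-character match; the sample string is made up for illustration.

```python
import re

sample = "a cab"  # hypothetical sample text

# [abc] matches exactly one character taken from the set a, b, c.
in_set = re.findall(r"[abc]", sample)

# [^abc] matches exactly one character that is NOT a, b, or c.
not_in_set = re.findall(r"[^abc]", sample)

print(in_set)      # ['a', 'c', 'a', 'b']
print(not_in_set)  # [' ']
```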
Regular Expressions Syntax
• “.” matches any single character (except a newline, by default)
• Parentheses () can be used for grouping
Example: “(abc)+” matches ‘abc’, ‘abcabc’,
‘abcabcabc’, etc.
• x|y matches x or y
Example: “this|that” matches ‘this’ and ‘that’,
but not ‘thisthat’.
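A minimal sketch of the dot, grouping, and alternation rules, assuming Python 3.4 or later so that re.fullmatch is available (it succeeds only when the whole string matches). The sample strings are illustrative.

```python
import re

# "." stands for any single character (other than a newline).
dot_hits = re.findall(r"c.t", "cat cot cut c-t")

# Grouping: "+" applies to the whole group "abc", not just to "c".
repeated = re.fullmatch(r"(abc)+", "abcabc") is not None
partial = re.fullmatch(r"(abc)+", "abcab") is not None

# Alternation: the whole string must be "this" or "that".
candidates = ["this", "that", "thisthat"]
either = [s for s in candidates if re.fullmatch(r"this|that", s)]

print(dot_hits)  # ['cat', 'cot', 'cut', 'c-t']
print(repeated)  # True
print(partial)   # False
print(either)    # ['this', 'that']
```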
Regular Expression Syntax
• x* matches zero or more x’s
“a*” matches ’’, ’a’, ’aa’, etc.
• x+ matches one or more x’s
“a+” matches ’a’,’aa’,’aaa’, etc.
• x? matches zero or one x’s
“a?” matches ’’ or ’a’ .
• x{m,n} matches i x’s, where m ≤ i ≤ n
“a{2,3}” matches ‘aa’ or ‘aaa’
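The quantifiers can be compared side by side. This is a sketch assuming Python 3.4 or later; fully_matches is a hypothetical helper, not part of the re module, that reports whether a pattern matches an entire string.

```python
import re

def fully_matches(pattern, text):
    """Return True when the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

candidates = ["", "a", "aa", "aaa", "aaaa"]

star = [s for s in candidates if fully_matches(r"a*", s)]         # zero or more
plus = [s for s in candidates if fully_matches(r"a+", s)]         # one or more
optional = [s for s in candidates if fully_matches(r"a?", s)]     # zero or one
bounded = [s for s in candidates if fully_matches(r"a{2,3}", s)]  # two or three

print(star)      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(plus)      # ['a', 'aa', 'aaa', 'aaaa']
print(optional)  # ['', 'a']
print(bounded)   # ['aa', 'aaa']
```

Note that the braces must not contain a space: Python treats “a{2, 3}” as literal text rather than as a repetition count.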
Regular Expression Syntax
• “\d” matches any digit; “\D” matches any non-digit
• “\s” matches any whitespace character; “\S”
matches any non-whitespace character
• “\w” matches any alphanumeric character (or underscore); “\W”
matches any other character
• “^” matches the beginning of the string; “$”
matches the end of the string
• “\b” matches a word boundary; “\B” matches a
position that is not a word boundary
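The special sequences above can be tried on a small string. This is a sketch assuming Python 3; the sample texts are made up, and the patterns are written as raw strings (r"...") so that Python keeps the backslashes intact.

```python
import re

text = "Room 42, floor 7"  # hypothetical sample text

digits = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)   # runs of word characters
pieces = re.split(r"\s+", text)    # split on runs of whitespace

starts_with_room = re.search(r"^Room", text) is not None  # anchored at the start
ends_with_digit = re.search(r"\d$", text) is not None     # anchored at the end

# \b keeps "cat" from matching inside "concat".
whole_words = re.findall(r"\bcat\b", "cat concat cat's")

print(digits)       # ['42', '7']
print(words)        # ['Room', '42', 'floor', '7']
print(pieces)       # ['Room', '42,', 'floor', '7']
print(whole_words)  # ['cat', 'cat']
```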
Search and Match
• The two basic functions are re.search and re.match
– Search looks for a pattern anywhere in a string
– Match looks for a match starting at the beginning
• Both return None if the pattern is not found (logical false)
and a “match object” if it is
import re

pat = "a*b"
matchObj = re.search(pat, "fooaaabcde")
if matchObj:
    # group(0) is the whole substring that matched
    print("matched successfully: %s" % matchObj.group(0))
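The difference between the two functions shows up when the same pattern and string are passed to both. A small sketch, assuming Python 3:

```python
import re

pat = r"a*b"
text = "fooaaabcde"

# search scans forward and finds 'aaab' starting at index 3.
found_anywhere = re.search(pat, text)

# match only tries index 0, where 'f' cannot begin a match of a*b.
found_at_start = re.match(pat, text)

print(found_anywhere.group())  # aaab
print(found_at_start)          # None

# match succeeds once the pattern fits at the very beginning.
print(re.match(pat, "aaabcde").group())  # aaab
```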
Q: What’s a match object?
• A: an instance of the match class with the details of the match
result
>>> pat = "a*b"
>>> r1 = re.search(pat,"fooaaabcde")
>>> r1.group() # group returns string matched
'aaab'
>>> r1.start() # index of the match start
3
>>> r1.end() # index just past the end of the match
7
>>> r1.span() # tuple of (start, end)
(3, 7)
What got matched?
• Here’s a pattern to match simple email addresses
\w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "qzhang@pku.cn.edu")
>>> r1.group()
'qzhang@pku.cn.edu'

• We might want to extract the pattern parts, like the
email name and host
What got matched?
• We can put parentheses around groups we want to be
able to reference
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "qzhang@pku.cn.edu")
>>> r2.group(1)
'qzhang'
>>> r2.group(2)
'pku.cn.edu'
>>> r2.groups()
('qzhang', 'pku.cn.edu', 'cn.', 'edu')

• Note that groups are numbered by the position of their opening
parenthesis, counting from the left (a preorder traversal of the
nesting tree)
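The numbering rule can be made visible by listing every group next to its index. This sketch assumes Python 3 and reuses the pattern from the slide; m.re.groups is the number of capturing groups in the compiled pattern.

```python
import re

pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
m = re.match(pat2, "qzhang@pku.cn.edu")

# Group 0 is the whole match; groups 1..4 follow the order in which
# their opening parentheses appear in the pattern.
numbered = {i: m.group(i) for i in range(m.re.groups + 1)}

for index, value in numbered.items():
    print(index, value)
# 0 qzhang@pku.cn.edu
# 1 qzhang
# 2 pku.cn.edu
# 3 cn.
# 4 edu
```

Group 3 sits inside a repetition, so it holds only the last piece it matched (‘cn.’), not the earlier ‘pku.’.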
What got matched?
• We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "qzhang@pku.cn.edu")
>>> r3.group('name')
'qzhang'
>>> r3.group('host')
'pku.cn.edu'

• And reference the matching parts by the labels
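When several labels are needed at once, the match object's groupdict() method returns all named groups as a dictionary. A short sketch, assuming Python 3 and the same pattern as above:

```python
import re

pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
m = re.match(pat3, "qzhang@pku.cn.edu")

# groupdict() maps each label to the text its group matched.
parts = m.groupdict()
print(parts)  # {'name': 'qzhang', 'host': 'pku.cn.edu'}

# Named groups keep their numbers too, so both lookups agree.
same_name = m.group(1) == m.group("name")
same_host = m.group(2) == m.group("host")
print(same_name, same_host)  # True True
```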
More re functions
• re.split() is like split but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']

• re.sub substitutes one string for a pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'

• re.findall() finds all matches
>>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
['12', '11', '1']
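These functions interact with groups in ways worth seeing once. The sketch below assumes Python 3 and goes slightly beyond the slide: it uses the count argument of re.sub, a \1-style backreference in the replacement string, and a pattern with two groups passed to re.findall.

```python
import re

colors = "blue socks and red shoes"

# count=1 limits the substitution to the first match.
first_only = re.sub(r"(blue|white|red)", "black", colors, count=1)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(blue|white|red) (\w+)", r"\2 in \1", colors)

# With groups in the pattern, findall returns the groups, not the whole match.
pairs = re.findall(r"(\d+) (\w+)", "12 dogs,11 cats, 1 egg")

print(first_only)  # black socks and red shoes
print(swapped)     # socks in blue and shoes in red
print(pairs)       # [('12', 'dogs'), ('11', 'cats'), ('1', 'egg')]
```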
Compiling regular expressions
• If you plan to use a re pattern more than once,
compile it to a pattern object
• Python produces a special data structure that
speeds up matching
>>> cpat3 = re.compile(pat3)
>>> cpat3
<_sre.SRE_Pattern object at 0x2d9c0>
>>> r3 = cpat3.search("qzhang@pku.cn.edu")
>>> r3
<_sre.SRE_Match object at 0x895a0>
>>> r3.group()
'qzhang@pku.cn.edu'
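A typical reason to compile is a loop that applies one pattern to many strings. This is a sketch assuming Python 3; the list of addresses is made up for illustration.

```python
import re

# Compile once, outside the loop.
cpat3 = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

# Hypothetical input data.
addresses = ["qzhang@pku.cn.edu", "not an address", "alice@example.org"]

hosts = []
for address in addresses:
    m = cpat3.match(address)  # reuse the compiled pattern
    if m:                     # None (no match) is falsy
        hosts.append(m.group("host"))

print(hosts)  # ['pku.cn.edu', 'example.org']
```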
Pattern object methods
• There are methods defined for a pattern object that
parallel the regular expression functions, e.g.,
– match
– search
– split
– findall
– sub
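The parallel can be checked directly: each pattern method takes the same arguments as the module-level function, minus the pattern. A minimal sketch, assuming Python 3 and a made-up sample string:

```python
import re

text = "12 dogs, 11 cats"  # hypothetical sample text
number = re.compile(r"\d+")

# Each pair gives the same result; the method form skips the pattern argument.
same_findall = number.findall(text) == re.findall(r"\d+", text)
same_split = number.split(text) == re.split(r"\d+", text)
same_sub = number.sub("N", text) == re.sub(r"\d+", "N", text)
same_search = number.search(text).span() == re.search(r"\d+", text).span()

print(number.findall(text))  # ['12', '11']
print(number.sub("N", text))  # N dogs, N cats
print(same_findall, same_split, same_sub, same_search)  # True True True True
```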
