File operations and data parsing 
Presented by 
Felix Hoffmann 
@Felix11H 
felix11h.github.io/ 
Slides 
Slideshare: 
tiny.cc/file-ops 
Source: 
tiny.cc/file-ops-github 
References 
- Python Files I/O at tutorialspoint.com 
- Dive into Python 3 by Mark Pilgrim 
- Python Documentation on match objects
- Regex tutorial at regexone.com 
This work is licensed under a Creative Commons Attribution 4.0 International License.
File operations: Reading 
Opening an existing file 
>>> f = open("test.txt","rb") 
>>> print f 
<open file 'test.txt', mode 'rb' at 0x...>
Reading it:
>>> f.read()
'hello world'
Closing it:
>>> f.close()
>>> print f
<closed file 'test.txt', mode 'rb' at 0x...>
File operations: Writing 
Opening a (new) file 
>>> f = open("new_test.txt","wb") 
>>> print f 
<open file 'new_test.txt', mode 'wb' at 0x...>
Writing to it: 
>>> f.write("hello world, again") 
>>> f.write("... and again") 
>>> f.close() 
=> Writes are buffered: the changes are only guaranteed to appear in the
file for editing elsewhere after calling close() (or flush())!
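The buffering behind this remark can be made visible with a short sketch. It assumes Python 3 (the sessions on these slides are Python 2, so the strings become bytes literals here), and the scratch file path is made up for the illustration. Data written with write() is read back through a second handle once flush() or close() has been called.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "new_test.txt")

f = open(path, "wb")
f.write(b"hello world, again")
f.flush()                          # hand Python's buffer over to the operating system

with open(path, "rb") as reader:   # a second, independent handle on the same file
    seen_after_flush = reader.read()

f.write(b"... and again")
f.close()                          # close() flushes whatever is still buffered

with open(path, "rb") as reader:
    seen_after_close = reader.read()

print(seen_after_flush)
print(seen_after_close)
```

Without the flush() call, whether the first read already sees the data depends on the buffer size and the platform, which is why closing (or flushing) is the reliable point.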
File operations: Appending 
Opening an existing file 
>>> f = open("test.txt","ab") 
>>> print f 
<open file 'test.txt', mode 'ab' at 0x...>
Appending to it: 
>>> f.write("hello world, again") 
>>> f.write("... and again") 
>>> f.close() 
=> In append mode the file pointer is set to the end of the opened
file.
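A minimal sketch of the difference between "wb" and "ab", assuming Python 3 and a made-up temporary file: appending keeps what is already in the file and adds the new bytes at the end.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "wb") as f:        # "wb" creates or truncates the file
    f.write(b"hello world")

with open(path, "ab") as f:        # "ab" keeps the existing content
    position_on_open = f.tell()    # file pointer as reported right after opening
    f.write(b", again")

with open(path, "rb") as f:
    content = f.read()

print(position_on_open)
print(content)
```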
File operations: More about file pointers 
f = open("lines_test.txt", "wb")
for i in range(10):
    f.write("this is line %d \n" % (i+1))
f.close()
Reading from the file:
>>> f = open("lines_test.txt", "rb")
>>> f.readline()
'this is line 1 \n'
>>> f.readline()
'this is line 2 \n'
>>> f.read(14)
'this is line 3'
>>> f.read(2)
' \n'
File operations: More about file pointers 
f.tell() gives the current position within file f
f.seek(x[, from]) changes the file pointer position within file f, where
    from = 0  from beginning of file (default)
    from = 1  from current position
    from = 2  from end of file
>>> f = open("lines_test.txt", "rb")
>>> f.tell()
0
>>> f.read(10)
'this is li'
>>> f.tell()
10
File operations: More about file pointers 
>>> f.seek(5)
>>> f.tell()
5
>>> f.seek(10,1)
>>> f.tell()
15
>>> f.seek(-10,2)
>>> f.tell()
151
>>> f.read()
' line 10 \n'
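The same session can be sketched as a script, assuming Python 3 and a temporary copy of lines_test.txt. The named constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END stand for the values 0, 1 and 2 of the second seek() argument and make the intent easier to read.

```python
import os
import tempfile

# Recreate the ten-line test file in a scratch directory (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "lines_test.txt")
with open(path, "wb") as f:
    for i in range(10):
        f.write(("this is line %d \n" % (i + 1)).encode("ascii"))

with open(path, "rb") as f:
    f.seek(5)                     # absolute position, same as f.seek(5, os.SEEK_SET)
    f.seek(10, os.SEEK_CUR)       # 10 bytes forward from the current position
    middle = f.tell()
    f.seek(-10, os.SEEK_END)      # 10 bytes before the end of the file
    near_end = f.tell()
    tail = f.read()               # reads from the pointer to the end

print(middle, near_end, tail)
```

The file is 161 bytes long (nine lines of 16 bytes plus one of 17), which is why seeking 10 bytes back from the end lands on position 151.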
File operations: Other Modes 
rb+ Opens the file for reading and writing. File pointer will 
be at the beginning of the file. 
wb+ Opens for reading and writing. Overwrites the existing 
file if the file exists, otherwise a new file is created. 
ab+ Opens the file for appending and reading. The file 
pointer is at the end of the file if the file exists, otherwise 
a new file is created for reading and writing.
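A short sketch of how "rb+" and "wb+" differ in practice, assuming Python 3 and a made-up temporary file: "rb+" edits the existing bytes in place, while "wb+" empties the file before anything can be read.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "modes_test.txt")
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "rb+") as f:       # read and write, pointer at the beginning
    first = f.read(5)
    f.seek(0)                      # go back before switching to writing
    f.write(b"HELLO")              # overwrites in place, nothing is truncated
    f.seek(0)
    updated = f.read()

with open(path, "wb+") as f:       # read and write, but the file is truncated first
    after_truncate = f.read()

print(first, updated, after_truncate)
```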
Saving Data: Python Pickle 
Use pickle to save and retrieve more complex data types - lists, 
dictionaries and even class objects: 
>>> import pickle
>>> f = open('save_file.p', 'wb')
>>> ex_dict = {'hello': 'world'}
>>> pickle.dump(ex_dict, f)
>>> f.close()

>>> import pickle
>>> f = open('save_file.p', 'rb')
>>> loadobj = pickle.load(f)
>>> print loadobj['hello']
world
Best practice: With Statement 
import pickle

ex_dict = {'hello': 'world'}

with open('save_file.p', 'wb') as f:
    pickle.dump(ex_dict, f)

import pickle

with open('save_file.p', 'rb') as f:
    loadobj = pickle.load(f)

print loadobj['hello']
=> Use this!
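One reason to prefer the with statement, sketched here under the assumption of Python 3 and a made-up temporary path: the file is closed when the block ends, even if an exception is raised inside it.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "save_file.p")
ex_dict = {"hello": "world"}

with open(path, "wb") as f:
    pickle.dump(ex_dict, f)
closed_after_block = f.closed          # the file is already closed here

try:
    with open(path, "rb") as g:
        raise ValueError("simulated failure while reading")
except ValueError:
    pass
closed_after_error = g.closed          # closed despite the exception

with open(path, "rb") as f:
    loadobj = pickle.load(f)

print(closed_after_block, closed_after_error, loadobj["hello"])
```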
Need for parsing 
Imagine that data files are generated by a third party (no control over
the format) and the data files need pre-processing.
=> Regular expressions provide a powerful and concise way to perform
pattern match/search/replace over the data.
©Randall Munroe xkcd.com CC BY-NC 2.5
Regular expressions - A case study 
Formatting street names 
>>> s = '100 NORTH MAIN ROAD'
>>> s.replace('ROAD', 'RD.')
'100 NORTH MAIN RD.'
>>> s = '100 NORTH BROAD ROAD'
>>> s.replace('ROAD', 'RD.')
'100 NORTH BRD. RD.'
>>> s[:-4] + s[-4:].replace('ROAD', 'RD.')
'100 NORTH BROAD RD.'
Better: use regular expressions!
>>> import re
>>> re.sub(r'ROAD$', 'RD.', s)
'100 NORTH BROAD RD.'
example from Dive Into Python 3 
©Mark Pilgrim CC BY-SA 3.0
Pattern matching with regular expressions 
^        Matches the beginning of the string (or of a line, in multiline mode)
$        Matches the end of the string (or of a line, in multiline mode)
.        Matches any character except newline
[..]     Matches any single character in brackets
[^..]    Matches any single character not in brackets
re*      Matches 0 or more occurrences of the preceding expression
re+      Matches 1 or more occurrences of the preceding expression
re?      Matches 0 or 1 occurrence of the preceding expression
re{n}    Matches exactly n occurrences
re{n,}   Matches n or more occurrences
re{n,m}  Matches at least n and at most m occurrences
=> Use cheatsheets, trainers, tutorials, builders, etc.
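The quantifiers and character classes in this table can be tried out with a small sketch. It assumes Python 3, and the sample strings are invented for the illustration. re.findall() is used simply because it shows every match at once.

```python
import re

# (pattern, sample text) pairs; the samples are made up for this illustration.
examples = [
    (r"^ab", "abab"),             # only at the beginning
    (r"ab$", "abab"),             # only at the end
    (r"a.c", "abc a\nc axc"),     # "." does not match the newline
    (r"[abc]", "cat"),            # any one of a, b, c
    (r"[^0-9 ]", "a1 b2"),        # anything except digits and spaces
    (r"ab*", "a ab abbb"),        # zero or more b
    (r"ab+", "a ab abbb"),        # one or more b
    (r"ab?", "a ab abbb"),        # zero or one b
    (r"ab{2}", "ab abb abbbb"),   # exactly two b
    (r"ab{2,}", "ab abb abbbb"),  # two or more b
    (r"ab{2,3}", "ab abb abbbb"), # two to three b
]

results = {pattern: re.findall(pattern, text) for pattern, text in examples}

for pattern, text in examples:
    print("%-10s %-18r %r" % (pattern, text, results[pattern]))
```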
re.search() & matches 
>>> import re 
>>> data = "I like python" 
>>> m = re.search(r'python',data)
>>> print m 
<_sre.SRE_Match object at 0x...> 
Important properties of the match object: 
group() Return the string matched by the RE 
start() Return the starting position of the match 
end() Return the ending position of the match 
span() Return a tuple containing the (start, end) positions of 
the match
re.search() & matches 
For example: 
>>> import re 
>>> data = "I like python" 
>>> m = re.search(r'python',data)
>>> m.group()
'python'
>>> m.start()
7
>>> m.span()
(7, 13)
For a complete list of match object properties see for example the
Python Documentation:
https://docs.python.org/2/library/re.html#match-objects
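One detail worth keeping in mind when using these methods: re.search() returns None when the pattern does not occur, so calling group() on the result without a check raises an AttributeError. A minimal sketch, assuming Python 3; the helper name describe_match is hypothetical and not part of the re module.

```python
import re

def describe_match(pattern, data):
    """Return (matched text, start, end), or None if the pattern does not occur."""
    m = re.search(pattern, data)
    if m is None:                  # no match object to query
        return None
    return m.group(), m.start(), m.end()

found = describe_match(r"python", "I like python")
missing = describe_match(r"perl", "I like python")

print(found)
print(missing)
```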
re.findall() 
>>> import re 
>>> data = "Python is great. I like python" 
>>> m = re.search(r'[pP]ython',data)
>>> m.group()
'Python'
=> re.search() returns only the first match, use re.findall() instead:
>>> import re
>>> data = "Python is great. I like python"
>>> l = re.findall(r'[pP]ython',data)
>>> print l
['Python', 'python']
=> Returns a list instead of a match object!
re.findall() - Example 
import re

with open("history.txt", "rb") as f:
    text = f.read()

year_dates = re.findall(r'19[0-9]{2}', text)
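A self-contained variant of this example, assuming Python 3 and an invented sample text standing in for history.txt. It also shows one possible refinement that is not on the slide: adding word boundaries (\b) so that digits inside a longer number are not reported as a year.

```python
import os
import re
import tempfile

# Write a made-up sample so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "history.txt")
with open(path, "w") as f:
    f.write("Founded in 1905, rebuilt in 1987; item no. 41923.")

with open(path, "r") as f:         # text mode, so the pattern can be a str
    text = f.read()

year_dates = re.findall(r"19[0-9]{2}", text)
strict_year_dates = re.findall(r"\b19[0-9]{2}\b", text)

print(year_dates)
print(strict_year_dates)
```

In Python 3 a file opened with "rb" yields bytes, and the pattern would then have to be a bytes pattern such as rb"19[0-9]{2}".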
re.split() 
Suppose the data stream has a well-defined delimiter
>>> data = "x = 20"
>>> re.split(r'=',data)
['x ', ' 20']
>>> data = 'ftp://python.about.com'
>>> re.split(r':/{1,3}', data)
['ftp', 'python.about.com']
>>> data = '25.657'
>>> re.split(r'\.',data)
['25', '657']
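The decimal-point example only works when the dot is escaped, because an unescaped "." matches any character. The sketch below, assuming Python 3, contrasts the two and also shows one way to strip the spaces around a delimiter while splitting.

```python
import re

data = "25.657"

unescaped = re.split(r".", data)    # every character is treated as a delimiter
escaped = re.split(r"\.", data)     # only the literal dot is a delimiter
via_escape = re.split(re.escape("."), data)   # re.escape() builds the escaped pattern

# Allow optional whitespace around "=" so the pieces come out trimmed.
trimmed = re.split(r"\s*=\s*", "x = 20")

print(unescaped)
print(escaped)
print(trimmed)
```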
re.sub() 
Replace patterns by other patterns. 
>>> data = "2004-959-559 # my phone number"
>>> re.sub(r'#.*','',data)
'2004-959-559 '
A more interesting example:
>>> data = "2004-959-559"
>>> re.sub(r'([0-9]*)-([0-9]*)-([0-9]*)',
...        r'\3-\2-\1', data)
'559-959-2004'
=> Groups are captured in parentheses and referenced in the
replacement string by \1, \2, ...
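Backreferences in the replacement string are easy to get wrong when the string is not a raw string, because "\1" in an ordinary Python string literal is the character with code 1. A small sketch, assuming Python 3, showing three equivalent spellings that all produce the reordered number.

```python
import re

data = "2004-959-559"
pattern = r"([0-9]+)-([0-9]+)-([0-9]+)"

raw_form = re.sub(pattern, r"\3-\2-\1", data)                # raw string, as on the slide
explicit_form = re.sub(pattern, r"\g<3>-\g<2>-\g<1>", data)  # unambiguous \g<n> syntax
escaped_form = re.sub(pattern, "\\3-\\2-\\1", data)          # ordinary string, doubled backslashes

print(raw_form, explicit_form, escaped_form)
```

The \g<n> form is useful when a group reference is directly followed by a digit, where \1 followed by 0 would be read as group 10.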
os module 
Provides a portable way of using operating-system-dependent functionality:
os.mkdir() Creates a directory (like mkdir)
os.chmod() Changes the permissions (like chmod)
os.rename() Renames a file from the old name to the new name
os.listdir() Lists the contents of a directory
os.getcwd() Gets the current working directory path
os.path Submodule with useful functions on pathnames
For example, list all entries (files and directories) in the current directory:
>>> from os import listdir
>>>
>>> for f in listdir("."):
...     print f
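os.listdir() returns directories as well as files. A sketch of how the functions above combine with os.path to keep only regular files, assuming Python 3; the scratch directory and the file names are invented for the illustration.

```python
import os
import tempfile

# Build a small scratch directory so the sketch does not depend on the current one.
workdir = tempfile.mkdtemp()
os.mkdir(os.path.join(workdir, "sub"))               # a subdirectory
with open(os.path.join(workdir, "a.txt"), "w") as f:
    f.write("x")
os.rename(os.path.join(workdir, "a.txt"),
          os.path.join(workdir, "b.txt"))            # a.txt becomes b.txt

entries = sorted(os.listdir(workdir))                # files and directories
files_only = sorted(
    name for name in os.listdir(workdir)
    if os.path.isfile(os.path.join(workdir, name))   # keep regular files only
)

print(entries)
print(files_only)
```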
Have fun! 
