Regular Expressions
  How not to turn one problem into two.




                       Carl Brown
                       CarlB@PDAgent.com
“Common Wisdom”

   “Some people, when confronted
   with a problem, think ‘I know, I'll
   use regular expressions.’ Now
      they have two problems.”



*See http://regex.info/blog/2006-09-15/247 for source.
What is a ‘Regular
  Expression’?
“...a concise and flexible means for
‘matching’ (specifying and recognizing) strings
of text, such as particular characters, words,
or patterns of characters” (So says Wikipedia)
“... a way of extracting substrings from text in a
‘usefully fuzzy’ way” (So says me)
...so for example?

Pull out the host from a URL string:
  http://([^/]*)/
Find the date in a string:
  ([0-9][0-9]*[-/][0-9][0-9]*[-/][0-9][0-9]*)
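A minimal sketch of those two patterns in Python, assuming the standard `re` module. The sample strings are made up for illustration and are not from the talk.

```python
import re

# Hypothetical sample inputs, not from the slides.
url = "http://www.example.com/index.html"
note = "Invoice sent 2012-03-15 to the client"

# Host: everything after "http://" up to (not including) the next "/".
host = re.search(r"http://([^/]*)/", url).group(1)

# Date: three runs of digits separated by "-" or "/".
date = re.search(r"([0-9][0-9]*[-/][0-9][0-9]*[-/][0-9][0-9]*)", note).group(1)

print(host)
print(date)
```

The parentheses mark the part to pull out; `group(1)` is how Python hands it back.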
But they’re a Pain to
        Use
       Aren’t they?
Two Kinds of (OOish)
    Languages
 Some languages, like Perl or Ruby, have
 regex built into their strings, so they get used
 often.
 Most others, like Cocoa, Java, and Python, have
 Regular Expression Objects that are
 complicated and a Pain in the Ass.
Ruby


string.sub("pattern", "replacement")
Cocoa (Apple)
+[NSRegularExpression regularExpressionWithPattern:(NSString *)
pattern options:(NSRegularExpressionOptions)options error:(NSError
**) error]

-[NSRegularExpression replaceMatchesInString:(NSMutableString *)
string options:(NSMatchingOptions)options range:(NSRange)range
withTemplate:(NSString *)template]


                    NSRegularExpressionOptions?

                        NSMatchingOptions?

                      Why do I need a Range?

                      What’s a template string?

                   Is it really worth it?
Cocoa (sane)


 #import "NSString+PDRegex.h"

 [string stringByReplacingRegexPattern:@"pattern"
 withString:@"replacement" caseInsensitive:NO];




*See https://github.com/carlbrown/RegexOnNSString/
Python
              (an aside)


import re

re.match("pattern", "a pattern")   # no match

re.search("pattern", "a pattern")  # matches fine
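A runnable version of that aside, as a small sketch: `re.match` only tries at the very start of the string, while `re.search` scans forward for the first place the pattern fits.

```python
import re

text = "a pattern"

# re.match is anchored at position 0, so "pattern" is not found here.
anchored = re.match("pattern", text)

# re.search looks anywhere in the string.
anywhere = re.search("pattern", text)

print(anchored)             # None
print(anywhere.group(0))    # the matched text
print(anywhere.start())     # where the match begins
```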
But Regex’s are
impossible to maintain...
         Aren’t they?
But what about?
    (?<!(=)|(="")|(='))(((http|ftp|
    https)://)|(www\.))+[\w]+(\.[\w]+)
    ([\w\-.@?^=%&amp;:/~+#]*[\w\-@?
    ^=%&amp;/~+#])?(?!.*/a>)

        *That* guy has two problems

   Well, actually, he has n! problems, where
n is the number of hyperlinks in the input string
How to keep that from
happening (my advice)
 Limit yourself to only the basic meta-characters.
 Favor clarity over brevity.
 Take more, smaller bites.
 Beware of greedy matching.
The Basic Characters
       A Phrasebook
PhraseBook pt 1
^.*
 “the junk to the left of what I want”
 This breaks down as ^ (the beginning of the string)
 followed by .* any number of any character.
.*$
 “the junk to the right of what I want”
 This breaks down as any number of any character .*
 followed by $ (the end of the string)
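As a rough sketch of how these two phrases get used in Python: replace the junk on each side with nothing and keep what is left. The input line and the `": "` and `" "` delimiters are assumptions for the example.

```python
import re

# Hypothetical input, not from the slides.
line = "Total: 42 items"

# "^.*: " is the junk to the left of what I want (through the colon and space).
right_part = re.sub(r"^.*: ", "", line)

# " .*$" is the junk to the right of what I want (from the first space on).
wanted = re.sub(r" .*$", "", right_part)

print(right_part)
print(wanted)
```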
PhraseBook pt 2
[0-9][0-9]*
 “a number with at least one digit”
 The brackets ([ and ]) mean “any of the characters contained
 within the brackets”. So this means 1 character of 0-9 (so 0 1 2
 3 4 5 6 7 8 or 9) followed by zero or more characters from that same set.

[^A-Za-z]
 “any character that’s not a letter”
 The ^ as the first character inside the brackets means “not” so
 instead of meaning “any letter” it means “anything not a letter”.
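A short Python sketch of both phrases, using a made-up input string:

```python
import re

# Hypothetical input, not from the slides.
text = "Room 101, floor 3"

# One digit followed by zero or more digits: every whole number in the text.
numbers = re.findall(r"[0-9][0-9]*", text)

# Anything that is not a letter gets deleted.
letters_only = re.sub(r"[^A-Za-z]", "", text)

print(numbers)
print(letters_only)
```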
PhraseBook pt 3
\.
 “a literal period” (e.g. to match the dot in .com)

\*
 “a literal * ” (e.g. to match an asterisk)

\( \) or \[ \]
 “literal parentheses/brackets” (in Cocoa, at least)

( …stuff… )
 “stuff I want to refer to later as $1” (in Cocoa, at least)
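Here is a small sketch of the same phrases in Python. One difference to be aware of: Python's `re` writes the back-reference in a replacement as `\1`, where Cocoa uses `$1`. The sample strings are invented for the example.

```python
import re

# \. is a literal dot; an unescaped . matches any character, including "x".
literal_dot = re.search(r"\.com", "examplexcom")
any_char = re.search(r".com", "examplexcom")

# \* matches the asterisk itself instead of meaning "zero or more".
stars = re.findall(r"\*", "2 * 3 * 4")

# \( and \) match real parentheses.
call = re.search(r"release\(\)", "obj.release()")

# Unescaped ( ) capture text, referred to later as \1 and \2 in Python.
swapped = re.sub(r"([a-z]*)@([a-z]*)", r"\2 at \1", "carl@example")

print(literal_dot, any_char.group(0), stars, call.group(0), swapped)
```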
PhraseBook pt 4
    There is no...


       Part 4
But what about?
* Cheat Sheet from http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
There is no...

       Part 4

But what about?
Clarity > Brevity
 (Really true of any language)
Choose the clearest
      way:

[A-Za-z_] instead of \w

[^A-Za-z_] instead of \W
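One reason the spelled-out class is clearer: the shorthand can match more than a reader expects. In Python 3, for instance, `\w` also matches digits and (for `str` patterns, by default) non-ASCII letters. A sketch with a made-up input:

```python
import re

# Hypothetical input; \u00e9 is an accented "e".
text = "user_42 caf\u00e9"

# The explicit class matches exactly what it says: ASCII letters and underscore.
explicit = re.findall(r"[A-Za-z_][A-Za-z_]*", text)

# The shorthand also takes the digits and the accented letter.
shorthand = re.findall(r"\w\w*", text)

print(explicit)
print(shorthand)
```

Neither result is wrong; the point is that the first pattern tells you what it will do just by reading it.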
Choose the
     consistent way:
OSX:~$ grep '^root::*' /etc/passwd

         root:*:0:0:System Administrator:/var/root:/bin/sh

OSX:~$ grep '^root:+' /etc/passwd

OSX:~$


OSX:~$ grep '^root:.*' /etc/passwd

root:*:0:0:System Administrator:/var/root:/bin/sh

OSX:~$ grep '^root:.*?' /etc/passwd

OSX:~$
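For comparison, a sketch of the same four patterns in Python's `re`, which does understand `+` and `*?`. The long-hand `::*` gives the same answer as `:+` there, so nothing is lost by writing the form that also works in the grep runs above. Note that `.*?` is accepted but means something different ("as few as possible").

```python
import re

line = "root:*:0:0:System Administrator:/var/root:/bin/sh"

# Long-hand "one or more colons" and the + shortcut agree in Python.
portable = re.search(r"^root::*", line).group(0)
shortcut = re.search(r"^root:+", line).group(0)

# .* is greedy and runs to the end of the line; .*? takes as little as it can.
greedy = re.search(r"^root:.*", line).group(0)
lazy = re.search(r"^root:.*?", line).group(0)

print(portable, shortcut)
print(greedy)
print(lazy)
```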
Except when you
      can’t

http://\([^/][^/]*\)/ => \1     (POSIX/sed)

http://([^/][^/]*)/ => $1       (perl/cocoa)
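Python is yet another mix, which rather proves the point: groups are written with plain parentheses, but the replacement refers back with `\1` (or `\g<1>`). A sketch, with a made-up URL:

```python
import re

# Hypothetical input, not from the slides.
url = "http://www.example.com/some/page.html"

# Plain ( ) to capture, \1 in the replacement to refer back.
host = re.sub(r"^http://([^/][^/]*)/.*$", r"\1", url)

# \g<1> is the longer spelling of the same back-reference.
host_again = re.sub(r"^http://([^/][^/]*)/.*$", r"\g<1>", url)

print(host)
print(host_again)
```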
Take Smaller Bites
The less you do at a time, the safer each step is
Which is clearer?

NSString *domainName = [myHTMLString
stringByReplacingRegexPattern:
@"^.*href=[\"']http://(.*)/.*$"
withString:@"$1" caseInsensitive:YES];
Which is clearer?
   NSString *leftRemoved = [myHTMLString
   stringByReplacingRegexPattern: @"^.*href=['\"]"
   withString:@"" caseInsensitive:YES];

   NSString *myURL = [leftRemoved
   stringByReplacingRegexPattern: @"[\"'].*$" withString:@""
   caseInsensitive:NO];

   NSString *hostAndPath = [myURL
   stringByReplacingRegexPattern: @"^.*http://"
   withString:@"" caseInsensitive:YES];

   NSString *domainName = [hostAndPath
   stringByReplacingRegexPattern: @"/.*$" withString:@""
   caseInsensitive:NO];

Bonus: This one can be stepped through with the debugger :-)
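The same four small bites as a Python sketch, using `re.sub` in place of the Cocoa category method. The input string is an assumption with a single link in it; each intermediate value can be printed or inspected in a debugger.

```python
import re

# Hypothetical single-link input, not from the slides.
html = '<p>See <a href="http://www.example.com/docs/index.html">the docs</a></p>'

# Step 1: drop the junk to the left, through href= and its opening quote.
left_removed = re.sub(r"""^.*href=["']""", "", html, flags=re.IGNORECASE)

# Step 2: drop everything from the closing quote to the end.
my_url = re.sub(r"""["'].*$""", "", left_removed)

# Step 3: drop the scheme.
host_and_path = re.sub(r"^.*http://", "", my_url, flags=re.IGNORECASE)

# Step 4: drop the first "/" and everything after it.
domain_name = re.sub(r"/.*$", "", host_and_path)

for step in (left_removed, my_url, host_and_path, domain_name):
    print(step)
```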
But isn’t that slower?


 Yes.
 But it doesn’t matter how fast you get the
 wrong answer.
Beware Greedy
     Matching
Remember this?
  NSString *domainName = [myHTMLString
  stringByReplacingRegexPattern:
  @"^.*href=[\"']http://(.*)/.*$" withString:@"$1"
  caseInsensitive:YES];

What does it do if given:
  <a href="http://1.example.com/">This is a link</a>
  but <a href="http://2.example.com/">This is a
  link, too.</a>
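To see the answer for yourself, here is a sketch that runs the same pattern through Python's `re.sub` (an assumption standing in for the Cocoa call; greedy matching behaves the same way in both). The leading `.*` grabs everything up to the last href, and `(.*)` then runs on to the last `/` in the string, which is the one inside `</a>`.

```python
import re

html = ('<a href="http://1.example.com/">This is a link</a> but '
        '<a href="http://2.example.com/">This is a link, too.</a>')

# Same pattern as the slide, with \1 as Python's spelling of $1.
result = re.sub(r"""^.*href=["']http://(.*)/.*$""", r"\1", html,
                flags=re.IGNORECASE)

print(result)
```

What comes back is neither host name on its own: it starts at the second link and drags along the markup after it.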
What you meant was:

 After ‘http://’ up to but not including the next ‘/’
 Which is:

   http://([^/][^/]*)/
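A sketch of that pattern in Python against the two-link input. Because `[^/]` cannot cross a slash, the capture stops at the end of the host name no matter what follows.

```python
import re

html = ('<a href="http://1.example.com/">This is a link</a> but '
        '<a href="http://2.example.com/">This is a link, too.</a>')

# First host only.
first_host = re.search(r"http://([^/][^/]*)/", html).group(1)

# Every host in the string.
all_hosts = re.findall(r"http://([^/][^/]*)/", html)

print(first_host)
print(all_hosts)
```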
Remember this?
    (?<!(=)|(="")|(='))(((http|ftp|
    https)://)|(www\.))+[\w]+(\.[\w]+)
    ([\w\-.@?^=%&amp;:/~+#]*[\w\-@?
    ^=%&amp;/~+#])?(?!.*/a>)



   Well, actually, he has n! problems, where
n is the number of hyperlinks in the input string
So if you had
<p>Today’s Links:</p>
<UL>
    <LI><A HREF="http://example.com/1">Link 1</A></LI>
    <LI><A HREF="http://example.com/2">Link 2</A></LI>
    <LI><A HREF="http://example.com/3">Link 3</A></LI>
    <LI><A HREF="http://example.com/4">Link 4</A></LI>
    <LI><A HREF="http://example.com/5">Link 5</A></LI>
    <LI><A HREF="http://example.com/6">Link 6</A></LI>
</UL>
And tried to use:
(?<!(=)|(="")|(='))(((http|ftp|
https)://)|(www\.))+[\w]+(\.[\w]+)
([\w\-.@?^=%&amp;:/~+#]*[\w\-@?
^=%&amp;/~+#])?(?!.*/a>)
It would have to:
<p>Today’s Links:</p>
<UL>
    <LI><A HREF="http://example.com/1">Link 1</A></LI>
    <LI><A HREF="http://example.com/2">Link 2</A></LI>
    <LI><A HREF="http://example.com/3">Link 3</A></LI>
    <LI><A HREF="http://example.com/4">Link 4</A></LI>
    <LI><A HREF="http://example.com/5">Link 5</A></LI>
    <LI><A HREF="http://example.com/6">Link 6</A></LI>
</UL>
And then, and then, and so on:
(Seven more slides, “And then:” six times and “And so on:” once, repeat this same listing.)
But what are they
    good for?
Encoding/decoding metadata from image file
names.
Renaming files on the command line (@2x?)
Grabbing the user’s first name from a Full
Name string (careful of Locales*)
Stripping crap I don’t want out of user input
(trailing spaces, anyone?)
//.*\[.* *release *\] *;

*See http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
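A few of these jobs, sketched in Python with only the basic characters from the phrasebook. All file names, names, and code lines here are made up for illustration, and the first-name trick carries the same locale caveat as the list above.

```python
import re

# Stripping trailing spaces from user input (one or more spaces at the end).
cleaned = re.sub(r"  *$", "", "Carl Brown   ")

# Renaming: turn icon.png into its @2x variant.
retina_name = re.sub(r"\.png$", "@2x.png", "icon.png")

# First name: drop everything from the first space on (not safe for all names).
first_name = re.sub(r" .*$", "", "Carl Allen Brown")

# Finding commented-out release calls in Objective-C source.
release_pattern = r"//.*\[.* *release *\] *;"
commented_out = re.search(release_pattern, "    // [myObject release];")
still_live = re.search(release_pattern, "    [myObject release];")

print(repr(cleaned), retina_name, first_name)
print(commented_out is not None, still_live is not None)
```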
Questions?
      CarlB@PDAgent.com

        @CarlAllenBrown

 www.escortmissions.com (Blog)

  www.PDAgent.com (Company)

   https://github.com/carlbrown

http://www.slideshare.net/carlbrown

Using Regular Expressions and Staying Sane

  • 1. Regular Expressions How not to turn one problem into two. Carl Brown [email protected]
  • 2. “Common Wisdom” “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” *See http://regex.info/blog/2006-09-15/247 for source.
  • 3. What is a ‘Regular Expression’? “...a concise and flexible means for ‘matching’ (specifying and recognizing) strings of text, such as particular characters, words, or patterns of characters” (So says Wikipedia) “... a way of extracting substrings from text in a ‘usefully fuzzy’ way” (So says me)
  • 4. ...so for example? Pull out the host from a URL string: http://([^/]*)/ find the date in a string ([0-9][0-9]*[-/][0-9][0-9]*[-/][0-9][0-9]*)
  • 5. But they’re a Pain to Use Aren’t they?
  • 6. Two Kinds of (OOish) Languages Some languages, Like perl or ruby, have Regex build into their strings, so they get used often. Most others, like Cocoa, Java, Python have Regular Expression Objects, that are complicated and a Pain in the Ass
  • 8. Cocoa (Apple) +[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern options:(NSRegularExpressionOptions)options error:(NSError **) error] -[NSRegularExpression replaceMatchesInString:(NSMutableString *) string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)template]
  • 9. Cocoa (Apple) +[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern options:(NSRegularExpressionOptions)options error:(NSError **) error] -[NSRegularExpression replaceMatchesInString:(NSMutableString *) string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)template] NSRegularExpressionOptions? NSMatchingOptions? Why do I need a Range? What’s a template string?
  • 10. Cocoa (Apple) +[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern options:(NSRegularExpressionOptions)options error:(NSError **) error] -[NSRegularExpression replaceMatchesInString:(NSMutableString *) string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)template] NSRegularExpressionOptions? NSMatchingOptions? Why do I need a Range? What’s a template string? Is it really worth it?
  • 11. Cocoa (sane) #import "NSString+PDRegex.h" [string stringByReplacingRegexPattern:@"pattern" withString:@"replacement" caseInsensitive:NO]; *See https://github.com/carlbrown/RegexOnNSString/
  • 12. Python (an aside) import re re.match(“pattern”,“a pattern”) #no match re.search(“pattern”,“a pattern”) #matches fine
  • 13. But Regex’s are impossible to maintain... Aren’t they?
  • 16. But what about? (?<!(=)|(="")|(='))(((http|ftp| https)://)|(www.))+[w]+(.[w]+) ([w-.@?^=%&amp;:/~+#]*[w-@? ^=%&amp;/~+#])?(?!.*/a>) *That* guy has two problems Well, Actually, he has n! problems where, n is the number of hyperlinks in the input string
  • 17. How to keep that from happening (my advice) Limit yourself to only the basic meta- characters. Favor clarity over brevity. Take more smaller bites. Beware of greedy matching
  • 18. The Basic Characters A Phrasebook
  • 21. PhraseBook pt 1 ^.* “the junk to the left of what I want” This breaks down as ^ (the beginning of the string) followed by .* (any number of any character). .*$ “the junk to the right of what I want” This breaks down as .* (any number of any character) followed by $ (the end of the string).
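As a rough illustration of the two phrases, here is a sketch in Python that trims the junk on each side of the part we want. The log line and the `ERROR` / ` on ` markers are made up for the example.

```python
import re

# Hypothetical input line; only the middle part is interesting.
line = "2011-10-05 ERROR disk full on /dev/sda1"

# "^.*ERROR " = the junk to the left of what I want (up to and including the marker).
without_left = re.sub(r"^.*ERROR ", "", line)

# " on .*$" = the junk to the right of what I want.
message = re.sub(r" on .*$", "", without_left)

print(without_left)  # disk full on /dev/sda1
print(message)       # disk full
```

Each step removes one side, so you can print the intermediate value and see exactly which phrase went wrong if the result surprises you.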
  • 23. PhraseBook pt 2 [0-9][0-9]* “a number with at least one digit” The brackets ([ and ]) mean “any one of the characters contained within the brackets”. So this means 1 character of 0-9 (so 0 1 2 3 4 5 6 7 8 or 9) followed by zero or more characters from the same set. [^A-Za-z] “any character that’s not a letter” The ^ as the first character inside the brackets means “not”, so instead of meaning “any letter” it means “anything not a letter”.
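A small sketch of both bracket phrases in Python. The sample sentence is invented; the patterns are the ones from the slide.

```python
import re

text = "Order 66 shipped in 3 boxes"

# One digit, then zero or more further digits: "a number with at least one digit".
numbers = re.findall(r"[0-9][0-9]*", text)

# Without the leading [0-9], the pattern is allowed to match nothing at all,
# so the first "match" is the empty string at position 0.
first = re.search(r"[0-9]*", text)

# [^A-Za-z] matches any single character that is not an ASCII letter.
letters_only = re.sub(r"[^A-Za-z]", "", text)

print(numbers)               # ['66', '3']
print(repr(first.group(0)))  # ''
print(letters_only)          # Ordershippedinboxes
```

The empty match is the reason for writing the bracket expression twice instead of relying on `*` alone.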
  • 27. PhraseBook pt 3 \. “a literal period” (e.g. to match the dot in .com) \* “a literal *” (e.g. to match an asterisk) \( \) or \[ \] “literal parentheses/brackets” (in Cocoa, at least) ( …stuff… ) “stuff I want to refer to later as $1” (in Cocoa, at least)
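Here is a hedged Python sketch of the same phrases. Two things differ from the Cocoa flavor on the slide: Python's replacement template spells the first group `\1` rather than `$1`, and `re.escape` is a Python convenience that is not part of the talk. The host names are placeholders.

```python
import re

host = "www.example.com"

# An unescaped "." means "any character", so this matches "wwwXexample" too.
loose = re.search(r"www.example", "wwwXexample.com")

# "\." matches only a literal period, so the same input no longer matches.
strict = re.search(r"www\.example", "wwwXexample.com")

# Parentheses capture "stuff I want to refer to later"; here, the part after the last dot.
tld = re.sub(r"^.*\.([a-z][a-z]*)$", r"\1", host)

# re.escape adds the backslashes for you when the text is data, not a pattern.
escaped = re.escape("a*b")

print(loose is not None)  # True
print(strict)             # None
print(tld)                # com
print(escaped)            # a\*b
```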
  • 29. PhraseBook pt 4 There is no... Part 4
  • 30. But what about? * Cheat Sheet from http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
  • 32. There is no... Part 4 But what about?
  • 33. Clarity > Brevity (Really true of any language)
  • 34. Choose the clearest way: [A-Za-z_] instead of \w [^A-Za-z_] instead of \W
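One reason the explicit class is clearer: the shorthand often accepts more than you expect. This sketch assumes Python 3, where `\w` on text strings also matches digits and non-ASCII letters; other engines may differ, which is rather the point. The sample string is invented.

```python
import re

sample = "na\u00efve_user42"  # "naïve_user42"

# In Python 3, \w is Unicode-aware and includes digits as well as "_".
word_chars = re.findall(r"\w", sample)

# The explicit class says exactly what is accepted: ASCII letters and "_".
explicit = re.findall(r"[A-Za-z_]", sample)

print("".join(word_chars))  # naïve_user42
print("".join(explicit))    # nave_user
```

With the bracket form, anyone reading the pattern can see what it accepts without looking up a cheat sheet.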
  • 37. Choose the consistent way: OSX:~$ grep '^root::*' /etc/passwd root:*:0:0:System Administrator:/var/root:/bin/sh OSX:~$ grep '^root:+' /etc/passwd OSX:~$ OSX:~$ grep '^root:.*' /etc/passwd root:*:0:0:System Administrator:/var/root:/bin/sh OSX:~$ grep '^root:.*?' /etc/passwd OSX:~$
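For comparison, a sketch of the same two spellings in Python, where `+` is supported. The passwd line is copied from the terminal output on the slide. Both spellings match the same text here, so the portable one costs nothing.

```python
import re

passwd_line = "root:*:0:0:System Administrator:/var/root:/bin/sh"

# Portable spelling: one ":" followed by zero or more ":".
portable = re.match(r"^root::*", passwd_line)

# Shorthand spelling: "+" means "one or more" in Python's re module.
shorthand = re.match(r"^root:+", passwd_line)

print(portable.group(0))   # root:
print(shorthand.group(0))  # root:
```

The portable form keeps working when the pattern is pasted into a tool that treats `+` as a literal character, as the grep session on the slide does.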
  • 38. Except when you can’t http://\([^/][^/]*\)/ => \1 (POSIX/sed) http://([^/][^/]*)/ => $1 (perl/cocoa)
  • 39. Take Smaller Bites The less you do at a time, the safer each step is
  • 40. Which is clearer? NSString *domainName = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']http://(.*)/.*$" withString:@"$1" caseInsensitive:YES];
  • 41. Which is clearer? NSString *leftRemoved = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']" withString:@"" caseInsensitive:YES]; NSString *myURL = [leftRemoved stringByReplacingRegexPattern: @"[\"'].*$" withString:@"" caseInsensitive:NO]; NSString *hostAndPath = [myURL stringByReplacingRegexPattern: @"^.*http://" withString:@"" caseInsensitive:YES]; NSString *domainName = [hostAndPath stringByReplacingRegexPattern: @"/.*$" withString:@"" caseInsensitive:NO]; Bonus: This one can be stepped through with the debugger :-)
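The four small bites translate almost line for line to Python. This is a sketch under assumptions: the HTML snippet is invented, the variable names mirror the Objective-C ones, and `re.IGNORECASE` stands in for `caseInsensitive:YES`.

```python
import re

# Hypothetical input with a single link in it.
html = '<p>See <a href="http://www.example.com/docs/index.html">the docs</a></p>'

# 1. Remove everything up to and including the opening quote of the href.
left_removed = re.sub(r"^.*href=[\"']", "", html, flags=re.IGNORECASE)

# 2. Remove the closing quote and everything after it.
my_url = re.sub(r"[\"'].*$", "", left_removed)

# 3. Remove the scheme.
host_and_path = re.sub(r"^.*http://", "", my_url, flags=re.IGNORECASE)

# 4. Remove the path.
domain_name = re.sub(r"/.*$", "", host_and_path)

print(left_removed)   # http://www.example.com/docs/index.html">the docs</a></p>
print(my_url)         # http://www.example.com/docs/index.html
print(host_and_path)  # www.example.com/docs/index.html
print(domain_name)    # www.example.com
```

Every intermediate value is a named variable, so a debugger or a stray `print` shows which step misbehaved.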
  • 43. But isn’t that slower? Yes. But it doesn’t matter how fast you get the wrong answer.
  • 46. Beware Greedy Matching Remember this? NSString *domainName = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']http://(.*)/.*$" withString:@"$1" caseInsensitive:YES]; What does it do if given: <a href="http://1.example.com/">This is a link</a> but <a href="http://2.example.com/">This is a link, too.</a>
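To see what the one-shot pattern does with that input, here is a sketch that runs it through Python's `re` module. Cocoa's engine is a different implementation, so treat this as an illustration of greedy matching in general, not a transcript of what NSRegularExpression prints.

```python
import re

html = ('<a href="http://1.example.com/">This is a link</a> but '
        '<a href="http://2.example.com/">This is a link, too.</a>')

# Same pattern as the slide; \1 is Python's spelling of $1.
one_shot = re.sub(r"^.*href=[\"']http://(.*)/.*$", r"\1", html,
                  flags=re.IGNORECASE)

print(one_shot)  # 2.example.com/">This is a link, too.<
```

Both `.*` pieces grab as much as they can: the leading one runs to the last `href=`, and the captured one runs to the last `/` in the string, which sits inside the closing tag. What comes back is not a host name.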
  • 48. What you meant was: After ‘http://’ up to but not including the next ‘/’ Which is: http://([^/][^/]*)/
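A sketch of the "up to but not including the next /" pattern in Python, run against a two-link input with placeholder host names.

```python
import re

html = ('<a href="http://1.example.com/">This is a link</a> but '
        '<a href="http://2.example.com/">This is a link, too.</a>')

# [^/][^/]* can never run past a slash, so each match stops at the host.
first_host = re.search(r"http://([^/][^/]*)/", html).group(1)
all_hosts = re.findall(r"http://([^/][^/]*)/", html)

print(first_host)  # 1.example.com
print(all_hosts)   # ['1.example.com', '2.example.com']
```

Because the character class excludes the delimiter, there is nothing for the engine to be greedy about, and the pattern reads the same in every regex flavor.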
  • 49. Remember this? (?<!(=)|(="")|(='))(((http|ftp|https)://)|(www\.))+[\w]+(\.[\w]+)([\w\-.@?^=%&amp;:/~+#]*[\w\-@?^=%&amp;/~+#])?(?!.*/a>) Well, actually, he has n! problems, where n is the number of hyperlinks in the input string.
  • 50. So if you had <p>Today’s Links:</p> <UL> <LI><A HREF="http://example.com/1">Link 1</A></LI> <LI><A HREF="http://example.com/2">Link 2</A></LI> <LI><A HREF="http://example.com/3">Link 3</A></LI> <LI><A HREF="http://example.com/4">Link 4</A></LI> <LI><A HREF="http://example.com/5">Link 5</A></LI> <LI><A HREF="http://example.com/6">Link 6</A></LI> </UL>
  • 51. And tried to use: (?<!(=)|(="")|(='))(((http|ftp|https)://)|(www\.))+[\w]+(\.[\w]+)([\w\-.@?^=%&amp;:/~+#]*[\w\-@?^=%&amp;/~+#])?(?!.*/a>)
  • 52. It would have to: <p>Today’s Links:</p> <UL> <LI><A HREF="http://example.com/1">Link 1</A></LI> <LI><A HREF="http://example.com/2">Link 2</A></LI> <LI><A HREF="http://example.com/3">Link 3</A></LI> <LI><A HREF="http://example.com/4">Link 4</A></LI> <LI><A HREF="http://example.com/5">Link 5</A></LI> <LI><A HREF="http://example.com/6">Link 6</A></LI> </UL>
  • 53. And then: <p>Today’s Links:</p> <UL> <LI><A HREF="http://example.com/1">Link 1</A></LI> <LI><A HREF="http://example.com/2">Link 2</A></LI> <LI><A HREF="http://example.com/3">Link 3</A></LI> <LI><A HREF="http://example.com/4">Link 4</A></LI> <LI><A HREF="http://example.com/5">Link 5</A></LI> <LI><A HREF="http://example.com/6">Link 6</A></LI> </UL>
  • 59. And so on: <p>Today’s Links:</p> <UL> <LI><A HREF="http://example.com/1">Link 1</A></LI> <LI><A HREF="http://example.com/2">Link 2</A></LI> <LI><A HREF="http://example.com/3">Link 3</A></LI> <LI><A HREF="http://example.com/4">Link 4</A></LI> <LI><A HREF="http://example.com/5">Link 5</A></LI> <LI><A HREF="http://example.com/6">Link 6</A></LI> </UL>
  • 64. But what are they good for? Encoding/decoding metadata from image file names. Renaming files on the command line (@2x?) Grabbing the user’s first name from a Full Name string (careful of Locales*) Stripping crap I don’t want out of user input (trailing spaces, anyone?) //.*\[.* *release *\] *; *See http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
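Two of those uses, sketched in Python. The sample name and source lines are invented, and the second pattern assumes the last line of the slide reads `//.*\[.* *release *\] *;` with backslashes in front of the brackets (a search for commented-out `release` calls).

```python
import re

# Stripping trailing spaces and tabs from user input.
raw_name = "Carl Brown   \t"
clean_name = re.sub(r"[ \t][ \t]*$", "", raw_name)

# Finding commented-out [... release]; lines.
commented_release = re.compile(r"//.*\[.* *release *\] *;")

source_lines = [
    "[image release];",       # live code, not a comment
    "// [image release];",    # commented out
    "//[ cache release ] ;",  # commented out, odd spacing
    "// release notes",       # a comment, but not a release call
]
hits = [line for line in source_lines if commented_release.search(line)]

print(repr(clean_name))  # 'Carl Brown'
print(hits)              # ['// [image release];', '//[ cache release ] ;']
```

Both patterns use only the phrasebook characters, so they should paste into an editor's regex search box with little or no change.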
  • 65. Questions? @CarlAllenBrown www.escortmissions.com (Blog) www.PDAgent.com (Company) https://github.com/carlbrown http://www.slideshare.net/carlbrown

Editor's Notes

  • #2: This is not a talk about every possible thing you can do with regular expressions. In fact, it’s exactly the opposite. This is about how to do a useful thing and do it without going crazy.
  • #4: So before I get too far, how many of you know what a regular expression is? How many have used them before? How many feel comfortable with them?
  • #5: So here’s a quick example, just so those of you who haven’t touched them have an idea what I’m talking about between now and when we dig into examples later on.
  • #6: Well, it depends...you see...
  • #7: I’m saying OOish because I have issues with perl’s OO, but that’s another talk. I went from Basic to Pascal to C to perl (to C++ to Lisp to Java to Ruby to Objective-C). I started learning perl in 1989 or so, and it was exactly what I needed at the time - it was a language that was really good at exactly what C made very painful: String handling. I have better alternatives than perl now, but it taught me regex’s.
  • #8: This is an example of a usage in a language where a Regex is a first-class citizen.
  • #9: This is a WTF. And it brings to mind a bunch of questions...
  • #10: and the most often asked question in Cocoa Regex...
  • #12: This is better (but you have to do the #import).
  • #13: re.match in python implicitly anchors you to the beginning of a string. This is hideous.
  • #14: Well, I’d say no. I use them all the time.
  • #15: This is an actual regex I found in a program I was once asked to find the performance problem in.
  • #16: This is unmaintainable, and worse...
  • #17: We’ll come back to this one later
  • #20: Let me do a quick phrasebook first.
  • #22: You can (and should) put whatever characters you are looking for in square brackets. If you omit the first [0-9] you might match nothing. Likewise, in the second part [^0-9] means “anything that isn’t a number”.
  • #24: Anything else that you see that’s special (like ‘^’ or ‘\\’) gets matched with a ‘\’ in front of it, too.
  • #28: I mean it, I’m done.
  • #29: But there’s all these other characters...
  • #32: Can you tell the difference between ‘\w’ and ‘\W’ every time, without looking? Can you promise you’ll never get confused about whether ‘\w’ means ‘word’ or ‘whitespace’?
  • #33: Maximize the utility of your investment. There is a ‘+’ operator that *sometimes* means “one or more”, like ::*. + works in Cocoa, but not in plain grep. If you stick to the ones that are the same everywhere, you will get more use out of them and be less confused. Same with .*? to handle greedy matching.
  • #38: Note - regex’s don’t parse HTML/XML “correctly” so be careful
  • #41: You get the HTML between the links, don’t you?
  • #44: Although you can use .*? at least on some platforms
  • #46: This code was used in production on a project I was asked to consult on in a Content Management System (of sorts) to detect links that should be clickable on a web page, but weren’t, and make them clickable.
  • #47: And the customer fed that Content Management System a big list of links
  • #48: note it’s looking at http followed by :// followed by stuff, then anything, then /A.
  • #49: The regex library grabs the longest string it can, first, to see if that’s a match (because it’s supposed to be greedy)
  • #50: then, when that doesn’t match, the next longest string
  • #51: and so on
  • #54: and then, when it’s exhausted the shortest string for that beginning match,
  • #55: It does it again for the next beginning match it finds
  • #56: and so on there. BAD IDEA.
  • #57: When I’m doing Core Data on the iPhone, the images go in a directory (NEVER in the DB!!), and I put info I might need (like when I should refresh it) in the image name, so I can do maintenance without having to ask the DB.
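A rough Python sketch of that file-name trick. The naming scheme (`img-<id>__refresh-<yyyymmdd>.png`) and both helper functions are hypothetical; the note does not say which format was actually used.

```python
import re


def encode_name(image_id, refresh_after):
    """Build a file name that carries its own refresh date (hypothetical scheme)."""
    return "img-" + image_id + "__refresh-" + refresh_after + ".png"


def decode_refresh(file_name):
    """Recover the refresh date from the name, one small bite at a time."""
    # Drop the junk to the left of what I want.
    without_left = re.sub(r"^.*__refresh-", "", file_name)
    # Drop the junk to the right of what I want.
    return re.sub(r"\.png$", "", without_left)


name = encode_name("42", "20111005")
refresh = decode_refresh(name)

print(name)     # img-42__refresh-20111005.png
print(refresh)  # 20111005
```

A maintenance pass can then list the directory and decide what to refresh from the names alone, without a database round trip.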
  • #60: And coming up next, my current favorite to use in Xcode’s search project box...
  • #61: Which, of course, means the price just went up 25%. Once you get comfortable with them, you start to see chances to use them everywhere.