Perl Regular Expression Quick Reference Card
Revision 0.1 (draft) for Perl 5.8.5
Iain Truskett (formatting by Andrew Ford)

This is a quick reference to Perl's regular expressions. For full
information see the perlre and perlop manual pages.

Operators

=~ determines to which variable the regex is applied. In its absence,
$_ is used.

    $var =~ /foo/;

!~ determines to which variable the regex is applied, and negates the
result of the match; it returns false if the match succeeds, and true
if it fails.

    $var !~ /foo/;

m/pattern/igmsoxc
searches a string for a pattern match, applying the given options.

    i    case-insensitive
    g    global: all occurrences
    m    multiline mode: ^ and $ match internal lines
    s    match as a single line: . matches \n
    o    compile pattern once
    x    extended legibility: free whitespace and comments
    c    don't reset pos on failed matches when using /g

If pattern is an empty string, the last successfully matched regex is
used. Delimiters other than '/' may be used for both this operator and
the following ones.

qr/pattern/imsox
lets you store a regex in a variable, or pass one around. Modifiers
are as for m// and are stored within the regex.

s/pattern/replacement/igmsoxe
substitutes matches of pattern with replacement. Modifiers are as for
m// with one addition:

    e    evaluate replacement as an expression

'e' may be specified multiple times. replacement is interpreted as a
double-quoted string unless a single quote (') is the delimiter.

?pattern?
is like m/pattern/ but matches only once. No alternate delimiters can
be used. Must be reset with reset.

Syntax

    \          Escapes the character immediately following it
    .          Matches any single character except a newline
               (unless /s is used)
    ^          Matches at the beginning of the string (or line, if /m
               is used)
    $          Matches at the end of the string (or line, if /m is used)
    *          Matches the preceding element 0 or more times
    +          Matches the preceding element 1 or more times
    ?          Matches the preceding element 0 or 1 times
    {...}      Specifies a range of occurrences for the element
               preceding it
    [...]      Matches any one of the characters contained within
               the brackets
    (...)      Groups subexpressions for capturing to $1, $2...
    (?:...)    Groups subexpressions without capturing (cluster)
    |          Matches either the subexpression preceding or
               following it
    \1, \2 ... The text from the Nth group

Escape sequences

These work as in normal strings.

    \a         Alarm (beep)
    \e         Escape
    \f         Formfeed
    \n         Newline
    \r         Carriage return
    \t         Tab
    \033       Any octal ASCII value
    \x7f       Any hexadecimal ASCII value
    \x{263a}   A wide hexadecimal value
    \cx        Control-x
    \N{name}   A named character
    \l         Lowercase next character
    \u         Titlecase next character
    \L         Lowercase until \E
    \U         Uppercase until \E
    \Q         Disable pattern metacharacters until \E
    \E         End case modification or quoting

This one works differently from normal strings:

    \b         An assertion, not backspace, except in a character class

Character classes

    [amy]      Match 'a', 'm' or 'y'
    [f-j]      Dash specifies range
    [f-j-]     Dash escaped or at start or end means 'dash'
    [^f-j]     Caret indicates "match any character except these"

The following sequences work within or without a character class.
The first six are locale aware, all are Unicode aware. The default
character class equivalents are given. See the perllocale and
perlunicode man pages for details.

    \d       [0-9]            A digit
    \D       [^0-9]           A nondigit
    \w       [a-zA-Z0-9_]     A word character
    \W       [^a-zA-Z0-9_]    A non-word character
    \s       [ \t\n\r\f]      A whitespace character
    \S       [^ \t\n\r\f]     A non-whitespace character
    \C       Match a byte (with Unicode, '.' matches a character)
    \pP      Match P-named (Unicode) property
    \p{...}  Match Unicode property with long name
    \PP      Match non-P
    \P{...}  Match lack of Unicode property with long name
    \X       Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

    POSIX    Unicode        Perl
    alnum    IsAlnum
    alpha    IsAlpha
    ascii    IsASCII
    blank    IsSpace
    cntrl    IsCntrl
    digit    IsDigit        \d
    graph    IsGraph
    lower    IsLower
    print    IsPrint
    punct    IsPunct
    space    IsSpace
             IsSpacePerl    \s
    upper    IsUpper
    word     IsWord         \w
    xdigit   IsXDigit

Within a character class:

    POSIX        traditional    Unicode
    [:digit:]    \d             \p{IsDigit}
    [:^digit:]   \D             \P{IsDigit}

Anchors

All are zero-width assertions.

    ^    Match string start (or line, if /m is used)
    $    Match string end (or line, if /m is used) or before newline
    \b   Match word boundary (between \w and \W)
    \B   Match except at word boundary (between \w and \w or \W and \W)
    \A   Match string start (regardless of /m)
    \Z   Match string end (before optional newline)
    \z   Match absolute string end
    \G   Match where previous m//g left off
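
The anchors listed in this card can be tried out with a short script. The following is only a sketch: the sample strings, variable names and the tiny tokenizer loop are made up for illustration and are not part of the original card.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";   # hypothetical sample input

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;
# With /m, ^ also matches at the start of each internal line.
my @multi = $text =~ /^(\w+)/mg;

# \Z matches before an optional trailing newline; \z only at the absolute end.
my $before_newline = $text =~ /two\Z/ ? 1 : 0;
my $absolute_end   = $text =~ /two\z/ ? 1 : 0;

# \b is a zero-width word boundary, so "concat" does not count as "cat".
my @cats = "cat concat cat." =~ /\bcat\b/g;

# \G continues where the previous m//g left off; /c keeps pos on failure,
# which allows several alternatives to be tried at the same position.
my $input = "12+345";
my @tokens;
pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G(\d+)/gc) { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G(\+)/gc)  { push @tokens, "OP:$1" }
    else                          { last }   # unknown character: stop
}

print "plain: @plain\n";
print "multi: @multi\n";
print "tokens: @tokens\n";
```

Here @plain holds only "one" while @multi holds both lines, and the /gc loop splits the input into number and operator tokens without ever rescanning from the start.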
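
The operators described in the Operators section (=~, !~, m//, qr// and s///) and the \Q...\E escape can be combined as in the following sketch. The sample string and all variable names are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

my $line = "Price: 10 USD, tax: 2 USD";   # hypothetical sample input

# =~ binds the match to $line; /i makes it case-insensitive.
# The first capture group is available in $1 after a successful match.
my $first = $line =~ /price:\s*(\d+)/i ? $1 : undef;

# !~ negates the result: true when the pattern does NOT match.
my $no_eur = $line !~ /EUR/ ? 1 : 0;

# With /g in list context, m// returns every capture.
my @amounts = $line =~ /(\d+) USD/g;

# qr// stores a regex, together with its modifiers, in a variable.
my $amount_re = qr/(\d+)\s+usd/i;

# s///e evaluates the replacement as an expression; /g replaces all matches.
(my $doubled = $line) =~ s/$amount_re/($1 * 2) . " USD"/ge;

# \Q...\E disables metacharacters, so the dot matches only a literal dot.
my $literal = "a.b";
my $hit  = "a.b" =~ /^\Q$literal\E$/ ? 1 : 0;
my $miss = "axb" =~ /^\Q$literal\E$/ ? 1 : 0;

print "first: $first\n";
print "amounts: @amounts\n";
print "doubled: $doubled\n";
```

Copying $line into $doubled before substituting leaves the original string untouched, which is a common idiom on Perl 5.8 where the /r modifier is not available.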