Python 2.7 Regular Expressions


Extensions. Do not cause grouping, except 'P<name>':
(?iLmsux)      Match empty string, set the corresponding re.I, re.L, re.M, re.S, re.U, re.X flags
(?:...)        Non-capturing version of regular parens
(?P<name>...)  Create a named capturing group.
(?P=name)      Match whatever matched prev named group
(?#...)        A comment; ignored.
(?=...)        Lookahead assertion, match without consuming
(?!...)        Negative lookahead assertion
(?<=...)       Lookbehind assertion, match if preceded
(?<!...)       Negative lookbehind assertion
(?(id)y|n)     Match 'y' if group 'id' matched, else 'n'

Flags for re.compile(), etc. Combine with '|':
re.I == re.IGNORECASE  Ignore case
re.L == re.LOCALE      Make \w, \b, and \s locale dependent
re.M == re.MULTILINE   Multiline ('^' and '$' also match at the start and end of each line)
re.S == re.DOTALL      Dot matches all (including newline)
re.U == re.UNICODE     Make \w, \b, \d, and \s unicode dependent
re.X == re.VERBOSE     Verbose (unescaped whitespace in pattern is ignored, and '#' marks comment lines)

Non-special chars match themselves. Exceptions are special characters:
\    Escape special char or start a sequence.
.    Match any char except newline, see re.DOTALL
^    Match start of the string, see re.MULTILINE
$    Match end of the string, see re.MULTILINE
[]   Enclose a set of matchable chars
R|S  Match either regex R or regex S.
()   Create capture group, & indicate precedence

After '[', enclose a set, the only special chars are:
]    End the set, if not the 1st char
-    A range, e.g. a-c matches a, b or c
^    Negate the set only if it is the 1st char

Quantifiers (append '?' for non-greedy):
{m}    Exactly m repetitions
{m,n}  From m (default 0) to n (default infinity)
*      0 or more. Same as {,}
+      1 or more. Same as {1,}
?      0 or 1. Same as {,1}

Module level functions:
compile(pattern[, flags]) -> RegexObject
match(pattern, string[, flags]) -> MatchObject
search(pattern, string[, flags]) -> MatchObject
findall(pattern, string[, flags]) -> list of strings
finditer(pattern, string[, flags]) -> iter of MatchObjects
split(pattern, string[, maxsplit, flags]) -> list of strings
sub(pattern, repl, string[, count, flags]) -> string
subn(pattern, repl, string[, count, flags]) -> (string, int)
escape(string) -> string
purge()  # clear the re cache
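A minimal sketch of how the quantifiers and module level functions listed above behave in practice. The sample strings and variable names (text, greedy, lazy, word and so on) are made up for illustration and do not come from the 're' docs.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: '.*' grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", text)

# Non-greedy: appending '?' makes '.*?' stop at the first closing tag.
lazy = re.findall(r"<b>(.*?)</b>", text)

# compile() returns a reusable RegexObject; flags combine with '|'.
word = re.compile(r"[a-z]+", re.I | re.M)
words = word.findall("Spam\nEGGS ham")

# sub() replaces matches; subn() also reports how many were replaced.
collapsed = re.sub(r"\s{2,}", " ", "a   b    c")
result, count = re.subn(r"\d", "#", "a1b22")

# split() cuts the string at every match of the pattern.
parts = re.split(r"\s*,\s*", "x , y,z")

# escape() backslash-escapes characters that would otherwise be special.
literal = re.search(re.escape("1+1"), "is 1+1 two?")
```

With these inputs, greedy holds a single string covering the whole of text, while lazy holds the two tag contents separately.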

Special sequences:
\A      Start of string
\b      Match empty string at word (\w+) boundary
\B      Match empty string not at word boundary
\d      Digit
\D      Non-digit
\s      Whitespace [ \t\n\r\f\v], see LOCALE,UNICODE
\S      Non-whitespace
\w      Alphanumeric: [0-9a-zA-Z_], see LOCALE
\W      Non-alphanumeric
\Z      End of string
\g<id>  Refer to prev named or numbered group (in sub() repl and expand() templates), '<' & '>' are literal, e.g. \g<0> or \g<name> (not \g0 or \gname)

RegexObjects (returned from compile()):
.match(string[, pos, endpos]) -> MatchObject
.search(string[, pos, endpos]) -> MatchObject
.findall(string[, pos, endpos]) -> list of strings
.finditer(string[, pos, endpos]) -> iter of MatchObjects
.split(string[, maxsplit]) -> list of strings
.sub(repl, string[, count]) -> string
.subn(repl, string[, count]) -> (string, int)
.flags       # int, Passed to compile()
.groups      # int, Number of capturing groups
.groupindex  # {}, Maps group names to ints
.pattern     # string, Passed to compile()

MatchObjects (returned from match() and search()):
.expand(template) -> string, Backslash & group expansion
.group([group1...]) -> string or tuple of strings, 1 per arg
.groups([default]) -> tuple of all groups, non-matching=default
.groupdict([default]) -> {}, Named groups, non-matching=default
.start([group]) -> int, Start/end of substring match by group
.end([group]) -> int, Group defaults to 0, the whole match
.span([group]) -> tuple (match.start(group), match.end(group))
.pos        int, Passed to search() or match()
.endpos     int, Passed to search() or match()
.lastindex  int, Index of last matched capturing group
.lastgroup  string, Name of last matched capturing group
.re         regex, RegexObject whose match() or search() produced this match
.string     string, As passed to search() or match()

Special character escapes are much like those already escaped in Python string literals. Hence regex '\n' is same as regex '\\n':
\a    ASCII Bell (BEL)
\f    ASCII Formfeed
\n    ASCII Linefeed
\r    ASCII Carriage return
\t    ASCII Tab
\v    ASCII Vertical tab
\\    A single backslash
\xHH  Two digit hexadecimal character goes here
\OOO  Three digit octal char (or just use an initial zero, e.g. \0, \07)
\DD   Decimal number 1 to 99, match previous numbered group

Gleaned from the python 2.7 're' docs.
Version: v0.3.3
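As a rough illustration of the MatchObject methods, group references and escape rules listed above, the sketch below runs a hypothetical date pattern against a made-up string. The pattern, the sample text and all variable names are assumptions chosen for the example.

```python
import re

# Raw strings keep backslashes intact, so r"\d" reaches the regex engine as \d.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?")

m = date.search("released 2010-07, patched 2011-06-11")

whole = m.group()                        # group 0, the whole match
year, month = m.group("year", "month")   # several groups at once -> tuple
groups = m.groups("??")                  # unmatched 'day' gets the default
named = m.groupdict()                    # unmatched groups default to None
span = m.span()                          # (m.start(), m.end())

# \g<name> in a replacement template refers back to a named group.
swapped = date.sub(r"\g<month>/\g<year>", "2010-07")
expanded = m.expand(r"\g<year> (\1)")

# (?P=name) is a backreference inside the pattern itself.
doubled = re.search(r"\b(?P<w>\w+) (?P=w)\b", "it was the the best")

# The regex '\n' (a real newline) and the regex '\\n' (backslash + n)
# both match a newline character.
same = re.match("\n", "\n") is not None and re.match("\\n", "\n") is not None
```

Here search() stops at the first date, which has no day part, so the 'day' group is reported through the default value rather than as a matched string.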
