Python 2.7 Regular Expressions

Extensions. These do not cause grouping, except for (?P<name>...):
(?iLmsux)      Matches empty string; sets the corresponding flags
               (re.I, re.L, re.M, re.S, re.U, re.X)
(?:...)        Non-capturing version of regular parentheses
(?P<name>...)  Creates a named capturing group
(?P=name)      Matches whatever the previously named group matched
(?#...)        A comment; ignored
(?=...)        Lookahead assertion: matches without consuming
(?!...)        Negative lookahead assertion
(?<=...)       Lookbehind assertion: matches if preceded by ...
(?<!...)       Negative lookbehind assertion
(?(id)yes|no)  Matches 'yes' if group 'id' matched, else 'no'

Special characters:
\              escapes special characters
.              matches any character (except newline, unless DOTALL)
^              matches start of the string (or line if MULTILINE)
$              matches end of the string (or line if MULTILINE)
[5b-d]         matches any of the chars '5', 'b', 'c' or 'd'
[^a-c6]        matches any char except 'a', 'b', 'c' or '6'
R|S            matches either regex R or regex S
()             creates a capture group, and indicates precedence

Within [], special characters other than '\' lose their special
meaning, hence they don't need escaping, except for ']' and '-',
which only need escaping if they are not the first character,
e.g. '[]]' matches ']'. '^' also has special meaning: it negates
the set if it is the first character in the [], and must then be
escaped to match it literally.

Quantifiers:
*              0 or more (append ? for non-greedy)
+              1 or more (append ? for non-greedy)
?              0 or 1 (append ? for non-greedy)
{m}            exactly 'm'
{m,n}          from m to n. 'm' defaults to 0, 'n' to infinity
{m,n}?         from m to n, as few as possible

Flags for re.compile(), etc. Combine with '|':
re.I == re.IGNORECASE  Ignore case
re.L == re.LOCALE      Make \w, \b, and \s locale dependent
re.M == re.MULTILINE   Multiline: '^' and '$' also match at the start
                       and end of each line
re.S == re.DOTALL      Dot matches all (including newline)
re.U == re.UNICODE     Make \w, \b, \d, and \s unicode dependent
re.X == re.VERBOSE     Verbose (unescaped whitespace in pattern is
                       ignored, and '#' marks comment lines)

Module level functions:
compile(pattern[, flags]) -> RegexObject
match(pattern, string[, flags]) -> MatchObject, or None if no match
search(pattern, string[, flags]) -> MatchObject, or None if no match
findall(pattern, string[, flags]) -> list of strings
finditer(pattern, string[, flags]) -> iter of MatchObjects
split(pattern, string[, maxsplit, flags]) -> list of strings
sub(pattern, repl, string[, count, flags]) -> string
subn(pattern, repl, string[, count, flags]) -> (string, int)
escape(string) -> string
purge()  # clears the re cache
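
A minimal sketch of how the module level functions, flags and quantifiers above fit together. The sample strings and variable names are invented for illustration and are not part of the cheat sheet. The code avoids version-specific syntax, so it should behave the same on Python 2.7 and Python 3.

```python
import re

text = "Error: disk full\nwarning: low memory\nERROR: fan failure"

# re.I ignores case and re.M lets '^' match at each line start; combine with '|'.
levels = re.findall(r"^(error|warning):", text, re.I | re.M)

# Greedy '.*' runs to the last '>', while non-greedy '.*?' stops at the first.
greedy = re.search(r"<.*>", "<a><b>").group(0)
lazy = re.search(r"<.*?>", "<a><b>").group(0)

# sub() replaces every match; subn() also reports how many were replaced.
collapsed = re.sub(r"\s{2,}", " ", "a   b    c")
replaced, count = re.subn(r"[5b-d]", "_", "abcde5")

# split() on a comma followed by any amount of whitespace.
fields = re.split(r",\s*", "x, y,z")

# escape() backslash-escapes characters that are special in a pattern,
# so the '+' in "1+1" is matched literally instead of acting as a quantifier.
literal = re.search(re.escape("1+1"), "what is 1+1?").group(0)
```

Because the first pattern has exactly one capture group, findall() returns the text of that group for each match rather than the whole match.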
Special sequences:
\A             Start of string
\b             Matches empty string at word boundary (between \w and \W)
\B             Matches empty string not at word boundary
\d             Digit
\D             Non-digit
\s             Whitespace: [ \t\n\r\f\v], more if LOCALE or UNICODE
\S             Non-whitespace
\w             Alphanumeric: [0-9a-zA-Z_], or is LOCALE or UNICODE dependent
\W             Non-alphanumeric
\Z             End of string
\g<id>         In replacement strings (sub(), expand()): the text matched
               by a named or numbered group, e.g. \g<0> or \g<name>

RegexObjects (returned from compile()):
.match(string[, pos, endpos]) -> MatchObject
.search(string[, pos, endpos]) -> MatchObject
.findall(string[, pos, endpos]) -> list of strings
.finditer(string[, pos, endpos]) -> iter of MatchObjects
.split(string[, maxsplit]) -> list of strings
.sub(repl, string[, count]) -> string
.subn(repl, string[, count]) -> (string, int)
.flags       # int passed to compile()
.groups      # int number of capturing groups
.groupindex  # {} maps group names to ints
.pattern     # string passed to compile()
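
A short sketch of a compiled RegexObject using the special sequences and attributes listed above. The date pattern, the sample string and the variable names are assumptions made up for this example.

```python
import re

# \b is a word boundary and \d a digit; (?P<name>...) names each group.
date_re = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b")

log = "released 2010-07-03, patched 2011-06-11"

pattern_text = date_re.pattern              # string passed to compile()
group_count = date_re.groups                # number of capturing groups
name_to_index = dict(date_re.groupindex)    # group name -> group number

# findall() returns one tuple per match when the pattern has several groups.
dates = date_re.findall(log)

# The optional pos argument starts the search part-way through the string.
later = date_re.search(log, 20)

# \g<name> in the replacement string refers back to a named group.
reordered = date_re.sub(r"\g<day>/\g<month>/\g<year>", log)
```

Compiling once and reusing the object is mainly a readability choice here. The module level functions accept the same pattern string and cache compiled patterns internally.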
Special character escapes are much like those already
escaped in Python string literals. Hence regex '\n' is
the same as regex '\\n':
\a             ASCII Bell (BEL)
\f             ASCII Formfeed
\n             ASCII Linefeed
\r             ASCII Carriage return
\t             ASCII Tab
\v             ASCII Vertical tab
\\             A single backslash
\xHH           Two digit hex character
\OOO           Three digit octal char
               (or use a preceding zero, e.g. \0, \07)
\DD            Decimal number 1 to 99, matches previous
               numbered group

MatchObjects (returned from match() and search()):
.expand(template) -> string, backslash and group expansion
.group([group1...]) -> string or tuple of strings, 1 per arg
.groups([default]) -> tuple of all groups, non-matching=default
.groupdict([default]) -> {} of named groups, non-matching=default
.start([group]) -> int, start of substring matched by group
.end([group]) -> int, end of substring matched by group
               (group defaults to 0, the whole match)
.span([group]) -> tuple (match.start(group), match.end(group))
.pos        # pos value passed to search() or match()
.endpos     # endpos value passed to search() or match()
.lastindex  # int index of last matched capturing group
.lastgroup  # string name of last matched capturing group
.re         # RegexObject whose search() or match() produced this match
.string     # string passed to search() or match()

Gleaned from the Python 2.7 're' docs.
Version: v0.3.1
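
As an illustration of the MatchObject methods and attributes listed above, here is a small sketch. The address-like pattern, the sample string and the variable names are assumptions chosen for the example, not something the cheat sheet prescribes.

```python
import re

# The port group is optional, so it stays unmatched for this input.
m = re.search(r"(?P<user>\w+)@(?P<host>[\w.]+)(?::(?P<port>\d+))?",
              "mail bob@example.org now")

if m is not None:                          # search() returns None on no match
    whole = m.group(0)                     # group 0 is the whole match
    user, host = m.group("user", "host")   # several args give a tuple
    all_groups = m.groups("n/a")           # unmatched groups get the default
    named = m.groupdict()                  # unmatched groups default to None
    host_span = m.span("host")             # (start, end) of one group
    summary = m.expand(r"\g<user> at \g<host>")
    # .string is the searched text; start()/end() index into it.
    sliced = m.string[m.start():m.end()]
```

Checking for None before touching the match is the usual guard, since match() and search() return None when the pattern does not match.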
