Python 2.7 Regular Expressions

Special characters:

\         escapes special characters.
.         matches any character (except newline, unless DOTALL)
^         matches start of the string (or line if MULTILINE)
$         matches end of the string (or line if MULTILINE)
[5b-d]    matches any chars '5', 'b', 'c' or 'd'
[^a-c6]   matches any char except 'a', 'b', 'c' or '6'
R|S       matches either regex R or regex S.
()        Creates a capture group, and indicates precedence.

Within [], most special characters lose their special meaning and need no
escaping. The exceptions are '\' itself, and ']' and '-', which need
escaping only if they are not the first character, e.g. '[]]' matches ']'.
'^' is also special: as the first character in the [] it negates the set,
so it must be escaped (or placed later) to match it literally.

Quantifiers:

*         0 or more (append ? for non-greedy)
+         1 or more (append ? for non-greedy)
?         0 or 1 (append ? for non-greedy)
{m}       exactly 'm'
{m,n}     from m to n. 'm' defaults to 0, 'n' to infinity
{m,n}?    from m to n, as few as possible

Extensions. These do not cause grouping, except for (?P<name>...):

(?iLmsux)       Matches empty string; each letter sets the corresponding
                flag (re.I, re.L, re.M, re.S, re.U, re.X)
(?:...)         Non-capturing version of regular parentheses
(?P<name>...)   Creates a named capturing group.
(?P=name)       Matches whatever the earlier group called 'name' matched
(?#...)         A comment; ignored.
(?=...)         Lookahead assertion: matches if ... matches next, without consuming
(?!...)         Negative lookahead assertion
(?<=...)        Lookbehind assertion: matches if preceded by ...
(?<!...)        Negative lookbehind assertion
(?(id)yes|no)   Match 'yes' if group 'id' matched, else 'no'

Flags for re.compile(), etc. Combine with '|':

re.I == re.IGNORECASE   Ignore case
re.L == re.LOCALE       Make \w, \b, and \s locale dependent
re.M == re.MULTILINE    Multiline ('^' and '$' also match at each line)
re.S == re.DOTALL       Dot matches all (including newline)
re.U == re.UNICODE      Make \w, \b, \d, and \s unicode dependent
re.X == re.VERBOSE      Verbose (unescaped whitespace in pattern is ignored,
                        and '#' marks comment lines)

Module level functions:

compile(pattern[, flags]) -> RegexObject
match(pattern, string[, flags]) -> MatchObject or None
search(pattern, string[, flags]) -> MatchObject or None
findall(pattern, string[, flags]) -> list of strings
                        (list of tuples if the pattern has several groups)
finditer(pattern, string[, flags]) -> iter of MatchObjects
split(pattern, string[, maxsplit, flags]) -> list of strings
sub(pattern, repl, string[, count, flags]) -> string
subn(pattern, repl, string[, count, flags]) -> (string, int)
escape(string) -> string
purge()   # clears the regular expression cache
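
The module-level functions, flags, quantifiers and extensions listed above can be tried together in a short session. This is only an illustrative sketch: the sample strings and all variable names (text, pair, html and so on) are made up for the example and do not come from the 're' docs.

```python
import re

text = "alpha=1, Beta=22; gamma=333"   # hypothetical sample input

# findall() with one capturing group returns that group's text per match.
numbers = re.findall(r"=(\d+)", text)

# Flags combine with '|'. VERBOSE permits whitespace and '#' comments in
# the pattern; IGNORECASE lets [a-z] match the capital 'B' in 'Beta'.
pair = re.compile(r"""
    (?P<key>[a-z]+)     # named group 'key'
    =
    (?P<value>\d+)      # named group 'value'
    """, re.VERBOSE | re.IGNORECASE)
keys = [m.group("key") for m in pair.finditer(text)]

# Lookahead: words that are followed by '=', without consuming the '='.
before_equals = re.findall(r"\w+(?==)", text)

# Greedy '*' grabs as much as possible; '*?' grabs as little as possible.
html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# split(), sub() and subn() take the pattern first, then the string.
parts = re.split(r"[,;]\s*", text)
masked = re.sub(r"\d+", "N", text)
first_only, n_subs = re.subn(r"\d+", "N", text, count=1)

# escape() makes a literal string safe to use as a pattern, so the '.'
# below matches only a real dot and not the 'x' in 'axb'.
dot_match = re.search(re.escape("a.b"), "axb a.b")
```

The exact string returned by escape() differs between Python versions, so the sketch only uses it as a pattern rather than showing its output.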

Special sequences:

\A        Start of string
\b        Matches empty string at word boundary (between \w and \W)
\B        Matches empty string not at word boundary
\d        Digit
\D        Non-digit
\s        Whitespace: [ \t\n\r\f\v], more if LOCALE or UNICODE
\S        Non-whitespace
\w        Alphanumeric: [0-9a-zA-Z_], more if LOCALE or UNICODE
\W        Non-alphanumeric
\Z        End of string
\g<id>    In replacement strings (sub(), expand()): the text matched by a
          previous named or numbered group, e.g. \g<0> or \g<name>

RegexObjects (returned from compile()):

.match(string[, pos, endpos]) -> MatchObject
.search(string[, pos, endpos]) -> MatchObject
.findall(string[, pos, endpos]) -> list of strings
.finditer(string[, pos, endpos]) -> iter of MatchObjects
.split(string[, maxsplit]) -> list of strings
.sub(repl, string[, count]) -> string
.subn(repl, string[, count]) -> (string, int)
.flags        # int passed to compile()
.groups       # int number of capturing groups
.groupindex   # {} maps group names to ints
.pattern      # string passed to compile()
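
A compiled RegexObject and a few of the special sequences can be exercised as follows. This is a minimal sketch: the pattern, the sample string and the variable names (date, log and so on) are invented for illustration.

```python
import re

# \b anchors at word boundaries, \d matches digits.
date = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})\b")
log = "built 2014-03, shipped 2015-11"   # hypothetical sample input

n_groups = date.groups            # number of capturing groups
names = dict(date.groupindex)     # group name -> group number

# With more than one group, findall() returns a list of tuples.
all_dates = date.findall(log)

# The optional pos argument starts the search part-way into the string.
later = date.search(log, log.index(","))

# \g<name> refers to a named group inside a replacement string.
swapped = date.sub(r"\g<month>/\g<year>", log)

# \A matches only at the very start, \Z only at the very end.
anchored = re.search(r"\Ashipped", log)
ends = re.search(r"\d\d\Z", log).group()
```

Compiling once and reusing the object mainly helps readability when the same pattern is applied many times; the module-level functions give the same results.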

Special character escapes are much like those already escaped in Python
string literals. Hence regex '\n' is same as regex '\\n':

\a        ASCII Bell (BEL)
\f        ASCII Formfeed
\n        ASCII Linefeed
\r        ASCII Carriage return
\t        ASCII Tab
\v        ASCII Vertical tab
\\        A single backslash
\xHH      Two digit hex character
\OOO      Three digit octal char
          (or use a preceding zero, e.g. \0, \07)
\DD       Decimal number 1 to 99, matches previous numbered group

MatchObjects (returned from match() and search()):

.expand(template) -> string, backslash and group expansion
.group([group1...]) -> string or tuple of strings, 1 per arg
.groups([default]) -> tuple of all groups, non-matching=default
.groupdict([default]) -> {} of named groups, non-matching=default
.start([group]) -> int, start of substring matched by group
.end([group]) -> int, end of substring matched by group
              (group defaults to 0, the whole match)
.span([group]) -> tuple (match.start(group), match.end(group))
.pos          # pos value passed to search() or match()
.endpos       # endpos value passed to search() or match()
.lastindex    # int index of last matched capturing group
.lastgroup    # string name of last matched capturing group
.re           # RegexObject whose search() or match() produced this match
.string       # string passed to search() or match()

Gleaned from the python 2.7 're' docs.
Version: v0.3.1
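
The MatchObject methods and the escape rules listed above can be checked with a short sketch. The pattern, the sample string and the variable names are illustrative assumptions, not taken from the 're' docs.

```python
import re

sample = "mail bob@example now"   # hypothetical sample input
m = re.search(r"(?P<user>\w+)@(?P<host>\w+)(\.\w+)?", sample)

whole = m.group()                  # group 0 is the whole match
user_host = m.group("user", "host")
all_groups = m.groups()            # unmatched optional group -> None
with_default = m.groups("")        # unmatched optional group -> ''
named = m.groupdict()
whole_span = m.span()
host_start, host_end = m.start("host"), m.end("host")
last_index, last_name = m.lastindex, m.lastgroup
expanded = m.expand(r"\g<host>:\g<user>")

# Regex '\n' and regex '\\n' both match a linefeed; a raw string
# avoids doubling the backslashes.
newline_ok = all(re.match(p, "\n") for p in ("\n", "\\n", r"\n"))

# \xHH and three-digit octal escapes name a character by its code.
hex_ok = re.match(r"\x41", "A") is not None
octal_ok = re.match(r"\101", "A") is not None

# \1 matches whatever group 1 matched: here, a doubled letter.
doubled = re.search(r"(\w)\1", "hello").group()
```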
