Basic Regular Expressions in R Cheat Sheet

Pattern matching functions

Example data used below:
> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

Detect patterns
grep(pattern, string)
    return the indices of the matching elements
    [1] 1 3
grep(pattern, string, value = TRUE)
    return the matching elements themselves
    [1] "Hiphopopotamus" "time for bottomless lyrics"
grepl(pattern, string)
    return TRUE/FALSE for each element
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

Split a string using a pattern
strsplit(string, pattern) or stringr::str_split(string, pattern)

Locate patterns
regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

Extract patterns
regmatches(string, regexpr(pattern, string))
    extract first match (elements without a match are dropped)
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extract all matches, outputs a list
    [[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

Replace patterns
sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

Character classes
[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters; [^A-Za-z0-9_]
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:alnum:][:punct:] ], i.e. graphical characters plus space
[[:cntrl:]]           Control characters; \n, \r etc.
The bracketed equivalents assume an ASCII (C) locale; in other locales the classes may match more characters.
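The following minimal sketch runs the base-R pattern matching functions on the cheat sheet's own example vector and pattern, then applies a few character classes. It sticks to base R so it runs without stringr installed; the extra string `x` is a hypothetical input chosen only for illustration.

```r
# Example data from the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"   # "t", then any single character, then "m"

# Detect patterns
idx  <- grep(pattern, string)                # indices of matching elements
vals <- grep(pattern, string, value = TRUE)  # the matching elements themselves
hit  <- grepl(pattern, string)               # one TRUE/FALSE per element

# Locate patterns: regexpr() returns the start of the first match (-1 if none)
# and stores the match lengths in the "match.length" attribute
first_pos <- regexpr(pattern, string)
starts    <- as.integer(first_pos)
lens      <- as.integer(attr(first_pos, "match.length"))

# Extract patterns
first_match <- regmatches(string, first_pos)                  # non-matches dropped
all_matches <- regmatches(string, gregexpr(pattern, string))  # list, one entry per string

# Replace patterns
first_repl <- sub(pattern, "X", string)   # first match in each element only
all_repl   <- gsub(pattern, "X", string)  # every match

# Split a string using a pattern (a leading match yields an empty string)
parts <- strsplit(string, pattern)

# A few character classes on a hypothetical test string
x <- "Tab\there, 42 apples!"
digits  <- regmatches(x, gregexpr("[[:digit:]]+", x))[[1]]  # runs of digits
punct   <- gsub("[^[:punct:]]", "", x)                      # keep punctuation only
n_blank <- sum(gregexpr("[[:blank:]]", x)[[1]] > 0)         # count spaces and tabs

print(all_repl)
print(parts)
```

With `stringr`, the same steps map onto `str_detect()`, `str_locate()`, `str_extract()`, `str_replace_all()` and `str_split()`, with the string as the first argument.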

Special metacharacters
\n        New line
\r        Carriage return
\t        Tab
\v        Vertical tab
\f        Form feed

Character classes and groups
.         Any character except \n
|         Or, e.g. (a|b)
[…]       List permitted characters, e.g. [abc]
[a-z]     Specify character ranges
[^…]      List excluded characters
(…)       Grouping, enables back referencing using \\N where N is an integer

Anchors
^         Start of the string
$         End of the string
\\b       Empty string at either edge of a word
\\B       NOT the edge of a word
\\<       Beginning of a word
\\>       End of a word

Quantifiers
*         Matches at least 0 times
+         Matches at least 1 time
?         Matches at most 1 time; optional string
{n}       Matches exactly n times
{n,}      Matches at least n times
{,n}      Matches at most n times
{n,m}     Matches between n and m times
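A minimal base-R sketch of the quantifiers, anchors, groups and back references listed above, using the default (POSIX extended) engine. The vectors `words` and `cands` are hypothetical inputs. Word-boundary checks use `grepl()` because `?gsub` warns that the default engine's `gsub()`/`gregexpr()` can misbehave with repeated word boundaries.

```r
words <- c("color", "colour", "colouur", "flavor")

# Quantifiers, anchored with ^ and $ so the whole element must match
opt     <- grepl("^colou?r$", words)      # ?     : 'u' is optional
star    <- grepl("^colou*r$", words)      # *     : zero or more 'u'
plus    <- grepl("^colou+r$", words)      # +     : one or more 'u'
exact2  <- grepl("^colou{2}r$", words)    # {n}   : exactly two 'u'
between <- grepl("^colou{1,2}r$", words)  # {n,m} : one or two 'u'

# Alternation with | inside a group
either <- grepl("^(color|flavor)$", words)

# Excluded characters [^...]: remove everything that is not a vowel
vowels <- gsub("[^aeiou]", "", "regular")

# Word anchors on hypothetical candidates
cands      <- c("cat", "concat", "cats")
word_start <- grepl("\\<cat", cands)     # "cat" at the beginning of a word
word_end   <- grepl("cat\\>", cands)     # "cat" at the end of a word
whole_word <- grepl("\\bcat\\b", cands)  # "cat" as a whole word
inside     <- grepl("\\Bcat", cands)     # "cat" NOT at a word edge

# Groups and back references: \\1, \\2 refer to the parenthesised groups
swapped <- sub("([[:alpha:]]+) ([[:alpha:]]+)", "\\2 \\1", "hello world")
doubled <- regmatches("bottomless", gregexpr("([[:alpha:]])\\1", "bottomless"))[[1]]

# Special metacharacters: \n and \t are ordinary escapes inside R strings
one_line <- gsub("[\n\t]", " ", "line1\nline2\tend")

print(swapped)
print(doubled)
```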

Lookarounds and conditionals*
(?=…)              Lookahead (requires perl = TRUE), e.g. (?=xy): position followed by 'xy'
(?!…)              Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=…)             Lookbehind (perl = TRUE), e.g. (?<=xy): position following 'xy'
(?<!…)             Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)        If-then condition (perl = TRUE); use lookaheads, optional characters etc. in the if-clause
(?(if)then|else)   If-then-else condition (perl = TRUE)
*see, e.g.

Escaping characters
Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

Greedy matching
By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?. Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

General modes
By default, base R uses POSIX extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE. stringr functions use ICU regular expressions (via stringi), which support lookarounds directly; the perl() wrapper of older stringr versions is deprecated.

All functions can be used with literal searches using fixed = TRUE for base functions or by wrapping patterns with fixed() for stringr.

Most base functions (grep(), grepl(), regexpr(), gregexpr(), sub(), gsub()) can be made case insensitive by specifying ignore.case = TRUE.

Case conversion
Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Regular expressions can conveniently be created using rex::rex().
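The sketch below exercises the lookaround, conditional, greedy/lazy, literal-search and case options in base R. Lookarounds, the conditional, (?U), \\Q...\\E, (?i) and \\U are run with perl = TRUE. The strings `prices` and `html` are hypothetical inputs, and the parenthesis pattern is just one illustrative use of (?(if)then).

```r
prices <- "USD100 EUR200 USD300"

# Lookarounds (perl = TRUE): the lookaround itself is not part of the match
usd_amounts   <- regmatches(prices, gregexpr("(?<=USD)[0-9]+", prices, perl = TRUE))[[1]]
other_amount  <- regmatches(prices, gregexpr("(?<!USD)[0-9]{3}", prices, perl = TRUE))[[1]]
code_before_2 <- regmatches(prices, gregexpr("[A-Z]{3}(?=2)", prices, perl = TRUE))[[1]]
codes_not_2   <- regmatches(prices, gregexpr("[A-Z]{3}(?!2)", prices, perl = TRUE))[[1]]

# Conditional: if group 1 (an opening parenthesis) matched, require a closing one
parens <- grepl("^(\\()?[0-9]+(?(1)\\))$", c("(42)", "42", "(42", "42)"), perl = TRUE)

# Greedy vs lazy matching
html     <- "<b>bold</b>"
greedy   <- regmatches(html, regexpr("<.*>", html))    # longest possible match
lazy     <- regmatches(html, regexpr("<.*?>", html))   # shortest possible match
u_lazy   <- regmatches(html, regexpr("(?U)<.*>", html, perl = TRUE))   # (?U) flips: lazy
u_greedy <- regmatches(html, regexpr("(?U)<.*?>", html, perl = TRUE))  # (?U) flips: greedy

# Three ways to match a literal "."
literal <- gsub(".", "-", "a.b.c", fixed = TRUE)
escaped <- gsub("\\.", "-", "a.b.c")
quoted  <- gsub("\\Q.\\E", "-", "a.b.c", perl = TRUE)

# Case-insensitive matching and upper-casing a back reference
hits_ci <- grepl("hip", c("Hiphop", "HIP", "rap"), ignore.case = TRUE)
hits_i  <- grepl("(?i)hip", c("Hiphop", "HIP", "rap"), perl = TRUE)
titled  <- gsub("\\b(\\w)", "\\U\\1", "time for bottomless lyrics", perl = TRUE)

print(titled)
```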

Ian Kopacka • ian.kopacka@ages.at • CC BY • Updated: 09/16
