Basic Regular Expressions In R Cheat Sheet


[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:alnum:][:punct:]] plus space
[[:cntrl:]]           Control characters; \n, \r etc.

> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

grep(pattern, string)
[1] 1 3
grep(pattern, string, value = TRUE)
[1] "Hiphopopotamus"
[2] "time for bottomless lyrics"
grepl(pattern, string)
[1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
[1] TRUE FALSE TRUE

regexpr(pattern, string)
find starting position and length of first match
gregexpr(pattern, string)
find starting position and length of all matches
stringr::str_locate(string, pattern)
find starting and end position of first match
stringr::str_locate_all(string, pattern)
find starting and end position of all matches

regmatches(string, regexpr(pattern, string))
extract first match
[1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
extract all matches, outputs a list
[[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
extract first match
[1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
extract all matches, outputs a matrix
stringr::str_match(string, pattern)
extract first match + individual character groups
stringr::str_match_all(string, pattern)
extract all matches + individual character groups

sub(pattern, replacement, string)
replace first match
gsub(pattern, replacement, string)
replace all matches
stringr::str_replace(string, pattern, replacement)
replace first match
stringr::str_replace_all(string, pattern, replacement)
replace all matches

strsplit(string, pattern) or stringr::str_split(string, pattern)
split a string at each match
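
The base R functions listed above can be tried on the cheat sheet's own example data. The following is a minimal sketch, not a definitive reference: the variable names on the left-hand side and the replacement string "___" are illustrative choices that do not come from the cheat sheet, and only base R is used so that no package needs to be installed.

```r
# Example data and pattern taken from the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"

# Detect: which elements contain a match?
hits <- grepl(pattern, string)

# Locate: start of the first match per element (-1 means no match);
# the match lengths are stored in the "match.length" attribute
first_pos <- regexpr(pattern, string)

# Extract: first match per element, then all matches as a list
first_match <- regmatches(string, regexpr(pattern, string))
all_matches <- regmatches(string, gregexpr(pattern, string))

# Replace: first match only, then every match ("___" is an arbitrary choice)
replaced_first <- sub(pattern, "___", string)
replaced_all   <- gsub(pattern, "___", string)

# Split: cut each element at every match
pieces <- strsplit(string, pattern)

print(hits)
print(first_match)
print(replaced_all)
```

Note that regmatches() with regexpr() silently drops elements without a match, which is why the result has two entries for three input strings, whereas the gregexpr() variant keeps one list element per input.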
\n        New line
\r        Carriage return
\t        Tab
\v        Vertical tab
\f        Form feed

.         Any character except \n
|         Or, e.g. (a|b)
[…]       List permitted characters, e.g. [abc]
[a-z]     Specify character ranges
[^…]      List excluded characters
(…)       Grouping, enables back referencing using \\N where N is an integer

^         Start of the string
$         End of the string
\\b       Empty string at either edge of a word
\\B       NOT the edge of a word
\\<       Beginning of a word
\\>       End of a word

*         Matches at least 0 times
+         Matches at least 1 time
?         Matches at most 1 time; optional string
{n}       Matches exactly n times
{n,}      Matches at least n times
{,n}      Matches at most n times
{n,m}     Matches between n and m times
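
To make the anchors, quantifiers, groups and back references above concrete, here is a short sketch in base R. The input strings and variable names are made up for illustration and are not part of the cheat sheet.

```r
# Anchors plus the optional quantifier "?": whole-string match with optional "u"
spellings <- c("colour", "color", "colouur")
is_colour <- grepl("^colou?r$", spellings)

# {n,}: runs of at least two digits
s <- "a1b22c333"
long_runs <- regmatches(s, gregexpr("[0-9]{2,}", s))[[1]]

# Character classes with "+": one upper-case letter followed by lower-case letters
capitalised <- grepl("^[[:upper:]][[:lower:]]+$", c("Tim", "tim", "TIM"))

# Grouping and back references: swap two words via \\1 and \\2
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")

# Word boundaries: replace "cat" only where it is a whole word
whole_word <- gsub("\\bcat\\b", "dog", "cat concat cat")

print(is_colour)
print(long_runs)
print(swapped)
print(whole_word)
```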
(?=)                Lookahead (requires perl = TRUE), e.g. (?=xy): position followed by 'xy'
(?!)                Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)               Lookbehind (perl = TRUE), e.g. (?<=xy): position following 'xy'
(?<!)               Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)         If-then-condition (perl = TRUE); use lookaheads, optional char. etc in if-clause
(?(if)then|else)    If-then-else-condition (perl = TRUE)
*see, e.g.

By default R uses POSIX extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base or by wrapping patterns with perl() for stringr (stringr 1.0.0 and later use ICU regular expressions and deprecate perl() in favour of regex()).

All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.

Base functions such as grep(), sub() and regexpr() can be made case insensitive by specifying ignore.case = TRUE.

Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.

Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Regular expressions can conveniently be created using rex::rex().
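
The notes on greedy matching, lookarounds, escaping and case handling can be illustrated with a few base R calls. This is a sketch under the assumption of a standard R installation; the sample strings and variable names are invented for the example.

```r
# Greedy vs. lazy: "*" takes the longest match, "*?" the shortest
html   <- "<b>bold</b>"
greedy <- sub("<.*>", "", html)                     # removes the whole string
lazy   <- sub("<.*?>", "", html)                    # removes only "<b>"
lazy_U <- sub("(?U)<.*>", "", html, perl = TRUE)    # (?U) flips the default

# Lookahead: numbers that are followed by " USD" (PCRE only)
prices <- "10 USD, 20 EUR, 30 USD"
usd <- regmatches(prices, gregexpr("\\d+(?= USD)", prices, perl = TRUE))[[1]]

# Lookbehind: replace "b" only where it follows "a"
after_a <- gsub("(?<=a)b", "X", "ab cb", perl = TRUE)

# Escaping a metacharacter vs. a literal search with fixed = TRUE
escaped_dot <- grepl("a\\.c", c("a.c", "abc"))
fixed_dot   <- grepl("a.c", c("a.c", "abc"), fixed = TRUE)

# Case insensitivity: via argument and via inline (?i)
by_argument <- grepl("HIP", "Hiphop", ignore.case = TRUE)
by_inline   <- grepl("(?i)HIP", "Hiphop", perl = TRUE)

# Case conversion in the replacement: upper-case the first letter of each word
title_case <- gsub("\\b(\\w)", "\\U\\1", "hello world", perl = TRUE)

print(c(greedy = greedy, lazy = lazy))
print(usd)
print(title_case)
```

The lookaround and \\U examples pass perl = TRUE explicitly, since the cheat sheet marks these features as PCRE-only.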
Ian Kopacka • ian.kopacka@ages.at • CC BY • Updated: 09/16
