R Programming Cheat Sheet

Just the Basics
Created By: Arianne Colton and Sean Chen

Data Structures

Vector
• Group of elements of the SAME type
• R is a vectorized language; operations are applied to each element of the vector automatically
• R has no concept of column vectors or row vectors
• Special vectors: letters and LETTERS, which contain the lower-case and upper-case letters

data.frame
• To return a single-column data.frame while using single-square brackets, use 'drop':
    df1[, 'col1', drop = FALSE]

data.table
What is a data.table
• Extends and enhances the functionality of data.frames
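
To illustrate the Vector notes in this sheet (vectorization, a single element type, and the letters / LETTERS vectors), here is a minimal base-R sketch. The variable names are illustrative only and do not come from the original sheet.

```r
# Element-wise arithmetic: no explicit loop is needed.
v1 <- c(1, 2, 3)
doubled <- v1 * 2                # each element is multiplied by 2
summed  <- v1 + c(10, 20, 30)    # elements are added pairwise

# A vector holds one type only; mixing types coerces to a common type.
mixed <- c(1, 'a', TRUE)         # every element becomes character
mixed_class <- class(mixed)

# Built-in special vectors of letters.
first_three_lower <- letters[1:3]
first_three_upper <- LETTERS[1:3]
```

Because the arithmetic is applied per element, loops over vector positions are rarely needed for this kind of operation.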
General
• R version 3.0 and greater adds support for long vectors (more than 2^31 - 1 elements) on 64-bit builds; the built-in integer type is still 32-bit
• R is case sensitive
• R index starts from 1

Help
help(functionName) or ?functionName

Manipulating Strings
Putting Together Strings
    paste('string1', 'string2', sep = '/')
    # separator ('sep') is a space by default
    paste(c('1', '2'), collapse = '/')
    # returns '1/2'
Split String
    stringr::str_split(string = v1, pattern = '-')
    # returns a list

Vector (continued)
Create Vector                  v1 <- c(1, 2, 3)
Get Length                     length(v1)
Check if All or Any is True    all(v1); any(v1)
Integer Indexing               v1[1:3]; v1[c(1, 6)]
Boolean Indexing               v1[is.na(v1)] <- 0
Naming                         c(first = 'a', ..) or names(v1) <- c('first', ..)

Factor
• as.factor(v1) gets you the levels, which are the unique values

data.table (continued)
Differences: data.table vs. data.frame
• By default, data.frame turns character data into factors (in R versions before 4.0.0), while data.table does not
• When you print a data.frame, all data prints to the console; a data.table intelligently prints only the first and last five rows
• Key Difference: data.tables are fast because they have an index like a database. For example, the search dt1$col1 > number does a sequential scan (vector scan). After you create a key for this, it will be much faster via binary search.
Create data.table from data.frame    data.table(df1)
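
The vector rows in this sheet (integer indexing, boolean indexing, naming, all/any) can be tried out with a small base-R sketch. The values and variable names below are assumptions chosen for illustration.

```r
v1 <- c(10, NA, 30)

# Integer indexing: positions start at 1; an index past the end yields NA.
first_two    <- v1[1:2]
past_the_end <- v1[c(1, 6)]

# Boolean indexing: replace every missing value with 0.
v1[is.na(v1)] <- 0

# Naming: assign names afterwards, then look an element up by name.
names(v1) <- c('first', 'second', 'third')
second_value <- v1[['second']]

# all() / any() applied to a logical condition.
all_non_negative <- all(v1 >= 0)
any_above_twenty <- any(v1 > 20)
```

Indexing position 6 of a three-element vector is not an error in R, which is worth keeping in mind when an unexpected NA appears.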
Help (continued)
Help Home Page            help.start()
Special Character Help    help('[')

Manipulating Strings (continued)
Get Substring
    stringr::str_sub(string = v1, start = 1, end = 3)

Factor (continued)
• Factors can reduce the size of a variable because they only store unique values, but could be buggy if not used properly

data.table (continued)
Index by Column(s) *
    dt1[, 'col1', with = FALSE] or dt1[, list(col1)]
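
The Factor notes in this sheet say that a factor stores only its unique values and can misbehave when used carelessly. The base-R sketch below shows both points; the sample values are made up for illustration.

```r
# A character vector with repeated values.
v1 <- c('low', 'high', 'low', 'low')
f1 <- as.factor(v1)

unique_values <- levels(f1)      # the distinct values, sorted
codes         <- as.integer(f1)  # one integer code per element

# A common pitfall: as.numeric() on a factor returns the codes, not the labels.
f2 <- as.factor(c('10', '20', '10'))
wrong <- as.numeric(f2)                 # internal codes
right <- as.numeric(as.character(f2))   # the original numbers
```

Converting through as.character() first is one common way to recover numeric labels from a factor.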
Help (continued)
Search Help                            help.search(..) or ??..
Search Function - with Partial Name    apropos('mea')
See Example(s)                         example(topic)

Manipulating Strings (continued)
Match String
    isJohnFound <- stringr::str_detect(string = df1$col1,
        pattern = stringr::regex('John', ignore_case = TRUE))
    # returns TRUE/FALSE if John was found
    df1[isJohnFound, c('col1', ...)]

List
Store any number of items of ANY type
Create List    list1 <- list(first = 'a', ...)

data.table (continued)
Show info for each data.table in memory (i.e., size, ...)    tables()
Show Keys in data.table                                      key(dt1)
Create index for col1 and reorder data according to col1     setkey(dt1, col1)
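
The Match String row in this sheet builds a logical vector and uses it to filter rows. The sketch below shows the same pattern with base R's grepl() swapped in for stringr::str_detect(), so that it runs without extra packages. The data frame contents are hypothetical.

```r
# Hypothetical data for illustration only.
df1 <- data.frame(col1 = c('John Smith', 'Mary', 'JOHN DOE'),
                  col2 = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Base-R stand-in for str_detect(): one TRUE/FALSE per row, ignoring case.
isJohnFound <- grepl('john', df1$col1, ignore.case = TRUE)

# Keep only the matching rows and the columns of interest.
john_rows <- df1[isJohnFound, c('col1', 'col2')]
```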
Objects in Current Environment
Display Object Name    objects() or ls()
Remove Object          rm(object1, object2, ..)
Notes:
1. Names starting with a period (.name) are accessible but invisible, so they will not be found by 'ls'
2. To guarantee memory removal, use 'gc', which releases unused memory to the OS. R performs automatic 'gc' periodically

Data Types
Check data type: class(variable)
Four Basic Data Types
1. Numeric - includes float/double, int, etc.
    is.numeric(variable)
2. Character (string)
    nchar(variable) # length of a character or numeric

List (continued)
Create Empty List             vector(mode = 'list', length = 3)
Get Element                   list1[[1]] or list1[['first']]
Append Using Numeric Index    list1[[6]] <- 2
Append Using Name             list1[['newElement']] <- 2
Note: repeatedly appending to a list, vector, data.frame, etc. is expensive; it is best to create a list of a certain size, then fill it.

data.frame
• Each column is a variable, each row is an observation
• Internally, each column is a vector

data.table (continued)
Use Key to Select Data    dt1[c('col1Value1', 'col1Value2'), ]
Multiple Key Select       dt1[J('1', c('2', '3')), ]
Aggregation **
    dt1[, list(col1 = mean(col1)), by = col2]
    dt1[, list(col1 = mean(col1), col2Sum = sum(col2)), by = list(col3, col4)]
* Accessing columns must be done via a list of actual names, not as characters. If column names are characters, then the "with" argument should be set to FALSE.
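
The List rows and the note about repeated appending can be sketched in base R as follows. This is only an illustration under assumed names (squares, list1), not the authors' own example.

```r
n <- 5

# Preallocate a list of the final size instead of growing it in a loop.
squares <- vector(mode = 'list', length = n)
for (i in seq_len(n)) {
  squares[[i]] <- i^2   # fill slot i in place
}

# Double brackets return the element itself, by position or by name.
list1 <- list(first = 'a', second = 2)
by_position <- list1[[1]]
by_name     <- list1[['first']]

# Assigning past the end appends; skipped positions are filled with NULL.
list1[[4]] <- 'd'
list_length <- length(list1)
```

Preallocating avoids the repeated copying that the note warns about when an object is grown one element at a time.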
Symbol Name Environment
• If multiple packages use the same function name, the function from the package loaded last will get called.
• To avoid this, precede the function with the name of the package, e.g. packageName::functionName(..)

Library
Only trust reliable R packages, e.g., 'ggplot2' for plotting, 'sp' for dealing with spatial data, 'reshape2', 'survival', etc.
Load Package      library(packageName) or require(packageName)
Unload Package    detach('package:packageName', unload = TRUE)
Note: require() returns the status (TRUE/FALSE)

Data Types (continued)
3. Date/POSIXct
• Date: stores just a date. In numeric form, number of days since 1/1/1970 (see below).
    date1 <- as.Date('2012-06-28')
    as.numeric(date1)
• POSIXct: stores a date and time. In numeric form, number of seconds since 1/1/1970.
    date2 <- as.POSIXct('2012-06-28 18:00')
Note: Use 'lubridate' and 'chron' packages to work with Dates
4. Logical
• (TRUE = 1, FALSE = 0)
• Use ==/!= to test equality and inequality
    as.numeric(TRUE) => 1

data.frame (continued)
• idata.frame is a data structure that creates a reference to a data.frame; therefore, no copying is performed
Create Data Frame            df1 <- data.frame(col1 = v1, col2 = v2, v3)
Dimension                    nrow(df1); ncol(df1); dim(df1)
Get/Set Column Names         names(df1); names(df1) <- c(...)
Get/Set Row Names            rownames(df1); rownames(df1) <- c(...)
Preview                      head(df1, n = 10); tail(...)
Get Data Type                class(df1) # is data.frame
Index by Column(s)           df1['col1'] or df1[1]; †
                             df1[c('col1', 'col3')] or df1[c(1, 3)]
Index by Rows and Columns    df1[c(1, 3), 2:3] # returns data from row 1 & 3, columns 2 to 3
† Index method: df1$col1 or df1[, 'col1'] or df1[, 1] returns a vector. To return a single-column data.frame, use 'drop' with single-square brackets.

data.table (continued)
** Aggregate and d*ply functions will work, but the built-in aggregation functionality of data.table is faster

Matrix
• Similar to data.frame except every element must be the SAME type, most commonly all numerics
• Functions that work with data.frame should work with matrix as well
Create Matrix            matrix1 <- matrix(1:10, nrow = 5)
                         # fills rows 1 to 5, column 1 with 1:5, and column 2 with 6:10
Matrix Multiplication    matrix1 %*% t(matrix2)
                         # where t() is transpose

Array
• Multidimensional vector of the SAME type
    array1 <- array(1:12, dim = c(2, 3, 2))
• Using arrays is not recommended
• Matrices are restricted to two dimensions while array can have any dimension
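
The Date/POSIXct entries in this sheet describe the numeric form of each type. The base-R sketch below checks that relationship. The tz = 'UTC' argument is an assumption added here so that the result does not depend on the local time zone; the original sheet does not set it.

```r
# Date: stored as the number of days since 1970-01-01.
date1 <- as.Date('2012-06-28')
days_since_epoch <- as.numeric(date1)

# POSIXct: stored as the number of seconds since 1970-01-01 00:00 UTC.
# tz = 'UTC' is an assumption made for reproducibility.
date2 <- as.POSIXct('2012-06-28 18:00', tz = 'UTC')
seconds_since_epoch <- as.numeric(date2)

# The two representations agree: whole days in seconds plus 18 hours.
expected_seconds <- days_since_epoch * 86400 + 18 * 3600
```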

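The data.frame indexing rows and the † note in this sheet distinguish forms that return a vector from forms that return a data.frame. A minimal base-R sketch, using a hypothetical three-column data frame:

```r
# Hypothetical data for illustration only.
df1 <- data.frame(col1 = c(1, 2, 3),
                  col2 = c('a', 'b', 'c'),
                  col3 = c(TRUE, FALSE, TRUE))

# These three forms return the column as a plain vector.
as_vector_dollar <- df1$col1
as_vector_name   <- df1[, 'col1']
as_vector_index  <- df1[, 1]

# Single brackets without a comma, or drop = FALSE, keep the data.frame.
as_df_single <- df1['col1']
as_df_drop   <- df1[, 'col1', drop = FALSE]

# Rows 1 and 3, columns 2 to 3.
subset_rows_cols <- df1[c(1, 3), 2:3]
```

Checking is.data.frame() on each result is a quick way to confirm which form was returned.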