R Programming Cheat Sheet
Just the Basics

Created by: Arianne Colton and Sean Chen

GENERAL

• R 3.0 and later support long vectors (64-bit indexing on 64-bit platforms); R's integer type itself remains 32-bit
• R is case sensitive
• R indexing starts from 1

HELP

Function Help: help(functionName) or ?functionName
Help Home Page: help.start()
Special Character Help: help('[')
Search Help: help.search(..) or ??..
Search Function with Partial Name: apropos('mea')
See Example(s): example(topic)

OBJECTS IN CURRENT ENVIRONMENT

Display Object Name: objects() or ls()
Remove Object: rm(object1, object2, ..)

Notes:
1. Names starting with a period (.name) are accessible but invisible, so they will not be found by 'ls'.
2. To force memory to be reclaimed, call gc(), which releases unused memory to the OS. R also performs automatic 'gc' periodically.

SYMBOL NAME ENVIRONMENT

• If multiple packages use the same function name, the function from the package that was loaded last gets called.
• To avoid this, precede the function with the name of the package, e.g. packageName::functionName(..)

LIBRARY

Only trust reliable R packages, e.g. 'ggplot2' for plotting, 'sp' for dealing with spatial data, 'reshape2', 'survival', etc.

Load Package: library(packageName) or require(packageName)
Unload Package: detach(package:packageName)

Note: require() returns the status (TRUE/FALSE)

MANIPULATING STRINGS

Putting Together Strings:
    paste('string1', 'string2', sep = '/')    # separator ('sep') is a space by default
    paste(c('1', '2'), collapse = '/')        # returns '1/2'
Split String:
    stringr::str_split(string = v1, pattern = '-')    # returns a list
Get Substring:
    stringr::str_sub(string = v1, start = 1, end = 3)
Match String:
    isJohnFound <- stringr::str_detect(string = df1$col1, pattern = stringr::regex('John', ignore_case = TRUE))
    # returns TRUE/FALSE for each element, depending on whether John was found
    df1[isJohnFound, c('col1', ...)]

DATA TYPES

Check data type: class(variable)

Four basic data types

1. Numeric - includes float/double, int, etc.
    is.numeric(variable)
2. Character (string)
    nchar(variable)    # length of a character or numeric
3. Date/POSIXct
    • Date: stores just a date. In numeric form, the number of days since 1/1/1970 (see below).
        date1 <- as.Date('2012-06-28')
        as.numeric(date1)
    • POSIXct: stores a date and time. In numeric form, the number of seconds since 1/1/1970.
        date2 <- as.POSIXct('2012-06-28 18:00')
    Note: Use the 'lubridate' and 'chron' packages to work with dates
4. Logical
    • (TRUE = 1, FALSE = 0)
    • Use ==/!= to test equality and inequality
    as.numeric(TRUE) => 1

DATA STRUCTURES

Vector

• Group of elements of the SAME type
• R is a vectorized language; operations are applied to each element of the vector automatically
• R has no concept of column vectors or row vectors
• Special vectors: letters and LETTERS, which contain the lower-case and upper-case letters

Create Vector: v1 <- c(1, 2, 3)
Get Length: length(v1)
Check if All or Any is True: all(v1); any(v1)
Integer Indexing: v1[1:3]; v1[c(1,6)]
Boolean Indexing: v1[is.na(v1)] <- 0
Naming: c(first = 'a', ..) or names(v1) <- c('first', ..)

Factor

• as.factor(v1) converts v1 to a factor; its levels are the unique values of v1
• Factors can reduce the size of a variable because they store each unique value only once, but they could be buggy if not used properly

List

Store any number of items of ANY type

Create List: list1 <- list(first = 'a', ...)
Create Empty List: vector(mode = 'list', length = 3)
Get Element: list1[[1]] or list1[['first']]
Append Using Numeric Index: list1[[6]] <- 2
Append Using Name: list1[['newElement']] <- 2

Note: repeatedly appending to a list, vector, data.frame, etc. is expensive; it is best to create a list of a certain size, then fill it.

data.frame

• Each column is a variable, each row is an observation
• Internally, each column is a vector
• idata.frame (from the 'plyr' package) is a data structure that creates a reference to a data.frame, so no copying is performed

Create Data Frame: df1 <- data.frame(col1 = v1, col2 = v2, v3)
Dimension: nrow(df1); ncol(df1); dim(df1)
Get/Set Column Names: names(df1); names(df1) <- c(...)
Get/Set Row Names: rownames(df1); rownames(df1) <- c(...)
Preview: head(df1, n = 10); tail(...)
Get Data Type: class(df1)    # is data.frame
Index by Column(s): df1['col1'] or df1[1]; † df1[c('col1', 'col3')] or df1[c(1, 3)]
Index by Rows and Columns: df1[c(1, 3), 2:3]    # returns data from rows 1 & 3, columns 2 to 3

† The index methods df1$col1, df1[, 'col1'] and df1[, 1] return a vector. To return a single-column data.frame while using single square brackets, use 'drop': df1[, 'col1', drop = FALSE]

data.table

What is a data.table
• Extends and enhances the functionality of data.frames

Differences: data.table vs. data.frame
• By default data.frame turns character data into factors (in R versions before 4.0.0), while data.table does not
• When you print a data.frame, all of the data prints to the console; a data.table intelligently prints only the first and last five rows
• Key difference: data.tables are fast because they have an index like a database. For example, a search that compares dt1$col1 against a number does a sequential scan (vector scan). After you create a key for this column, it will be much faster via binary search.

Create data.table from data.frame: data.table(df1)
Index by Column(s) *: dt1[, 'col1', with = FALSE] or dt1[, list(col1)]
Show info for each data.table in memory (i.e., size, ...): tables()
Show Keys in data.table: key(dt1)
Create index for col1 and reorder data according to col1: setkey(dt1, col1)
Use Key to Select Data: dt1[c('col1Value1', 'col1Value2'), ]
Multiple Key Select: dt1[J('1', c('2', '3')), ]
Aggregation **:
    dt1[, list(col1 = mean(col1)), by = col2]
    dt1[, list(col1 = mean(col1), col2Sum = sum(col2)), by = list(col3, col4)]

* Accessing columns must be done via a list of actual names, not as characters. If column names are given as characters, then the "with" argument should be set to FALSE.
** aggregate() and the d*ply functions will work, but the built-in aggregation functionality of data.table is faster

matrix

• Similar to data.frame except every element must be the SAME type, most commonly all numerics
• Functions that work with data.frame should work with matrix as well

Create Matrix: matrix1 <- matrix(1:10, nrow = 5)    # fills rows 1 to 5, column 1 with 1:5, and column 2 with 6:10
Matrix Multiplication: matrix1 %*% t(matrix2)    # where t() is transpose

array

• Multidimensional vector of the SAME type
• array1 <- array(1:12, dim = c(2, 3, 2))
• Using arrays is not recommended
• Matrices are restricted to two dimensions while arrays can have any number of dimensions
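
Worked example (strings): the string entries in this sheet can be tried out with a small sketch like the one below. It is only an illustration: the sample strings are made up, and base R functions (strsplit, substr, grepl) are swapped in for the stringr calls so that the block runs without extra packages.

```r
# paste(): 'sep' goes between arguments, 'collapse' joins the elements of one vector.
joined    <- paste('string1', 'string2', sep = '/')
spaced    <- paste('string1', 'string2')            # default separator is a space
collapsed <- paste(c('1', '2'), collapse = '/')

# Both arguments together: build one label per element, then join the labels.
labels <- paste('col', 1:3, sep = '_', collapse = ', ')

# Base R counterpart of stringr::str_split(); like it, this returns a list.
parts <- strsplit('2012-06-28', split = '-', fixed = TRUE)

# Base R counterpart of stringr::str_sub(): characters 1 to 3.
prefix <- substr('cheatsheet', start = 1, stop = 3)

# Base R counterpart of a case-insensitive stringr::str_detect().
# The names here are hypothetical sample data.
people <- c('John Smith', 'Jane Doe')
isJohnFound <- grepl('john', people, ignore.case = TRUE)
```

The logical vector isJohnFound can then be used for Boolean indexing, in the same way as the df1[isJohnFound, ...] entry in the sheet.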

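
Worked example (dates): the numeric forms of Date and POSIXct described in the sheet can be checked with a short sketch. The tz = 'UTC' argument is an assumption added here so that the result does not depend on the local time zone; the sheet's own example omits it.

```r
# Date: numeric form is the number of days since 1970-01-01.
date1 <- as.Date('2012-06-28')
days_since_epoch <- as.numeric(date1)

# POSIXct: numeric form is the number of seconds since 1970-01-01 00:00 UTC.
# tz = 'UTC' is set explicitly (an assumption, see above).
date2 <- as.POSIXct('2012-06-28 18:00', tz = 'UTC')
seconds_since_epoch <- as.numeric(date2)

# Converting the day count back to a Date needs the origin.
back <- as.Date(days_since_epoch, origin = '1970-01-01')

# 18:00 on the same day is 18 hours after midnight, so the two
# representations differ by 18 * 3600 seconds.
offset_seconds <- seconds_since_epoch - days_since_epoch * 86400
```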

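
Worked example (vectors): a minimal sketch of the vector entries in this sheet (vectorized operations, integer and Boolean indexing, naming). The values and element names are hypothetical; one NA is included so that the Boolean indexing line has something to replace.

```r
# A numeric vector with one missing value.
v1 <- c(10, NA, 30, 40, 50, 60)

# Vectorized arithmetic: applied to every element, no loop needed.
doubled <- v1 * 2

# Integer indexing is 1-based.
first_three     <- v1[1:3]
first_and_sixth <- v1[c(1, 6)]

# Boolean indexing: replace missing values with 0.
v1[is.na(v1)] <- 0

# Name the elements, then look one up by name.
names(v1) <- c('a', 'b', 'c', 'd', 'e', 'f')
third <- v1[['c']]

# all() and any() expect logical input, so test a condition first.
all_positive <- all(v1 > 0)
any_zero     <- any(v1 == 0)
```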

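
Worked example (lists): the note on lists says that repeated appending is expensive and that it is better to create a list of a certain size and then fill it. One possible way to do that is sketched below; the variable names and the squares computed are purely illustrative.

```r
# Preallocate a list of known size, then fill it in place.
n <- 5
squares <- vector(mode = 'list', length = n)
for (i in seq_len(n)) {
  squares[[i]] <- i^2
}

# Appending still works for occasional additions.
list1 <- list(first = 'a', second = 2)
list1[['newElement']] <- 2           # append by name
list1[[length(list1) + 1]] <- TRUE   # append by numeric index

# Elements are retrieved with double square brackets.
by_position <- list1[[1]]
by_name     <- list1[['first']]
```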

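
Worked example (data.frame indexing): the difference between the index methods that return a vector and those that return a data.frame, including the 'drop' argument, can be seen in a small sketch. The data frame below is hypothetical sample data.

```r
# A small data frame with three columns of different types.
df1 <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c('a', 'b', 'c'),
  col3 = c(TRUE, FALSE, TRUE)
)

# These forms return the column as a plain vector.
as_vector   <- df1[, 'col1']
also_vector <- df1$col1

# These forms keep the data.frame structure.
as_frame <- df1[, 'col1', drop = FALSE]
by_name  <- df1['col1']

# Rows 1 and 3, columns 2 to 3.
subset_rc <- df1[c(1, 3), 2:3]
```

Keeping the data.frame structure with drop = FALSE is mainly useful when later code expects column names or calls such as ncol() to work on the result.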