The Ultimate R Cheat Sheet - Data Management (Page 2 of 4)

Creating random data and random sampling



rnorm(10) takes 10 random samples from a normal distribution with a mean of zero and a standard
deviation of 1.



runif(10) takes 10 random samples from a uniform distribution between zero and one.



round(rnorm(10)*3+15) takes 10 random samples from a normal distribution with a mean of 15
and a standard deviation of 3, rounded to the nearest whole number.



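round(runif(10)*5+15) returns random integers between 15 and 20. Because of the rounding, 15 and
20 come up only half as often as 16 to 19; use sample(15:20, 10, replace=TRUE) if every integer
should be equally likely.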



sample(c("A","B","C"), 10, replace=TRUE) returns a random sample, drawn with replacement, from
any custom vector or variable.



sample1=dat1[sample(1:nrow(dat1), 50, replace=FALSE),] takes 50 random rows from
dat1 (without duplicate sampling). This can be handy to run quick test analyses on subsets of
very large datasets. For bootstrapping, draw the rows with replace=TRUE instead.
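
The sampling calls above can be tried on a small made-up table. This is only a sketch: the data frame dat1 and its columns ID and YIELD are invented for illustration, and set.seed() is added here (it is not part of the list above) so that the random draws can be repeated.

```r
# Fix the random number generator so the draws below are repeatable
set.seed(42)

# Hypothetical example table: 200 rows, an ID and a simulated yield
dat1 <- data.frame(ID = 1:200, YIELD = round(rnorm(200) * 3 + 15))

# 10 draws from a custom vector, with replacement
treat <- sample(c("A", "B", "C"), 10, replace = TRUE)

# 50 distinct rows of dat1 (no row is picked twice)
sample1 <- dat1[sample(1:nrow(dat1), 50, replace = FALSE), ]

# A bootstrap resample draws nrow(dat1) rows WITH replacement
boot1 <- dat1[sample(1:nrow(dat1), nrow(dat1), replace = TRUE), ]

# Integers from 15 to 20, each value equally likely
ints <- sample(15:20, 10, replace = TRUE)
```

Calling set.seed() with the same number before a run should reproduce the same samples, which helps when a test analysis on a subset needs to be repeated.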

Sub-setting data tables, conditional subsets



dat1[1:10, 1:5] returns the first 10 rows and the first 5 columns of table dat1.



dat2=dat1[50:70,] returns a subset of rows 50 to 70.



cleandata=dat1[-c(2,7,15),] removes rows 2, 7 and 15.



selectvars=dat1[,c("ID","YIELD")] sub-sets the variables ID and YIELD.



selectrows=dat1[dat1$VAR1=="Site 1",] sub-sets entries that were measured at Site 1.

Possible conditional operators are == equal, != not equal, > greater than, < less than, >= greater
than or equal, <= less than or equal, & AND, | OR, ! NOT, and () brackets to order complex
conditional statements.



selecttreats=dat1[dat1$TREAT %in% c("CTRL", "N", "P", "K"),] can replace
multiple conditional == statements linked together by OR.
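
The conditional subsets above can be checked on a small table. This is a sketch under assumptions: the data frame and its values are invented, and only the column names follow the entries above.

```r
# Hypothetical example table; the column names follow the entries above
dat1 <- data.frame(
  ID    = 1:8,
  VAR1  = c("Site 1", "Site 2", "Site 1", "Site 3",
            "Site 2", "Site 1", "Site 3", "Site 2"),
  TREAT = c("CTRL", "N", "P", "K", "NPK", "CTRL", "N", "NPK"),
  YIELD = c(12, 15, 14, 18, 21, 11, 16, 22)
)

# Rows from Site 1 with a yield above 11 (two conditions joined by AND)
site1_high <- dat1[dat1$VAR1 == "Site 1" & dat1$YIELD > 11, ]

# %in% selects the same rows as a chain of == comparisons joined by OR
by_in <- dat1[dat1$TREAT %in% c("CTRL", "N"), ]
by_or <- dat1[dat1$TREAT == "CTRL" | dat1$TREAT == "N", ]

# ! negates a condition: everything except the NPK treatment
no_npk <- dat1[!(dat1$TREAT == "NPK"), ]
```

Note the comma before the closing bracket in every subset: it tells R that the condition selects rows and that all columns are kept.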

Transforming variables in data tables, conditional transformations



dat2=transform(dat1, VAR1=VAR1*0.4) multiplies VAR1 by 0.4.



dat2=transform(dat1, VAR2=VAR1*2) creates variable VAR2 that is twice the value of VAR1.



dat2=transform(dat1, VAR1=ifelse(VAR3=="Site 1", VAR1*0.4, VAR1)) multiplies
VAR1 by 0.4 for entries measured at Site 1. For other sites the value stays the same. The general
format is ifelse(condition, value if true, value if false).
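
A short sketch of the conditional transformation, using a made-up four-row table (the values are invented for illustration; only the column names VAR1, VAR2 and VAR3 come from the entries above):

```r
# Hypothetical example table with a numeric VAR1 and a site label VAR3
dat1 <- data.frame(
  VAR1 = c(10, 20, 30, 40),
  VAR3 = c("Site 1", "Site 2", "Site 1", "Site 3")
)

# Rescale VAR1 only where VAR3 is "Site 1"; other rows keep their value
dat2 <- transform(dat1, VAR1 = ifelse(VAR3 == "Site 1", VAR1 * 0.4, VAR1))

# Add a new column; dat1 itself is left unchanged
dat3 <- transform(dat1, VAR2 = VAR1 * 2)

dat2$VAR1   # 4 20 12 40
dat3$VAR2   # 20 40 60 80
```

transform() returns a modified copy, so the result has to be assigned to a name (here dat2 and dat3) to be kept.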



The vegan package offers many useful standard transformations for variables or an entire data table:

dat2=decostand(dat1,"standardize") Check out ?decostand to see all transformations.

Merging data tables



dat3=merge(dat1,dat2,by="ID") merge two tables by ID field.



dat3=merge(dat1,dat2,by.x="ID",by.y="STN") merge by an ID field that is named differently
in the two datasets.



dat3=merge(dat1,dat2,by=c("LAT","LONG")) merge by multiple ID fields.



dat3=merge(dat1,dat2,by.x="ID",by.y="ID",all.x=T,all.y=F) left merge; all.x=F,all.y=T right
merge; all.x=T,all.y=T keep all rows; all.x=F,all.y=F keep matching rows only.
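
The four merge variants can be compared on two small tables that share only some ID values. This is a sketch with invented data; the columns YIELD and SOIL are assumptions made for the example. TRUE and FALSE are written out here, and all=TRUE is used as shorthand for all.x=TRUE, all.y=TRUE.

```r
# Two hypothetical tables that share only some ID values
dat1 <- data.frame(ID = c(1, 2, 3), YIELD = c(12, 15, 14))
dat2 <- data.frame(ID = c(2, 3, 4), SOIL = c("clay", "sand", "loam"))

inner <- merge(dat1, dat2, by = "ID")                 # IDs 2, 3
left  <- merge(dat1, dat2, by = "ID", all.x = TRUE)   # IDs 1, 2, 3
right <- merge(dat1, dat2, by = "ID", all.y = TRUE)   # IDs 2, 3, 4
full  <- merge(dat1, dat2, by = "ID", all = TRUE)     # IDs 1, 2, 3, 4

# Rows without a partner in the other table are filled with NA
left[left$ID == 1, "SOIL"]   # NA
```

Comparing nrow() before and after a merge is a quick way to spot rows that were dropped or duplicated.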



cbind(dat1,dat2) On very rare occasions, you merge data without a criterion (ID). This is generally
dangerous, because the command will slap the two tables together without checking the row order!



dat3=rbind(dat1,dat2) appends the rows of dat2 to dat1. The variables have to match exactly and
you will get error messages if they don’t match. So, unlike cbind(), rbind() is generally safe to use.
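
The difference between the two commands can be seen on two small tables that hold the same IDs in a different row order. This is a sketch with invented data; tryCatch() is used only to capture the error that rbind() raises, so that the script keeps running.

```r
# Two hypothetical tables holding the same IDs in a different row order
dat1 <- data.frame(ID = c(1, 2, 3), YIELD = c(12, 15, 14))
dat2 <- data.frame(ID = c(3, 1, 2), SOIL = c("sand", "clay", "loam"))

# cbind() pairs rows by position, so the two ID columns disagree
glued <- cbind(dat1, dat2)
mismatch <- any(glued[[1]] != glued[[3]])   # TRUE: rows are misaligned

# merge() pairs rows by key and aligns them correctly
merged <- merge(dat1, dat2, by = "ID")

# rbind() refuses tables whose column names differ
result <- tryCatch(rbind(dat1, dat2), error = function(e) "error")
```

cbind() finishes without any warning even though the rows do not belong together, which is why merging by an ID field is the safer default.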



dat3=rbind.fill(dat1,dat2) will force non-matching datasets together, filling missing values
with NA and executing variable type conversions where appropriate. Requires the plyr package,
which provides rbind.fill() (the function was previously part of the reshape package).

Summary statistics for variables and tables



mean(), weighted.mean(,), median(), max(), min(), range(), which.max(), which.min(), var(), sd(),
quantile(), quantile(,c(0.025,0.05,0.95,0.975)), rank(x) some descriptive statistical functions for
variables or vectors. For all functions, and
