The Ultimate R Cheat Sheet - Data Management Page 2

Creating random data and random sampling
rnorm(10) takes 10 random samples from a normal distribution with a mean of zero and a standard
deviation of 1
runif(10) takes 10 random samples from a uniform distribution between zero and one.
round(rnorm(10)*3+15) takes 10 random samples from a normal distribution with a mean of 15
and a standard deviation of 3, and with decimals removed by the rounding function.
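round(runif(10)*5+15) returns random integers between 15 and 20. Because of the rounding, the
end values 15 and 20 occur only half as often as 16 to 19; use sample(15:20, 10, replace=TRUE)
if every integer should be equally likely.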
sample(c("A","B","C"), 10, replace=TRUE) returns a random sample from any custom
vector or variable with replacement.
sample1=dat1[sample(1:nrow(dat1), 50, replace=FALSE),] takes 50 random rows from
dat1 (without duplicate sampling). This can be handy to run quick test analyses on subsets of
very large datasets. For bootstrapping, sample with replace=TRUE instead.
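The sampling commands above can be tried out with the following minimal sketch. The data frame dat1 and its columns are made up for illustration, and set.seed() is added only so that the draws are reproducible.

```r
set.seed(42)  # fix the seed so the random draws are reproducible

# 10 draws from a normal distribution with mean 15 and sd 3, rounded
x_scaled <- round(rnorm(10) * 3 + 15)
# Same distribution, set through the mean and sd arguments
x_direct <- round(rnorm(10, mean = 15, sd = 3))

# Integers from 15 to 20, each value equally likely
ints <- sample(15:20, 10, replace = TRUE)

# Hypothetical data frame standing in for dat1
dat1 <- data.frame(ID = 1:200, YIELD = runif(200))

# Subsample: 50 distinct rows
sub1 <- dat1[sample(nrow(dat1), 50, replace = FALSE), ]

# Bootstrap resample: as many rows as dat1, drawn with replacement
boot1 <- dat1[sample(nrow(dat1), nrow(dat1), replace = TRUE), ]
```

Passing mean and sd to rnorm() directly is usually easier to read than scaling by hand, but both forms describe the same distribution.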
Sub-setting data tables, conditional subsets
dat1[1:10, 1:5] returns the first 10 rows and the first 5 columns of table dat1.
dat2=dat1[50:70,] returns a subset of rows 50 to 70.
cleandata=dat1[-c(2,7,15),] removes rows 2, 7 and 15.
selectvars=dat1[,c("ID","YIELD")] sub-sets the variables ID and YIELD
selectrows=dat1[dat1$VAR1=="Site 1",] sub-sets entries that were measured at Site 1.
Possible conditional operators are == equal, != not equal, > greater than, < less than, >= greater
than or equal, <= less than or equal, & AND, | OR, ! NOT, and () brackets to group complex
conditional statements.
selecttreats=dat1[dat1$TREAT %in% c("CTRL", "N", "P", "K"),] can replace
multiple conditional == statements linked together by OR.
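A small sketch of how these operators combine in practice. The data frame and its values are invented for illustration; only the column names VAR1, TREAT, ID and YIELD follow the entries above.

```r
# Hypothetical example data
dat1 <- data.frame(
  ID    = 1:8,
  VAR1  = rep(c("Site 1", "Site 2"), each = 4),
  TREAT = c("CTRL", "N", "P", "K", "CTRL", "N", "NPK", "P"),
  YIELD = c(4.1, 5.3, 4.8, 5.0, 3.9, 5.6, 6.2, 4.4),
  stringsAsFactors = FALSE
)

# Combine two conditions with & (AND); brackets make the grouping explicit
site1_high <- dat1[(dat1$VAR1 == "Site 1") & (dat1$YIELD > 4.5), ]

# These two subsets are the same: %in% replaces a chain of == joined by |
by_or <- dat1[dat1$TREAT == "CTRL" | dat1$TREAT == "N", ]
by_in <- dat1[dat1$TREAT %in% c("CTRL", "N"), ]

# ! negates a condition: every row except the listed treatments
others <- dat1[!(dat1$TREAT %in% c("CTRL", "N")), ]
```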
Transforming variables in data tables, conditional transformations
dat2=transform(dat1, VAR1=VAR1*0.4) multiplies VAR1 by 0.4.
dat2=transform(dat1, VAR2=VAR1*2) creates variable VAR2 that is twice the value of VAR1.
dat2=transform(dat1, VAR1=ifelse(VAR3=="Site 1", VAR1*0.4, VAR1)) multiplies
VAR1 by 0.4 for entries measured at Site 1. For other sites the value stays the same. The general
format is ifelse(condition, value if true, value if false).
The vegan package offers many useful standard transformations for variables or an entire data table:
dat2=decostand(dat1,"standardize") Check out ?decostand to see all transformations.
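The conditional transformation can be checked on a tiny example. The data below are invented. Base R's scale() is used as a stand-in for a standardizing transformation so that the sketch needs no extra package; it is assumed to be comparable to, not a replacement for, the vegan function mentioned above.

```r
# Hypothetical example data
dat1 <- data.frame(
  VAR1 = c(10, 20, 30, 40),
  VAR3 = c("Site 1", "Site 2", "Site 1", "Site 3"),
  stringsAsFactors = FALSE
)

# Scale VAR1 only where VAR3 is "Site 1"; other rows keep their value
dat2 <- transform(dat1, VAR1 = ifelse(VAR3 == "Site 1", VAR1 * 0.4, VAR1))

# The same result with plain logical indexing
dat3 <- dat1
is_site1 <- dat3$VAR3 == "Site 1"
dat3$VAR1[is_site1] <- dat3$VAR1[is_site1] * 0.4

# Base R scale() centres a column to mean 0 and scales it to sd 1
z <- as.vector(scale(dat1$VAR1))
```

The indexing form is a little longer, but it behaves predictably inside functions, where transform() can be harder to reason about.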
Merging data tables
dat3=merge(dat1,dat2,by="ID") merge two tables by ID field.
dat3=merge(dat1,dat2,by.x="ID",by.y="STN") merge by an ID field that is differently
named in the two datasets.
dat3=merge(dat1,dat2,by=c("LAT","LONG")) merge by multiple ID fields.
dat3=merge(dat1,dat2,by.x="ID",by.y="ID",all.x=T,all.y=F) left merge; all.x=F,
all.y=T right merge; all.x=T,all.y=T keep all rows; all.x=F,all.y=F keep matching rows.
cbind(dat1,dat2) On very rare occasions, you merge data without a criterion (ID). This is generally
dangerous, because the command will slap the two tables together without checking the row order!
dat3=rbind(dat1,dat2) adding rows of two data tables. The variables have to match exactly and
you will get error messages if they don’t match. So, unlike cbind(), rbind() is generally safe to use.
dat3=rbind.fill(dat1,dat2) will force non-matching datasets together, filling missing columns
with NA and executing variable type conversions where appropriate. Requires the plyr package
(older versions of the reshape package also provided it).
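The different merge types, and the reason cbind() is risky, are easiest to see on two very small tables. The tables and values below are invented for illustration; only base R functions are used.

```r
# Hypothetical tables: the key is called ID in one and STN in the other
dat1 <- data.frame(ID = c(1, 2, 3), YIELD = c(4.1, 5.3, 4.8))
dat2 <- data.frame(STN = c(2, 3, 4), RAIN = c(610, 580, 700))

# Keep matching rows only
inner <- merge(dat1, dat2, by.x = "ID", by.y = "STN")
# Left merge: keep all rows of dat1, RAIN is NA where dat2 has no match
left <- merge(dat1, dat2, by.x = "ID", by.y = "STN", all.x = TRUE)
# Keep all rows of both tables
full <- merge(dat1, dat2, by.x = "ID", by.y = "STN", all = TRUE)

# cbind() ignores the keys: row 1 of dat1 (ID 1) is silently
# paired with row 1 of dat2 (station 2)
glued <- cbind(dat1, dat2)

# rbind() refuses data frames whose column names differ
rbind_ok <- tryCatch({
  rbind(dat1, dat2)
  TRUE
}, error = function(e) FALSE)
```

Checking nrow() before and after a merge is a quick way to spot keys that failed to match or matched more than once.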
Summary statistics for variables and tables
mean(), weighted.mean(,), median(), max(), min(), range(), which.max(), which.min(), var(), sd(),
quantile(), quantile(,c(0.025,0.05,0.95,0.975)), rank(x)
some descriptive statistical functions for variables or vectors. For all functions, and
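
A short sketch applying several of the functions listed above to a made-up vector. The values are invented, and the na.rm argument is included as one common way to handle a missing entry.

```r
x <- c(12, 15, 11, 19, 15, NA)  # made-up values with one missing entry

m_na  <- mean(x)                  # NA: missing values propagate by default
m     <- mean(x, na.rm = TRUE)    # 14.4
med   <- median(x, na.rm = TRUE)  # 15
rng   <- range(x, na.rm = TRUE)   # 11 19
i_max <- which.max(x)             # 4, the position of the largest value
i_min <- which.min(x)             # 3, the position of the smallest value

# Weighted mean: the second value counts three times as much as the first
wm <- weighted.mean(c(10, 20), w = c(1, 3))  # 17.5

# Several quantiles in one call
q <- quantile(x, c(0.025, 0.05, 0.95, 0.975), na.rm = TRUE)

# Ranks; tied values share the average rank: 2 3.5 1 5 3.5
r <- rank(c(12, 15, 11, 19, 15))
```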
