The Ultimate R Cheat Sheet - Data Management Page 4

Loops and automation
v1=vector(length=20) initializes a vector with 20 elements (filled with FALSE until you assign
values). This is often required as an initial statement so that the results of a loop can be written
into an output vector or output table.
m1=matrix(nrow=20, ncol=10) similarly initializes a matrix with 20 rows and 10
columns (filled with NA until you assign values).
for (i in 1:10) {
  # one or more operations with v1[i] or m1[,i]
}
for (i in 1:10) { for (j in 1:20) {
  # one or more operations with m1[j,i]
}}
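As a minimal sketch of the pattern above, the loops below fill a preallocated vector and matrix. The values written (squares and a multiplication table) are arbitrary choices for illustration only.

```r
# Preallocate the output objects before the loop
v1 = vector(length=10)          # starts as FALSE, becomes numeric on first assignment
m1 = matrix(nrow=20, ncol=10)   # starts filled with NA

# Single loop: one result per element of v1
for (i in 1:10) {
  v1[i] = i^2
}

# Nested loop: j walks the rows, i walks the columns
for (i in 1:10) {
  for (j in 1:20) {
    m1[j,i] = j * i
  }
}

v1[1:3]    # 1 4 9
m1[2,3]    # 6
dim(m1)    # 20 10
```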
Example of an application where a for-loop is used to calculate cumulative values. Copy and paste
the code below into R to see what it does.
dat=round(rnorm(10)+2)
cum=vector(length=10)
cum[1]=dat[1]
for (i in 2:length(dat)) { cum[i]=cum[i-1]+dat[i] }
cbind(dat,cum)
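The same running total is also available from the built-in cumsum() function. The sketch below checks the loop result against it; set.seed() is added here only so that the random numbers are reproducible, and the seed value is arbitrary.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
dat = round(rnorm(10)+2)

# Loop version, same logic as the example above
cum = vector(length=10)
cum[1] = dat[1]
for (i in 2:length(dat)) { cum[i] = cum[i-1] + dat[i] }

# Vectorized version using base R
cum2 = cumsum(dat)

all(cum == cum2)                 # TRUE
```

For simple running totals cumsum() is shorter and faster; the loop form remains useful when each step needs more than one operation.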
Example of an application where a for-loop automates the processing of multiple files in a
directory. This batch-converts DBF format files into CSV format files. With similar code, you could
merge a large number of files into one master file, or do manipulations or analysis on multiple files
consecutively.
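library(foreign)
setwd("C:/your path/")
a<-list.files(pattern="\\.dbf$", ignore.case=TRUE); a   # list only the DBF files
for (name in a) {
  dat1=read.dbf(name)
  # sep="" prevents paste() from inserting a space before ".csv"
  write.csv(dat1, paste(name,".csv",sep=""), row.names=FALSE, quote=FALSE)
}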
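The text above mentions merging many files into one master file. A minimal sketch of that idea follows, using CSV files in a temporary directory so that it runs anywhere. The file names (part1.csv and so on), the columns id and value, and the added source column are made up for illustration, and the sketch assumes every file has the same columns.

```r
# Work in a temporary directory (a stand-in for your data folder)
dir1 = file.path(tempdir(), "merge_demo")
dir.create(dir1, showWarnings=FALSE)

# Create three small example files (hypothetical data)
for (k in 1:3) {
  part = data.frame(id=1:2, value=c(k, k*10))
  write.csv(part, file.path(dir1, paste("part", k, ".csv", sep="")), row.names=FALSE)
}

# List the files, then read and stack them one by one
a = list.files(dir1, pattern="\\.csv$", full.names=TRUE)
master = NULL
for (name in a) {
  dat1 = read.csv(name)
  dat1$source = basename(name)   # remember which file each row came from
  master = rbind(master, dat1)
}

dim(master)                      # 6 3
```

Growing a table with rbind() inside a loop is fine for a handful of files; for a large number of files, collecting the pieces in a list and binding them once at the end is usually faster.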
Handy built-in functions
paste("hello", "world") joins vectors after converting them to characters. The sep option sets
the character string placed between values: a single space is the default, and sep="" places nothing.
substr("Year 1998",6,9) extracts the characters from the start position to the stop position of a string.
tolower("Year 1998") converts to lowercase, which is handy for correcting inconsistencies in data entry.
toupper("Year 1998") converts to uppercase.
nchar("Year 1998") returns the number of characters in a string. This allows you to extract the last
four characters of a variable regardless of length, for example: substr(VAR1,nchar(VAR1)-3,nchar(VAR1))
Plenty of math functions, of course: log(VAR1) (natural logarithm), log10(VAR1), log(VAR1,2), exp(VAR1),
sqrt(VAR1), abs(VAR1), round(VAR1,2)
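The functions above can be tried out as follows. This is a short sketch; the example strings in VAR1 are invented for illustration.

```r
x = "Year 1998"

paste("hello", "world")            # "hello world"
paste("hello", "world", sep="")    # "helloworld"
paste("plot", 1:3, sep="_")        # "plot_1" "plot_2" "plot_3"
substr(x, 6, 9)                    # "1998"
tolower(x)                         # "year 1998"
toupper(x)                         # "YEAR 1998"
nchar(x)                           # 9

# Last four characters, regardless of string length
VAR1 = c("Year 1998", "Y2004", "Fiscal year 2010")
last4 = substr(VAR1, nchar(VAR1)-3, nchar(VAR1))
last4                              # "1998" "2004" "2010"

# Math functions work element by element
round(sqrt(c(4, 2)), 2)            # 2.00 1.41
```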
Programming custom functions
You can program your own functions if something is missing, or if you want to reuse a block of code
over and over to make similar calculations. Here is a clever example for calculating the statistical
mode of a variable, for which base R has no built-in function (the built-in mode() reports an
object's storage type instead).
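# Named stat_mode so that it does not mask base R's mode()
stat_mode=function(input){
  freq=table(as.vector(input))         # frequency of each distinct value
  descending_freq=sort(-freq)          # most frequent value first
  result=names(descending_freq)[1]     # table names are character strings
  as.numeric(result)                   # convert back to a number
}
VAR1=c(1,3,3,2,3,2,2,3,5,3)
stat_mode(VAR1)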
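As a further sketch of a custom function, the variant below returns every value tied for the highest frequency and also works for text variables. The name all_modes is hypothetical, chosen only for this illustration.

```r
# Returns every value that occurs with the highest frequency
all_modes = function(input) {
  freq = table(as.vector(input))       # count each distinct value
  names(freq)[freq == max(freq)]       # keep the value(s) with the top count
}

all_modes(c(1,3,3,2,3,2,2,3,5,3))      # "3"
all_modes(c("a","b","b","a","c"))      # "a" "b"
```

The result is a character vector because table() stores the values as names; wrap the call in as.numeric() when the input is numeric.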
More information, help, and on-line resources
Adding a question mark before a command or function brings up its help file, e.g. ?paste. Be sure to
check out the example code at the end of the help file, which often helps to understand the syntax.
More information and R resources can be found with the search engine
