1 RStudio: A Quick Tour

Panes

Options

Help

Environment, History, and Files

2 R: First Impressions

Type values and mathematical formulas into R’s command prompt

1 + 1

## [1] 2

Assign values to symbols (variables)

x = 1
x + x

## [1] 2

Invoke functions such as c(), which takes any number of values and returns a single vector

x = c(1, 2, 3)
x

## [1] 1 2 3

R functions, such as sqrt(), often operate efficiently on vectors

y = sqrt(x)
y

## [1] 1.000000 1.414214 1.732051
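
As a rough illustration of what operating on a whole vector means, the sketch below compares the vectorized call with an explicit element-by-element loop. The names `x_demo`, `vectorized`, and `looped` are made up for this example and are not part of the session above.

```r
# Illustrative names only; not part of the original session
x_demo <- c(1, 4, 9, 16)

# Vectorized: a single call handles every element
vectorized <- sqrt(x_demo)

# Equivalent explicit loop, one element at a time
looped <- numeric(length(x_demo))
for (i in seq_along(x_demo)) {
    looped[i] <- sqrt(x_demo[i])
}

identical(vectorized, looped)   # both approaches give the same values
vectorized
```

The vectorized form is shorter and is usually the idiomatic choice in R.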

There are often several ways to accomplish a task in R

x = c(1, 2, 3)
x

## [1] 1 2 3

x <- c(4, 5, 6)
x

## [1] 4 5 6

x <- 7:9
x

## [1] 7 8 9

10:12 -> x
x

## [1] 10 11 12

Sometimes R does ‘surprising’ things that can be fun to figure out

x <- c(1, 2, 3) -> y
x

## [1] 1 2 3

y

## [1] 1 2 3
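
One way to make sense of the chained assignment is operator precedence: according to `?Syntax`, `->` binds more tightly than `<-`, so the value is first assigned rightward and the result is then assigned leftward. The sketch below spells out that grouping with parentheses, using the hypothetical names `a` and `b` so the session's `x` and `y` are left alone.

```r
# Chained form, with illustrative symbols 'a' and 'b'
a <- c(1, 2, 3) -> b

# The same thing with the grouping made explicit:
# the inner assignment to 'b' happens first, then its value goes to 'a'
a <- (b <- c(1, 2, 3))

identical(a, b)   # both symbols hold the same values
```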

2.1 R Data types: vector and list

‘Atomic’ vectors

Types include integer, numeric (floating-point; real), complex, logical, character, raw (bytes)

people <- c("Lori", "Yubo", "Greg", "Nitesh", "Valerie", "Herve")
people

## [1] "Lori" "Yubo" "Greg" "Nitesh" "Valerie" "Herve"
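
A minimal sketch of how to check which atomic type a vector has, using `typeof()`. The example values and the name `mixed` are chosen only for illustration.

```r
# typeof() reports the underlying storage type of an atomic vector
typeof(1:3)            # "integer"
typeof(c(1.5, 2))      # "double" (the storage type of numeric vectors)
typeof(1+2i)           # "complex"
typeof(c(TRUE, NA))    # "logical"
typeof("Lori")         # "character"
typeof(as.raw(255))    # "raw"

# An atomic vector holds a single type, so c() coerces mixed input
mixed <- c(1, "two", TRUE)
typeof(mixed)          # "character"
```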

Atomic vectors can be named

population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)
population

##   Buffalo Rochester  New York 
##    259000    210000   8400000

log10(population)

##   Buffalo Rochester  New York 
##  5.413300  5.322219  6.924279
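
Names are also a convenient way to get at individual elements. The sketch below repeats the population values under the illustrative name `pop_demo` so that it runs on its own.

```r
# 'pop_demo' is an illustrative copy of the named vector above
pop_demo <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)

names(pop_demo)                        # the names themselves
pop_demo["New York"]                   # select one element by name
pop_demo[c("Rochester", "Buffalo")]    # select and reorder by name
```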

Statistical concepts like NA ("not available")

truthiness <- c(TRUE, FALSE, NA)
truthiness

## [1] TRUE FALSE NA

Logical concepts like ‘and’ (&), ‘or’ (|), and ‘not’ (!)

!truthiness

## [1] FALSE TRUE NA

truthiness | !truthiness

## [1] TRUE TRUE NA

truthiness & !truthiness

## [1] FALSE FALSE NA

Numerical concepts like infinity (Inf) or not-a-number (NaN, e.g., 0 / 0)

undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)
undefined_numeric_values

## [1] NA NaN NaN Inf -Inf

sqrt(undefined_numeric_values)

## Warning in sqrt(undefined_numeric_values): NaNs produced

## [1] NA NaN NaN Inf NaN
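
These special values can be detected with `is.na()`, `is.nan()`, and `is.finite()`. The following is a small sketch with an illustrative vector `vals`; it also shows the common `na.rm` argument for skipping missing values.

```r
vals <- c(1, NA, NaN, Inf)   # illustrative vector

is.na(vals)       # TRUE for both NA and NaN
is.nan(vals)      # TRUE only for NaN
is.finite(vals)   # TRUE only for ordinary numbers

mean(c(1, NA, 3))                 # NA: missing values propagate
mean(c(1, NA, 3), na.rm = TRUE)   # NA removed before averaging
```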

Common string manipulations

toupper(people)

## [1] "LORI" "YUBO" "GREG" "NITESH" "VALERIE" "HERVE"

substr(people, 1, 3)

## [1] "Lor" "Yub" "Gre" "Nit" "Val" "Her"

R is a green consumer – recycling short vectors to align with long vectors

x <- 1:3
x * 2 # '2' (vector of length 1) recycled to c(2, 2, 2)

## [1] 2 4 6

truthiness | NA

## [1] TRUE NA NA

truthiness & NA

## [1] NA FALSE NA
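
Recycling is not limited to vectors of length 1. A brief sketch, with the illustrative name `recycled`, of what happens when the shorter vector has length 2:

```r
# The shorter vector is repeated until it matches the longer one
recycled <- 1:6 * c(1, 10)    # c(1, 10) recycled to c(1, 10, 1, 10, 1, 10)
recycled

# Comparison operators recycle too
1:6 > 3                       # '3' recycled to length 6

# When the longer length is not a multiple of the shorter one,
# R still recycles, but issues a warning
1:3 + 1:2
```

The warning in the last line is worth heeding; it often signals a mistake rather than intended recycling.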

It’s very common to nest operations, which can be simultaneously compact, confusing, and expressive ([: subset; <: less than)

substr(tolower(people), 1, 3)

## [1] "lor" "yub" "gre" "nit" "val" "her"

population[population < 1000000]

##   Buffalo Rochester 
##    259000    210000
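
When a nested expression is hard to read, it can help to evaluate it one step at a time. The sketch below unpacks the two examples above; `pop_demo`, `is_small`, and `lower` are names introduced only for this illustration.

```r
pop_demo <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)

# Step 1: the comparison yields a named logical vector
is_small <- pop_demo < 1000000
is_small

# Step 2: the logical vector selects the elements where it is TRUE
pop_demo[is_small]

# The string example, inner call first and outer call second
lower <- tolower(c("Lori", "Yubo"))
substr(lower, 1, 3)
```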

Lists

The list type can contain other vectors, including other lists

frenemies = list(
 friends=c("Larry", "Richard", "Vivian"),
 enemies=c("Dick", "Mike")
)
frenemies

## $friends
## [1] "Larry" "Richard" "Vivian" 
## 
## $enemies
## [1] "Dick" "Mike"

[ subsets one list to create another list, [[ extracts a list element

frenemies[1]

## $friends
## [1] "Larry" "Richard" "Vivian"

frenemies[c("enemies", "friends")]

## $enemies
## [1] "Dick" "Mike"
## 
## $friends
## [1] "Larry" "Richard" "Vivian"

frenemies[["enemies"]]

## [1] "Dick" "Mike"
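
The difference between `[` and `[[` shows up clearly in the class of the result. A minimal sketch, using an illustrative copy of the list named `frenemies_demo`:

```r
frenemies_demo <- list(
    friends=c("Larry", "Richard", "Vivian"),
    enemies=c("Dick", "Mike")
)

class(frenemies_demo[1])      # "list": still a list, of length 1
class(frenemies_demo[[1]])    # "character": the element itself

# '$' is a shorthand for '[[' with a name
frenemies_demo$enemies
identical(frenemies_demo$enemies, frenemies_demo[["enemies"]])
```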

Factors

Character-like vectors, but with values restricted to specific levels

sex = factor(c("Male", "Male", "Female"),
 levels=c("Female", "Male", "Hermaphrodite"))
sex

## [1] Male Male Female
## Levels: Female Male Hermaphrodite

sex == "Female"

## [1] FALSE FALSE TRUE

table(sex)

## sex
##        Female          Male Hermaphrodite 
##             1             2             0

sex[sex == "Female"]

## [1] Female
## Levels: Female Male Hermaphrodite
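
A short sketch of what sits underneath a factor: a set of levels plus integer codes pointing into them. The names `sex_demo` and `females` are illustrative; `droplevels()` is shown as one way to discard levels that are no longer used.

```r
sex_demo <- factor(c("Male", "Male", "Female"),
                   levels=c("Female", "Male", "Hermaphrodite"))

levels(sex_demo)        # all permitted values, used or not
as.integer(sex_demo)    # underlying integer codes, indexing into the levels

# Subsetting keeps unused levels; droplevels() discards them
females <- sex_demo[sex_demo == "Female"]
levels(females)
levels(droplevels(females))
```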

2.2 Classes: data.frame and beyond

Variables are often related to one another in a highly structured way, e.g., two ‘columns’ of data in a spreadsheet

x = rnorm(1000) # 1000 random normal deviates
y = x + rnorm(1000) # another 1000 deviates, as a function of x
plot(y ~ x) # relationship between x and y

Convenient to manipulate them together

data.frame(): like columns in a spreadsheet

df = data.frame(X=x, Y=y)
head(df) # first 6 rows

##             X          Y
## 1  0.03638925  0.5489812
## 2 -0.41545524  0.1326022
## 3 -0.07465566 -0.5745222
## 4 -0.54492524  1.0485564
## 5  1.09338400 -0.4200256
## 6  0.95695268  1.6142163

plot(Y ~ X, df) # same as above

See all data with View(df). Summarize data with summary(df)

summary(df)

##        X                   Y            
##  Min.   :-3.907163   Min.   :-4.897088  
##  1st Qu.:-0.671598   1st Qu.:-0.969528  
##  Median : 0.005968   Median : 0.078811  
##  Mean   : 0.014603   Mean   : 0.002155  
##  3rd Qu.: 0.700852   3rd Qu.: 0.991753  
##  Max.   : 3.133759   Max.   : 3.973073

Easy to manipulate data in a coordinated way, e.g., access column X with $ and subset for just those values greater than 0

positiveX = df[df$X > 0,]
head(positiveX)

##             X          Y
## 1  0.03638925  0.5489812
## 5  1.09338400 -0.4200256
## 6  0.95695268  1.6142163
## 7  1.97637428  0.9102277
## 9  2.26707137  1.0087697
## 10 1.26675817  1.5059712

plot(Y ~ X, positiveX)
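
The pattern `df[rows, columns]` is worth a closer look. The sketch below rebuilds a small data.frame under the illustrative name `df_demo` (with an arbitrary seed so it is reproducible) and shows the same row selection written with `subset()`.

```r
set.seed(123)                          # arbitrary seed, for reproducibility only
df_demo <- data.frame(X=rnorm(10))
df_demo$Y <- df_demo$X + rnorm(10)     # add a second column

# Logical vector selects rows; the empty slot after ',' keeps all columns
pos <- df_demo[df_demo$X > 0, ]

# subset() expresses the same selection, looking up X inside df_demo
pos2 <- subset(df_demo, X > 0)
nrow(pos) == nrow(pos2)

# Select rows and a single column in one step
df_demo[df_demo$X > 0, "Y"]
```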

R is introspective – ask it about itself

class(df)

## [1] "data.frame"

dim(df)

## [1] 1000 2

colnames(df)

## [1] "X" "Y"

matrix() is a related class, where all elements have the same type (a data.frame() requires elements within a column to be the same type, but elements between columns can be different types).
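
A minimal sketch of the single-type rule for matrices, using the illustrative names `m` and `df_mixed`: converting a data.frame with mixed column types to a matrix forces every element to one common type.

```r
m <- matrix(1:6, nrow=2)     # filled column by column
m
dim(m)                       # 2 rows, 3 columns
typeof(m)                    # "integer": one type for every element

df_mixed <- data.frame(id=1:2, name=c("a", "b"))
sapply(df_mixed, class)      # each column keeps its own class

# Converting to a matrix forces a single common type
typeof(as.matrix(df_mixed))  # "character"
```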

A scatterplot makes one want to fit a linear model (do a regression analysis)

Use a formula to describe the relationship between variables
Variables named in the formula are looked up in the second argument (here, df)
```
fit <- lm(Y ~ X, df)
```
Visualize the points, and add the regression line
```
plot(Y ~ X, df)
abline(fit, col="red", lwd=3)
```

Summarize the fit as an ANOVA table

anova(fit)

## Analysis of Variance Table
## 
## Response: Y
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## X           1 1063.5 1063.55    1058 < 2.2e-16 ***
## Residuals 998 1003.2    1.01
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

N.B. – ‘Type I’ sums-of-squares, so order of independent variables matters; use drop1() for ‘Type III’. See DataCamp Quick-R

Introspection – what class is fit? What methods can I apply to an object of that class?

class(fit)

## [1] "lm"

methods(class=class(fit))

## [1] add1 alias anova case.names coerce confint 
## [7] cooks.distance deviance dfbeta dfbetas drop1 dummy.coef 
## [13] effects extractAIC family formula hatvalues influence 
## [19] initialize kappa labels logLik model.frame model.matrix 
## [25] nobs plot predict print proj qr 
## [31] residuals rstandard rstudent show simulate slotsFromS3 
## [37] summary variable.names vcov 
## see '?methods' for accessing help and source code
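
Several of these functions can be called directly on a fitted model. The sketch below refits a comparable model on freshly simulated data; `df_demo`, `fit_demo`, and the seed are illustrative choices, so the numbers will differ from those shown above.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
df_demo <- data.frame(X=rnorm(1000))
df_demo$Y <- df_demo$X + rnorm(1000)
fit_demo <- lm(Y ~ X, df_demo)

coef(fit_demo)                   # intercept and slope estimates
confint(fit_demo)                # confidence intervals (95% by default)
head(residuals(fit_demo))        # observed minus fitted values
summary(fit_demo)$r.squared      # proportion of variance explained

# Predict Y at new X values; the column name must match the formula
predict(fit_demo, newdata=data.frame(X=c(-1, 0, 1)))
```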

2.3 Help!

Help available in RStudio or interactively

Check out the help page for rnorm()
```
?rnorm
```
‘Usage’ section describes how the function can be used
```
rnorm(n, mean = 0, sd = 1)
```
Arguments, some with default values, are matched first by name, then by position
‘Arguments’ section describes what the arguments are supposed to be
‘Value’ section describes return value
‘Examples’ section illustrates use
Help pages often include citations to relevant technical documentation, references to related functions, and obscure details
Can be intimidating, but in the end actually very useful
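
The argument-matching rule from the 'Usage' notes can be tried out directly. In this sketch the seed is reset before each call so that the draws are comparable; the names `by_position`, `by_name`, and `mixed` are illustrative.

```r
names(formals(rnorm))             # argument names, in 'Usage' order

set.seed(1)                       # arbitrary seed so each call draws the same numbers
by_position <- rnorm(3, 10, 2)    # n, mean, sd matched by position

set.seed(1)
by_name <- rnorm(sd=2, mean=10, n=3)   # same call, arguments named and reordered
identical(by_position, by_name)

# Named arguments are matched first; the rest fill remaining slots in order
set.seed(1)
mixed <- rnorm(3, sd=2, 10)       # 3 goes to n, 10 goes to mean
identical(mixed, by_position)
```

Naming arguments after the first one or two tends to make calls easier to read than relying on position alone.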

A.1 – Using R

Martin Morgan Martin.Morgan@RoswellPark.org
Lori Shepherd Lori.Shepherd@RoswellPark.org

2 March 2017

Contents

1 RStudio: A Quick Tour

2 R: First Impressions

2.1 R Data types: vector and list

2.2 Classes: data.frame and beyond

2.3 Help!
