1 RStudio: A Quick Tour

Panes

Options

Help

Environment, History, and Files

2 R: First Impressions

Type values and mathematical formulas into R’s command prompt

1 + 1
## [1] 2

Assign values to symbols (variables)

x = 1
x + x
## [1] 2

Invoke functions such as c(), which takes any number of values and returns a single vector

x = c(1, 2, 3)
x
## [1] 1 2 3

R functions, such as sqrt(), often operate efficiently on vectors

y = sqrt(x)
y
## [1] 1.000000 1.414214 1.732051
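
To make ‘operate on vectors’ concrete, the sketch below compares a single vectorized call with an explicit element-by-element loop. The names x_demo, y_vectorized, and y_loop are illustrative only and are not part of the original material.

```r
# Illustrative names; not part of the original tour
x_demo <- c(1, 2, 3)

# Vectorized: one call processes every element
y_vectorized <- sqrt(x_demo)

# Equivalent explicit loop, filling a pre-allocated result vector
y_loop <- numeric(length(x_demo))
for (i in seq_along(x_demo)) {
    y_loop[i] <- sqrt(x_demo[i])
}

# Both approaches give the same values
all.equal(y_vectorized, y_loop)
```

The vectorized form is shorter and, for long vectors, typically much faster, because the loop runs in compiled code rather than in R.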

There are often several ways to accomplish a task in R

x = c(1, 2, 3)
x
## [1] 1 2 3
x <- c(4, 5, 6)
x
## [1] 4 5 6
x <- 7:9
x
## [1] 7 8 9
10:12 -> x
x
## [1] 10 11 12

Sometimes R does ‘surprising’ things that can be fun to figure out

x <- c(1, 2, 3) -> y
x
## [1] 1 2 3
y
## [1] 1 2 3
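
One way to figure out this ‘surprise’ is to ask R how it parsed the expression. The sketch below uses quote() to capture the expression without evaluating it; under the operator precedence documented in ?Syntax, the printed form should show the rightward assignment rewritten as nested leftward assignments. The name expr is illustrative.

```r
# Capture the expression without evaluating it
expr <- quote(x <- c(1, 2, 3) -> y)

# Show how the parser understood it
print(expr)

# Evaluate it: both symbols end up bound to the same vector
eval(expr)
identical(x, y)
```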

2.1 R Data types: vector and list

‘Atomic’ vectors

  • Types include integer, numeric (floating-point; real), complex, logical, character, raw (bytes)

    people <- c("Lori", "Yubo", "Greg", "Nitesh", "Valerie", "Herve")
    people
    ## [1] "Lori" "Yubo" "Greg" "Nitesh" "Valerie" "Herve"
  • Atomic vectors can be named

    population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)
    population
    ##   Buffalo Rochester  New York 
    ##    259000    210000   8400000
    log10(population)
    ##   Buffalo Rochester  New York 
    ##  5.413300  5.322219  6.924279
  • Statistical concepts like NA ("not available")

    truthiness <- c(TRUE, FALSE, NA)
    truthiness
    ## [1] TRUE FALSE NA
  • Logical concepts like ‘and’ (&), ‘or’ (|), and ‘not’ (!)

    !truthiness
    ## [1] FALSE TRUE NA
    truthiness | !truthiness
    ## [1] TRUE TRUE NA
    truthiness & !truthiness
    ## [1] FALSE FALSE NA
  • Numerical concepts like infinity (Inf) or not-a-number (NaN, e.g., 0 / 0)

    undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)
    undefined_numeric_values
    ## [1] NA NaN NaN Inf -Inf
    sqrt(undefined_numeric_values)
    ## Warning in sqrt(undefined_numeric_values): NaNs produced
    ## [1] NA NaN NaN Inf NaN
  • Common string manipulations

    toupper(people)
    ## [1] "LORI" "YUBO" "GREG" "NITESH" "VALERIE" "HERVE"
    substr(people, 1, 3)
    ## [1] "Lor" "Yub" "Gre" "Nit" "Val" "Her"
  • R is a green consumer: it recycles short vectors to align with long vectors

    x <- 1:3
    x * 2 # '2' (vector of length 1) recycled to c(2, 2, 2)
    ## [1] 2 4 6
    truthiness | NA
    ## [1] TRUE NA NA
    truthiness & NA
    ## [1] NA FALSE NA
  • It’s very common to nest operations, which can be simultaneously compact, confusing, and expressive ([: subset; <: less than)

    substr(tolower(people), 1, 3)
    ## [1] "lor" "yub" "gre" "nit" "val" "her"
    population[population < 1000000]
    ##   Buffalo Rochester 
    ##    259000    210000
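
The points above about types, recycling, and NA can be checked directly at the prompt. This is a minimal sketch; the name recycled is illustrative and not from the original material.

```r
# typeof() reports the underlying atomic type
typeof(1:3)         # "integer"
typeof(c(1, 2, 3))  # "double" (what R calls numeric)
typeof(TRUE)        # "logical"

# Recycling: the shorter vector is repeated to match the longer one
recycled <- 1:6 + 1:2   # same as 1:6 + c(1, 2, 1, 2, 1, 2)
recycled
## [1] 2 4 4 6 6 8

# NA means "unknown": the result is NA only when the answer depends on it
TRUE | NA    # TRUE: true whatever the unknown value is
FALSE & NA   # FALSE: false whatever the unknown value is
FALSE | NA   # NA: the answer depends on the unknown value

# Test for missing values with is.na(), not with ==
is.na(c(1, NA, 3))
## [1] FALSE  TRUE FALSE
```

This explains the earlier results for truthiness | NA and truthiness & NA: the single NA is recycled to length 3, and each element is then resolved by the same three-valued logic.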

Lists

  • The list type can contain other vectors, including other lists

    frenemies = list(
     friends=c("Larry", "Richard", "Vivian"),
     enemies=c("Dick", "Mike")
    )
    frenemies
    ## $friends
    ## [1] "Larry" "Richard" "Vivian" 
    ## 
    ## $enemies
    ## [1] "Dick" "Mike"
  • [ subsets one list to create another list, [[ extracts a list element

    frenemies[1]
    ## $friends
    ## [1] "Larry" "Richard" "Vivian"
    frenemies[c("enemies", "friends")]
    ## $enemies
    ## [1] "Dick" "Mike"
    ## 
    ## $friends
    ## [1] "Larry" "Richard" "Vivian"
    frenemies[["enemies"]]
    ## [1] "Dick" "Mike"
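
The difference between [ and [[ is easiest to see by checking the class of what each returns. The sketch below rebuilds the list under the illustrative name frenemies_demo so that it runs on its own.

```r
# Illustrative copy of the list from the text
frenemies_demo <- list(
    friends = c("Larry", "Richard", "Vivian"),
    enemies = c("Dick", "Mike")
)

class(frenemies_demo[1])            # "list": still a list, of length 1
class(frenemies_demo[["friends"]])  # "character": the element itself
frenemies_demo$enemies              # $ is shorthand for [["enemies"]]

# Number of values in each list element
lengths(frenemies_demo)
```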

Factors

  • Character-like vectors, but with values restricted to specific levels

    sex = factor(c("Male", "Male", "Female"),
     levels=c("Female", "Male", "Hermaphrodite"))
    sex
    ## [1] Male Male Female
    ## Levels: Female Male Hermaphrodite
    sex == "Female"
    ## [1] FALSE FALSE TRUE
    table(sex)
    ## sex
    ##        Female          Male Hermaphrodite 
    ##             1             2             0
    sex[sex == "Female"]
    ## [1] Female
    ## Levels: Female Male Hermaphrodite
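
Under the hood a factor stores integer codes that index into its levels, which is why unused levels persist after subsetting. A short sketch, using the illustrative name sex_demo:

```r
# Illustrative copy of the factor from the text
sex_demo <- factor(c("Male", "Male", "Female"),
                   levels = c("Female", "Male", "Hermaphrodite"))

levels(sex_demo)        # the permitted values, in order
as.integer(sex_demo)    # integer codes indexing into levels()
## [1] 2 2 1

# Drop levels that do not occur in the data before tabulating
table(droplevels(sex_demo))
```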

2.2 Classes: data.frame and beyond

Variables are often related to one another in a highly structured way, e.g., two ‘columns’ of data in a spreadsheet

x = rnorm(1000) # 1000 random normal deviates
y = x + rnorm(1000) # another 1000 deviates, as a function of x
plot(y ~ x) # relationship between x and y

Convenient to manipulate them together

  • data.frame(): like columns in a spreadsheet

    df = data.frame(X=x, Y=y)
    head(df) # first 6 rows
    ##             X          Y
    ## 1  0.03638925  0.5489812
    ## 2 -0.41545524  0.1326022
    ## 3 -0.07465566 -0.5745222
    ## 4 -0.54492524  1.0485564
    ## 5  1.09338400 -0.4200256
    ## 6  0.95695268  1.6142163
    plot(Y ~ X, df) # same as above
  • See all data with View(df). Summarize data with summary(df)

    summary(df)
    ##        X                   Y            
    ##  Min.   :-3.907163   Min.   :-4.897088  
    ##  1st Qu.:-0.671598   1st Qu.:-0.969528  
    ##  Median : 0.005968   Median : 0.078811  
    ##  Mean   : 0.014603   Mean   : 0.002155  
    ##  3rd Qu.: 0.700852   3rd Qu.: 0.991753  
    ##  Max.   : 3.133759   Max.   : 3.973073
  • Easy to manipulate data in a coordinated way, e.g., access column X with $ and subset for just those values greater than 0

    positiveX = df[df$X > 0,]
    head(positiveX)
    ##             X          Y
    ## 1  0.03638925  0.5489812
    ## 5  1.09338400 -0.4200256
    ## 6  0.95695268  1.6142163
    ## 7  1.97637428  0.9102277
    ## 9  2.26707137  1.0087697
    ## 10 1.26675817  1.5059712
    plot(Y ~ X, positiveX)
  • R is introspective – ask it about itself

    class(df)
    ## [1] "data.frame"
    dim(df)
    ## [1] 1000 2
    colnames(df)
    ## [1] "X" "Y"
  • matrix() is a related class in which all elements have the same type (a data.frame() requires elements within a column to be the same type, but elements in different columns can have different types).
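
The type rule that separates a matrix from a data.frame can be seen by mixing numbers and strings in each. This is a minimal sketch; m and df_demo are illustrative names, and stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version.

```r
# A matrix holds a single type, so mixing numbers and strings coerces everything
m <- cbind(id = 1:3, label = c("a", "b", "c"))
typeof(m)           # "character": the integers became strings

# A data.frame keeps one type per column
df_demo <- data.frame(id = 1:3, label = c("a", "b", "c"),
                      stringsAsFactors = FALSE)
sapply(df_demo, class)

# Both support [row, column] subsetting
m[2, "label"]
df_demo[df_demo$id > 1, ]
```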

A scatterplot makes one want to fit a linear model (do a regression analysis)

  • Use a formula to describe the relationship between variables
  • Variables in the formula are looked up in the second argument (here, the data.frame df)

    fit <- lm(Y ~ X, df)
  • Visualize the points, and add the regression line

    plot(Y ~ X, df)
    abline(fit, col="red", lwd=3)
  • Summarize the fit as an ANOVA table

    anova(fit)
    ## Analysis of Variance Table
    ## 
    ## Response: Y
    ##            Df Sum Sq Mean Sq F value    Pr(>F)    
    ## X           1 1063.5 1063.55    1058 < 2.2e-16 ***
    ## Residuals 998 1003.2    1.01
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • N.B. – ‘Type I’ sums-of-squares, so order of independent variables matters; use drop1() for ‘Type III’. See DataCamp Quick-R

  • Introspection – what class is fit? What methods can I apply to an object of that class?

    class(fit)
    ## [1] "lm"
    methods(class=class(fit))
    ## [1] add1 alias anova case.names coerce confint 
    ## [7] cooks.distance deviance dfbeta dfbetas drop1 dummy.coef 
    ## [13] effects extractAIC family formula hatvalues influence 
    ## [19] initialize kappa labels logLik model.frame model.matrix 
    ## [25] nobs plot predict print proj qr 
    ## [31] residuals rstandard rstudent show simulate slotsFromS3 
    ## [37] summary variable.names vcov 
    ## see '?methods' for accessing help and source code
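
Several of the methods listed above can be tried on a fitted model. The sketch below simulates its own data so that it runs on its own; the seed and the names x_sim, y_sim, df_sim, and fit_sim are assumptions made for illustration, so the numbers will not match the output shown earlier.

```r
set.seed(123)   # assumed seed, only to make the sketch reproducible
x_sim <- rnorm(1000)
y_sim <- x_sim + rnorm(1000)
df_sim <- data.frame(X = x_sim, Y = y_sim)
fit_sim <- lm(Y ~ X, df_sim)

coef(fit_sim)               # intercept and slope estimates
confint(fit_sim)            # 95% confidence intervals by default
head(residuals(fit_sim))    # observed minus fitted values

# Predict Y at new X values; the column name must match the formula
predict(fit_sim, newdata = data.frame(X = c(-1, 0, 1)))
```

Because y_sim was simulated as x_sim plus noise, the slope estimate should be close to 1 and the intercept close to 0.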

2.3 Help!

Help is available in RStudio or interactively

  • Check out the help page for rnorm()

    ?rnorm
  • ‘Usage’ section describes how the function can be used

    rnorm(n, mean = 0, sd = 1)
  • Arguments are listed, some with default values. Arguments are matched first by name, then by position

  • ‘Arguments’ section describes what the arguments are supposed to be

  • ‘Value’ section describes return value

  • ‘Examples’ section illustrates use

  • Help pages often include citations to relevant technical documentation, references to related functions, and obscure details

  • Can be intimidating, but in the end actually very useful
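
The argument-matching rule described above (by name first, then by position) can be checked without opening the help page. A minimal sketch; the seed and the names a and b are illustrative.

```r
# Equivalent ways to open the help page (interactive use)
# ?rnorm
# help("rnorm")

args(rnorm)             # shows the signature at the prompt
names(formals(rnorm))   # argument names: "n" "mean" "sd"

# Named arguments are matched first, the remaining ones by position,
# so these two calls are the same
set.seed(1)
a <- rnorm(3, sd = 2, mean = 10)
set.seed(1)
b <- rnorm(n = 3, mean = 10, sd = 2)
identical(a, b)
```

In the first call, sd and mean are claimed by name, which leaves the unnamed 3 to fill the first remaining argument, n.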
