
Advanced R Reading Notes

Advanced R

A promise this ambitious is reason enough to read the book carefully.

Wickham (2014) Be comfortable reading and understanding the majority of R code. You’ll recognise common idioms (even if you wouldn’t use them yourself) and be able to critique others’ code.

After finishing this book, you should be ready to write an R package.

Data structures

Vectors

Wickham (2014) Atomic vectors are usually created with c()

x <- c(a = 1, b = 2)
is.vector(x)
## [1] TRUE
y <- as.vector(x)
typeof(y)
## [1] "double"
length(y)
## [1] 2
attributes(y)
## NULL
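is.atomic(y) || is.list(y)
## [1] TRUE
# || is the scalar logical OR operator
# I don't fully understand this part yet; the book gives this expression as the actual test for whether an object is a vector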

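Wickham (2014) NB: is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names.

In the example above, y has no attributes at all: is.null(attributes(y)) is TRUE. (Note that attributes(y) == NULL would return logical(0), not TRUE.)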
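To make the note above concrete, here is a minimal sketch of how is.vector() reacts to an extra attribute. The attribute name "note" is made up for illustration.

```r
# A plain named vector: names are the only attribute, so is.vector() is TRUE.
x <- c(a = 1, b = 2)
is.vector(x)
## [1] TRUE

# Add any other attribute and is.vector() becomes FALSE,
# even though x is still an atomic vector.
attr(x, "note") <- "extra attribute"  # "note" is a hypothetical attribute name
is.vector(x)
## [1] FALSE
is.atomic(x) || is.list(x)
## [1] TRUE

# as.vector() strips the attributes of an atomic vector, names included.
y <- as.vector(x)
is.null(attributes(y))
## [1] TRUE
```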

Atomic vectors

dbl_var <- c(1, 2.5, 4.5)
# With the L suffix, you get an integer rather than a double 
int_var <- c(1L, 6L, 10L)

Atomic vectors are always flat, even if you nest c()’s:

c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
c(1, 2, 3, 4)
## [1] 1 2 3 4

NA

NA will always be coerced to the correct type if used inside c(), or you can create NAs of a specific type with NA_real_ (a double vector), NA_integer_ and NA_character_.

NA_real_
## [1] NA
NA_integer_
## [1] NA
NA_character_
## [1] NA
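All three print as NA, so the printed output does not show the difference. A small sketch using typeof() makes the types visible and shows the coercion inside c():

```r
# A bare NA is logical; the typed constants carry their own type.
typeof(NA)
## [1] "logical"
typeof(NA_real_)
## [1] "double"
typeof(NA_integer_)
## [1] "integer"
typeof(NA_character_)
## [1] "character"

# Inside c(), NA is coerced to the type of the other elements.
typeof(c(NA, 1L))
## [1] "integer"
typeof(c(NA, "a"))
## [1] "character"
```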

Types and tests

is.atomic() returns TRUE for every type that is.character(), is.double(), is.integer(), and is.logical() test for individually.

int_var <- c(1L, 6L, 10L)
is.integer(int_var)
## [1] TRUE
is.character(int_var)
## [1] FALSE
is.atomic(int_var)
## [1] TRUE
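As a quick check of that relationship, the sketch below applies is.atomic() to one value of each of the four types, then to a list for contrast. The variable name vals is only for illustration.

```r
# One value of each common atomic type, stored in a list so the types stay separate.
vals <- list(chr = "a", dbl = 1.5, int = 1L, lgl = TRUE)

# Every one of them is atomic.
all(vapply(vals, is.atomic, logical(1)))
## [1] TRUE

# A list is a vector, but not an atomic one.
is.atomic(list(1, "a"))
## [1] FALSE
```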

Lists

x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9)) 
str(x)
## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9

Lists are sometimes called recursive vectors, because a list can contain other lists.

x <- list(list(list(list()))) 
str(x)
## List of 1
##  $ :List of 1
##   ..$ :List of 1
##   .. ..$ : list()
is.recursive(x)
## [1] TRUE
x <- list(list(1, 2), c(3, 4))
y <- c(list(1, 2), c(3, 4))

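c() does not preserve nesting: given a mix of lists and atomic vectors, it coerces the vectors to lists and combines everything into one flat list.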
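Inspecting the two objects defined above shows the difference: list() keeps its arguments as two separate elements, while c() produces a single flat list of four.

```r
x <- list(list(1, 2), c(3, 4))
y <- c(list(1, 2), c(3, 4))

length(x)  # two elements: a list and a double vector
## [1] 2
length(y)  # four elements, all at the same level
## [1] 4
str(y)
## List of 4
##  $ : num 1
##  $ : num 2
##  $ : num 3
##  $ : num 4
```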

unlist() turns a list into an atomic vector, like one built with c().
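A minimal sketch of unlist() on small made-up lists. When the elements have different types, the result is coerced to the most flexible one:

```r
nested <- list(1, 2, c(3, 4))
unlist(nested)
## [1] 1 2 3 4
identical(unlist(nested), c(1, 2, 3, 4))
## [1] TRUE

# Mixed types are coerced to a common type, here character.
typeof(unlist(list(1, "a", TRUE)))
## [1] "character"
```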

Lists are used to build up many of the more complicated data structures in R. For example, both data frames (described in Section 2.4) and linear models objects (as produced by lm()) are lists:

library(magrittr)  # provides the %>% pipe
mtcars %>% is.list()
## [1] TRUE
lm(mpg ~ wt, data = mtcars) %>% is.list()
## [1] TRUE
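Because a fitted model is a list underneath, its components can be pulled out with the usual list operators. A short sketch, with mod as an illustrative variable name:

```r
mod <- lm(mpg ~ wt, data = mtcars)

typeof(mod)  # the underlying storage type
## [1] "list"
class(mod)   # the class attribute that makes it behave like a model
## [1] "lm"

# List-style access to one component of the fit.
"coefficients" %in% names(mod)
## [1] TRUE
identical(mod$coefficients, mod[["coefficients"]])
## [1] TRUE
```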

Attributes

This explanation is very good.

Wickham (2014) Attributes can be thought of as a named list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes().

y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
str(attributes(y))
## List of 1
##  $ my_attribute: chr "This is a vector"
str(y)
##  int [1:10] 1 2 3 4 5 6 7 8 9 10
##  - attr(*, "my_attribute")= chr "This is a vector"

Here my_attribute works like a column name, and "This is a vector" works like a note attached to it.

Factors

Factors are built on top of integer vectors using two attributes: the class(), “factor”, which makes them behave differently from regular integer vectors, and the levels(), which defines the set of allowed values.

So factors are essentially integer vectors.
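This can be checked directly on a small made-up factor: the storage type is integer, and the class and levels sit on top as attributes.

```r
f <- factor(c("a", "b", "b", "a"))

typeof(f)  # stored as integers
## [1] "integer"
class(f)
## [1] "factor"
levels(f)  # the set of allowed values
## [1] "a" "b"

# The underlying integer codes index into levels(f).
as.integer(f)
## [1] 1 2 2 1
names(attributes(f))
## [1] "levels" "class"
```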

Matrices and arrays

Adding a dim() attribute to an atomic vector allows it to behave like a multi-dimensional array.

# named v rather than c, to avoid shadowing the c() function
v <- 1:6
v
## [1] 1 2 3 4 5 6
dim(v) <- c(2, 3)
v
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

You can also use the matrix() and array() functions instead.

a <- matrix(1:6,nrow = 2,ncol = 3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
b <- array(1:12,c(2,3,2))
b
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12

length() generalises to nrow() and ncol() for matrices, and dim() for arrays.

length(a)
## [1] 6
nrow(a)
## [1] 2
ncol(a)
## [1] 3
rownames(a)
## NULL
rownames(a) <- c("A","B")
rownames(a)
## [1] "A" "B"
colnames(a)
## NULL
colnames(a) <- c("a","b","c")
colnames(a)
## [1] "a" "b" "c"
length(b)
## [1] 12
dim(b)
## [1] 2 3 2
dimnames(b)
## NULL
dimnames(b) <- list(c("one", "two"), c("a", "b", "c"), c("A", "B"))
dimnames(b)
## [[1]]
## [1] "one" "two"
## 
## [[2]]
## [1] "a" "b" "c"
## 
## [[3]]
## [1] "A" "B"
b
## , , A
## 
##     a b c
## one 1 3 5
## two 2 4 6
## 
## , , B
## 
##     a  b  c
## one 7  9 11
## two 8 10 12

p. 27

Data frames

As in pandas, an R data frame has both row names and column names; colnames() and names() return the same thing.

rownames(mtcars)
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"
colnames(mtcars);names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
nrow(mtcars)
## [1] 32
ncol(mtcars);length(mtcars)
## [1] 11
## [1] 11

length() gives the number of elements in the underlying list, that is, the number of columns.

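A data frame is a list whose elements are equal-length vectors (usually atomic vectors rather than lists, although list columns are possible). If in doubt, check with typeof():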

typeof(mtcars)
## [1] "list"
is.data.frame(mtcars)
## [1] TRUE
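The list nature of a data frame can be probed a little further. The sketch below only uses the built-in mtcars data:

```r
# names() and colnames() agree, and length() counts columns.
identical(names(mtcars), colnames(mtcars))
## [1] TRUE
length(mtcars) == ncol(mtcars)
## [1] TRUE

# Every column has the same length, equal to nrow().
unique(vapply(mtcars, length, integer(1)))
## [1] 32

# Columns can be extracted like list elements.
identical(mtcars[["mpg"]], mtcars$mpg)
## [1] TRUE
typeof(mtcars[["mpg"]])
## [1] "double"
```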

p. 28

Wickham, Hadley. 2014. Advanced R. Chapman and Hall/CRC.