Advanced R
The promise below is reason enough to read this book carefully.
Wickham (2014) Be comfortable reading and understanding the majority of R code. You’ll recognise common idioms (even if you wouldn’t use them yourself) and be able to critique others’ code.
Once you have worked through it, you should be ready to write an R package.
Data structures
Vectors
Wickham (2014) Atomic vectors are usually created with c():
x <- c(a = 1, b = 2)
is.vector(x)
## [1] TRUE
y <- as.vector(x)
typeof(y)
## [1] "double"
length(y)
## [1] 2
attributes(y)
## NULL
is.atomic(y) || is.list(y)
## [1] TRUE
# || is the logical OR operator
# I do not fully understand this part yet
Wickham (2014) NB: is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names. For y above, attributes(y) is NULL, so is.vector(y) is TRUE.
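The NB above is easier to see with a counterexample. The sketch below adds an extra attribute to a named vector; the attribute name "note" is made up for illustration and does not come from the book.

```r
x <- c(a = 1, b = 2)
is.vector(x)                 # TRUE: names are the one attribute is.vector() tolerates

attr(x, "note") <- "extra"   # "note" is a hypothetical attribute name
is.vector(x)                 # FALSE: x now carries an attribute other than names

is.atomic(x) || is.list(x)   # TRUE: x is still an atomic vector
```

This is why the book suggests is.atomic(x) || is.list(x) when the question is whether an object is actually a vector.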
Atomic vectors
dbl_var <- c(1, 2.5, 4.5)
# With the L suffix, you get an integer rather than a double
int_var <- c(1L, 6L, 10L)
Atomic vectors are always flat, even if you nest c()’s:
c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
c(1, 2, 3, 4)
## [1] 1 2 3 4
NA
NA will always be coerced to the correct type if used inside c(), or you can create NAs of a specific type with NA_real_ (a double vector), NA_integer_ and NA_character_.
NA_real_
## [1] NA
NA_integer_
## [1] NA
NA_character_
## [1] NA
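All three typed NAs print the same way, so the difference only shows up through typeof(). A minimal sketch, using nothing beyond base R:

```r
# A bare NA is logical, but c() coerces it to match its neighbours
typeof(NA)             # "logical"
typeof(c(NA, 1L))      # "integer"
typeof(c(NA, 2.5))     # "double"
typeof(c(NA, "a"))     # "character"

# The typed constants carry their type from the start
typeof(NA_real_)       # "double"
typeof(NA_integer_)    # "integer"
typeof(NA_character_)  # "character"
```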
Types and tests
is.atomic() covers the type-specific tests: it returns TRUE for any object for which is.character(), is.double(), is.integer() or is.logical() returns TRUE.
int_var <- c(1L, 6L, 10L)
is.integer(int_var)
## [1] TRUE
is.character(int_var)
## [1] FALSE
is.atomic(int_var)
## [1] TRUE
Lists
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
## List of 4
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
Lists are sometimes called recursive vectors, because a list can contain other lists.
x <- list(list(list(list())))
str(x)
## List of 1
## $ :List of 1
## ..$ :List of 1
## .. ..$ : list()
is.recursive(x)
## [1] TRUE
x <- list(list(1, 2), c(3, 4))
y <- c(list(1, 2), c(3, 4))
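c() does not nest: given a mix of lists and atomic vectors, it coerces the vectors to lists and combines everything into one flat list, whereas list() keeps each argument as a separate element. unlist() turns a list into an atomic vector, like the result of c().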
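The difference between the two constructions is easiest to see by inspecting the results. A short sketch that reuses the x and y defined above:

```r
x <- list(list(1, 2), c(3, 4))
y <- c(list(1, 2), c(3, 4))

length(x)        # 2: a nested list and a double vector
length(y)        # 4: c() flattened everything into one list of four elements

is.list(x[[1]])  # TRUE: list() preserved the inner list
is.list(y[[1]])  # FALSE: each element of y is a single number

unlist(y)        # an atomic double vector: 1 2 3 4
```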
Lists are used to build up many of the more complicated data structures in R. For example, both data frames (described in Section 2.4) and linear models objects (as produced by lm()) are lists:
library(magrittr)  # provides the %>% pipe used below
mtcars %>% is.list()
## [1] TRUE
lm(mpg ~ wt, data = mtcars) %>% is.list()
## [1] TRUE
Attributes
The book explains this well.
Wickham (2014) Attributes can be thought of as a named list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes().
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
str(attributes(y))
## List of 1
## $ my_attribute: chr "This is a vector"
str(y)
## int [1:10] 1 2 3 4 5 6 7 8 9 10
## - attr(*, "my_attribute")= chr "This is a vector"
Here my_attribute plays a role similar to a column name, and "This is a vector" is like a note attached to it.
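Two related behaviours are worth trying: structure() sets an attribute at creation time, and most attributes are dropped when the vector is modified. A minimal sketch, reusing the attribute name from the example above:

```r
# structure() returns the object with the given attributes attached
y <- structure(1:10, my_attribute = "This is a vector")
attr(y, "my_attribute")   # "This is a vector"

# Custom attributes do not survive subsetting or summarising
attributes(y[1])          # NULL
attributes(sum(y))        # NULL
```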
Factors
Factors are built on top of integer vectors using two attributes: the class(), “factor”, which makes them behave differently from regular integer vectors, and the levels(), which defines the set of allowed values.
So underneath, a factor is an integer vector.
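The integer representation can be checked directly. The sketch below uses a small made-up factor f (not from the book) and looks at its type, class, levels and underlying codes:

```r
f <- factor(c("a", "b", "b", "a"))  # hypothetical example data

typeof(f)      # "integer": the storage type
class(f)       # "factor": the class attribute
levels(f)      # "a" "b": the set of allowed values

as.integer(f)  # 1 2 2 1: the integer codes that index into levels(f)
unclass(f)     # the same codes, with the levels attribute still attached
```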
Matrices and arrays
Adding a dim() attribute to an atomic vector allows it to behave like a multi-dimensional array.
v <- 1:6  # named v rather than c, so the variable does not shadow c()
v
## [1] 1 2 3 4 5 6
dim(v) <- c(2, 3)
v
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
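Because the matrix is just a vector plus a dim attribute, removing the attribute turns it back into a plain vector. A small sketch with a hypothetical variable m:

```r
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)     # TRUE
attributes(m)    # a list with a single element, dim = 2 3

dim(m) <- NULL   # drop the dim attribute again
is.matrix(m)     # FALSE
m                # back to the flat vector 1 2 3 4 5 6
```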
The matrix() and array() functions can be used instead.
a <- matrix(1:6, nrow = 2, ncol = 3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
b <- array(1:12, c(2, 3, 2))
b
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
length() generalises to nrow() and ncol() for matrices, and dim() for arrays.
length(a)
## [1] 6
nrow(a)
## [1] 2
ncol(a)
## [1] 3
rownames(a)
## NULL
rownames(a) <- c("A","B")
rownames(a)
## [1] "A" "B"
colnames(a)
## NULL
colnames(a) <- c("a","b","c")
colnames(a)
## [1] "a" "b" "c"
length(b)
## [1] 12
dim(b)
## [1] 2 3 2
dimnames(b)
## NULL
dimnames(b) <- list(c("one", "two"), c("a", "b", "c"), c("A", "B"))
dimnames(b)
## [[1]]
## [1] "one" "two"
##
## [[2]]
## [1] "a" "b" "c"
##
## [[3]]
## [1] "A" "B"
b
## , , A
##
##     a b c
## one 1 3 5
## two 2 4 6
##
## , , B
##
##     a  b  c
## one 7  9 11
## two 8 10 12
Wickham (2014), p. 27
Data frames
As in pandas, a data frame in R carries both row names (rownames()) and column names; colnames() and names() return the same result.
rownames(mtcars)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
## [7] "Duster 360" "Merc 240D" "Merc 230"
## [10] "Merc 280" "Merc 280C" "Merc 450SE"
## [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
## [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
## [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
## [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
## [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
## [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
## [31] "Maserati Bora" "Volvo 142E"
colnames(mtcars);names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
nrow(mtcars)
## [1] 32
ncol(mtcars);length(mtcars)
## [1] 11
## [1] 11
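length() gives the number of elements in the underlying list, that is, the number of columns.
A data frame is a list whose elements are equal-length vectors (typically atomic vectors rather than lists, although list columns are possible).
To check this, try typeof():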
typeof(mtcars)
## [1] "list"
is.data.frame(mtcars)
## [1] TRUE
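The same checks work on any data frame. The sketch below builds a small hypothetical data frame df (not from the book) and confirms that it is a list of equal-length columns with a few extra attributes:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # made-up example data

typeof(df)             # "list": the underlying storage
class(df)              # "data.frame": the class that changes its behaviour
length(df)             # 2: one list element per column
sapply(df, length)     # every column has nrow(df) = 3 elements
names(attributes(df))  # names, class and row.names
```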
Wickham (2014), p. 28
Wickham, Hadley. 2014. Advanced R. Chapman and Hall/CRC.