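This document was last updated on `r format(Sys.Date(), "%Y-%m-%d")`. If you find a problem or have a suggestion, please open an Issue.

@Scavettaggplot1, @Scavettaggplot2 and @Scavettaggplot3 give a fairly systematic tutorial; these notes work through that material and organize it into one write-up.

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

# Part 1

## geom_histogram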
```{r}
# Custom color code
myBlue <- "#377EB8" # this value can be chosen with a color picker

# Change the fill color to myBlue
ggplot(mtcars, aes(x = mpg, y = ..count..)) +
  geom_histogram(binwidth = 1, aes(y = ..count..), fill = myBlue)
ggplot(mtcars, aes(x = mpg, y = ..density..)) +
  geom_histogram(binwidth = 1, aes(y = ..density..), fill = myBlue)
```

+ `..count..` is the frequency: the number of observations in each bin.
+ `..density..` is the density: the relative frequency divided by the bin width, so with `binwidth = 1` it equals the relative frequency.

By default the bin width is `diff(range(dataset$x))/30`.

[@Scavettaggplot1, [Histograms | R](https://campus.datacamp.com/courses/data-visualization-with-ggplot2-1/chapter-4-geometries?ex=5)]

## position
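```{r}
for (i in c("stack", "fill", "dodge")) {
  p1 <-
    ggplot(mtcars, aes(x = cyl, fill = factor(am))) +
    geom_bar(position = i) +
    theme_minimal() +
    # Use a ColorBrewer palette instead of picking colors by hand.
    # "Diamond\nclarity" is not a palette name, so a valid one is used here.
    scale_fill_brewer(palette = "Set1")
  print(p1)
}
```

+ `stack`: bars are stacked and the height is the count.
+ `fill`: bars are stacked and the height is the proportion; here a pie chart is __sometimes__ the better choice.
+ `dodge`: bars are placed side by side.

[@Scavettaggplot1, Position | R]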
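As a rough numeric companion to the three positions, the sketch below uses base R only to compute what the bars encode: counts for `stack` and `dodge`, and within-bar proportions for `fill`. The names `tab` and `prop` are illustrative and do not come from the course.

```r
# Counts of transmission type (am) within each cylinder class (cyl)
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# "stack" and "dodge" both draw these counts, stacked or side by side
print(tab)

# "fill" rescales every bar to height 1, i.e. proportions within each cyl
prop <- prop.table(tab, margin = 1)
print(round(prop, 2))
```

Each row of `prop` sums to 1, which is why every bar has the same height under `position = "fill"`.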
```{r}
# Convert bar chart to pie chart
ggplot(mtcars, aes(x = factor(1), fill = as.factor(am))) +
  geom_bar(position = "fill", width = 1) +
  facet_grid(. ~ cyl) +
  coord_polar(theta = "y")
```
## geom_rect

`geom_rect` simply draws rectangles. In the example below each row of `recess` supplies one rectangle through its `begin` and `end` columns.

```{r}
download.file(
  "https://assets.datacamp.com/production/course_774/datasets/recess.RData",
  "recess.RData"
)
load("recess.RData")
ggplot(economics, aes(x = date, y = unemploy / pop)) +
  geom_line() +
  geom_rect(
    data = recess, inherit.aes = FALSE,
    aes(xmin = begin, xmax = end, ymin = -Inf, ymax = +Inf),
    fill = "red", alpha = 0.2
  )
```
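# Part 2

## Overriding aes

```{r}
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  stat_smooth(method = "lm", se = FALSE, aes(group = 123))
```

Setting `group = 123` (any constant works) overrides the grouping implied by the earlier `aes()`, so the second `stat_smooth()` fits a single overall line through all points.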
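To see what the two smoothing layers compute, here is a minimal base R sketch that fits the same regressions without ggplot2: one `lm` per `cyl` group and one pooled `lm`. The object names `by_cyl` and `pooled` are made up for this illustration.

```r
# Per-group fits: one regression of mpg on wt for each value of cyl
by_cyl <- lapply(split(mtcars, mtcars$cyl),
                 function(d) coef(lm(mpg ~ wt, data = d)))

# Pooled fit: a single regression over all rows, ignoring cyl
pooled <- coef(lm(mpg ~ wt, data = mtcars))

print(do.call(rbind, by_cyl)) # three intercept/slope pairs
print(pooled)                 # one intercept/slope pair
```

The three rows correspond to the colored lines, and the pooled coefficients correspond to the line drawn with the constant `group`.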
## Understanding geom_boxplot and geom_violin

`varwidth` makes the box width reflect the sample size of each group. For violins the corresponding argument is `scale`:

> `scale`: If "count", areas are scaled proportionally to the number of observations.

```{r}
diamonds %>%
  ggplot(aes(x = color, y = depth)) +
  geom_boxplot(varwidth = TRUE)
diamonds %>%
  ggplot(aes(x = color, y = depth)) +
  geom_violin(scale = "count")
```
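## Understanding geom_density

Given a finite sample, each observation is taken as the mean of a small distribution with an assumed standard deviation; adding all of these distributions together gives the overall density, and that is what `geom_density` draws.

The curve is only drawn over the range of the observed values. Whatever the intermediate steps produce beyond that range is cut off, which is why the left and right ends of the plotted curve look truncated.

The bandwidth plays the role of that standard deviation: the smaller it is, the steeper each assumed distribution, and the more easily the overall density shows two or more peaks.

> `bw`: The smoothing bandwidth to be used. If numeric, the standard deviation of the smoothing kernel. If character, a rule to choose the bandwidth, as listed in `bw.nrd`.

Use `weight` to bring in the effect of sample size: if the same value occurs many times, the density around it should of course be more peaked.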
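The description above can be sketched by hand in base R. This is a minimal illustration that assumes a Gaussian kernel; the helper names `kde_manual` and `count_peaks`, the simulated two-cluster sample and the two bandwidth values are all made up for the example.

```r
# Manual Gaussian KDE: the average of normal densities, one centred on each observation
kde_manual <- function(obs, grid, bw) {
  sapply(grid, function(g) mean(dnorm(g, mean = obs, sd = bw)))
}

# Count local maxima of a curve evaluated on a grid
count_peaks <- function(y) sum(diff(sign(diff(y))) == -2)

set.seed(1)
obs <- c(rnorm(50, mean = 0), rnorm(50, mean = 4)) # two clusters
grid <- seq(min(obs) - 3, max(obs) + 3, length.out = 400)

narrow <- kde_manual(obs, grid, bw = 0.3) # small bandwidth: steep kernels
wide <- kde_manual(obs, grid, bw = 5)     # large bandwidth: flat kernels

print(c(narrow = count_peaks(narrow), wide = count_peaks(wide)))
```

The small bandwidth keeps the separate peaks visible, while the large one smooths them together, in line with the note on bandwidth above.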
The `density()` function returns, among other things:

+ `x`: the n coordinates of the points where the density is estimated.
+ `y`: the estimated density values. These will be non-negative, but can be zero.
```{r}
# Calculating density: d
d <- density(diamonds$depth)

# Use which.max() to calculate mode
mode <- d$x[which.max(d$y)]

# Finish the ggplot call
ggplot(diamonds, aes(x = depth)) +
  geom_rug() +
  geom_density() +
  geom_vline(xintercept = mode, col = "red")
```

This way of getting `mode` is simple and worth reusing.

```{r}
# Create test_data
test_data <- data.frame(norm = rnorm(1000))

# Arguments you'll need later on
fun_args <- list(mean = mean(test_data$norm), sd = sd(test_data$norm))

# Finish the ggplot
ggplot(test_data, aes(x = norm)) +
  geom_histogram(aes(y = ..density..)) +
  geom_density(col = "red") +
  stat_function(fun = dnorm, args = fun_args, col = "blue")
```

Note that:

- the histogram shows the sample data (1000 observations);
- the red line is the density obtained by summing the kernels of the KDE, which is why its left end is cut off;
- the blue line is the normal distribution whose $\mu$ and $\sigma$ are computed directly from the sample data (1000 observations).
### Adjusting density plots

> `adjust`: A multiplicative bandwidth adjustment. This makes it possible to adjust the bandwidth while still using a bandwidth estimator. For example, `adjust = 1/2` means use half of the default bandwidth.

[@Scavettaggplot2, Adjusting density plots | R]
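The relationship between `adjust` and `bw` can also be checked numerically with base R's `density()`. This is a small sketch on simulated data; the variable names are illustrative.

```r
set.seed(42)
x <- rnorm(1000)

# Bandwidth chosen by the default estimator
bw_default <- density(x)$bw

# Bandwidth actually used when adjust = 0.25
bw_adjusted <- density(x, adjust = 0.25)$bw
print(c(default = bw_default, adjusted = bw_adjusted))

# The two ways of asking for a quarter of the default bandwidth give the same curve
same_curve <- all.equal(
  density(x, adjust = 0.25)$y,
  density(x, bw = 0.25 * bw_default)$y
)
print(same_curve)
```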
Let's verify this.
```{r}
# test_data is available

# Get the bandwidth
get_bw <- density(test_data$norm)$bw

# Basic plotting object
p <- ggplot(test_data, aes(x = norm)) +
  geom_rug() +
  coord_cartesian(ylim = c(0, 0.5))

# Create the two plots
p + geom_density(adjust = 0.25)
p + geom_density(bw = 0.25 * get_bw) # practically identical
```

`kernel`: the kernel used for density estimation, defined as

+ `"g"` = gaussian
+ `"r"` = rectangular
+ `"t"` = triangular
+ `"e"` = epanechnikov
+ `"b"` = biweight
+ `"c"` = cosine
+ `"o"` = optcosine

## stat_density_2d
```{r}
diamonds %>%
  ggplot(aes(x = depth, y = table)) +
  stat_density_2d(aes(col = ..level..), h = c(5, 0.5))
# h holds the two bandwidths, one per axis
# ..level.. is very handy here
```

## geom_rug()

```{r}
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()
p
p + geom_rug()
```
## stat_smooth

> `span`: Smaller numbers produce wigglier lines, larger numbers produce smoother lines.

The smaller the `span`, the smaller the window, and the fewer observations are averaged in each local fit. [@Scavettaggplot2, Modifying stat_smooth | R]

```{r}
# Plot 1: change the LOESS span
for (i in c(0.2, 0.7)) {
  p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    # Add span below
    geom_smooth(se = FALSE, span = i)
  print(p1)
}
```
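The effect of `span` can be made concrete with `stats::loess()` directly. The sketch below uses simulated data rather than `mtcars`, and the `roughness` measure (sum of squared second differences of the fitted values) is an ad hoc choice for this illustration, not something taken from the course.

```r
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

# Fit LOESS with a small and a large span
fit_small <- loess(y ~ x, span = 0.1)
fit_large <- loess(y ~ x, span = 0.9)

# Roughness: sum of squared second differences of the fitted curve
roughness <- function(fit) sum(diff(fitted(fit), differences = 2)^2)

print(c(span_0.1 = roughness(fit_small), span_0.9 = roughness(fit_large)))
```

The fit with the smaller span follows the noise more closely, so its roughness value is larger.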
```{r}
# Plot 2: Set the overall model to LOESS and use a span of 0.7
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  # Change method and add span below
  stat_smooth(method = "loess", aes(group = 1), se = FALSE, col = "black", span = 0.7)

# Plot 3: Set col to "All", inside the aes layer of stat_smooth()
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  stat_smooth(method = "loess",
              # col goes inside aes() so that the overall line shows up in the legend
              aes(group = 1, col = "All"),
              # the col argument outside aes() is removed
              se = FALSE, span = 0.7)

# Plot 4: Add scale_color_manual to change the colors
library(RColorBrewer)
myColors <- c(brewer.pal(3, "Dark2"), "black")
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, span = 0.75) +
  stat_smooth(method = "loess", aes(group = 1, col = "All"), se = FALSE, span = 0.7) +
  # Add correct arguments to scale_color_manual
  scale_color_manual("Cylinders", values = myColors)
```

> `scale_color_brewer()` to use a default ColorBrewer. This should result in an error, since the default palette, "Blues", only has 9 colors, but we have 16 years here.

[@Scavettaggplot2, [Modifying stat_smooth (2) | R](https://campus.datacamp.com/courses/data-visualization-with-ggplot2-2/chapter-1-statistics?ex=5)]

```{r}
# Vocab is assumed to be loaded already (it ships with the carData package)

# Plot 1: Jittered scatter plot, add a linear model (lm) smooth
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  geom_jitter(alpha = 0.2) +
  stat_smooth(method = "lm", se = FALSE)

# Plot 2: Only lm, colored by year
ggplot(Vocab, aes(x = education, y = vocabulary, col = factor(year))) +
  stat_smooth(method = "lm", se = FALSE)

# Plot 3: Set a color brewer palette
ggplot(Vocab, aes(x = education, y = vocabulary, col = factor(year))) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_color_brewer()

# Plot 4: Change col and group, specify alpha, size and geom, and add scale_color_gradientn
ggplot(Vocab, aes(x = education, y = vocabulary, col = year, group = factor(year))) +
  stat_smooth(method = "lm", se = FALSE, alpha = 0.6, size = 2) +
  scale_color_gradientn(colors = brewer.pal(9, "YlOrRd"))
```

When there are many groups and the grouping variable is an integer, mapping it as a continuous variable works better than mapping it as a factor.

## geom_quantile

> This fits a quantile regression to the data and draws the fitted quantiles with lines. This is as a __continuous analogue__ to `geom_boxplot`.

[@Scavettaggplot2, [Quantiles | R](https://campus.datacamp.com/courses/data-visualization-with-ggplot2-2/chapter-1-statistics?ex=6)]

In other words, it applies when the same x has several y values.
```{r}
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()
m + geom_quantile()
m + geom_quantile(quantiles = 0.5)
q10 <- seq(0.05, 0.95, by = 0.05)
m + geom_quantile(quantiles = q10)
```
## sum

> Another useful stat function is `stat_sum()` which calculates the count for each group. [@Scavettaggplot2, Sum | R]

> `range`: a numeric vector of length 2 that specifies the minimum and maximum size of the plotting symbol after transformation.
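What "the count for each group" means can be sketched in base R by tabulating how many rows share the same pair of values. The example uses `mtcars` (`cyl` and `gear`) so that it needs no packages; the `radius` column is only an illustration of area-proportional sizing, under the assumption that point area should be proportional to the count.

```r
# Number of rows at each (cyl, gear) position
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

# Keep only positions that actually occur in the data
counts <- counts[counts$Freq > 0, ]

# If area is proportional to the count, the radius grows with its square root
counts$radius <- sqrt(counts$Freq)
print(counts)
```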
```{r}
ggplot(mpg, aes(cty, hwy)) +
  geom_point()

ggplot(mpg, aes(cty, hwy)) +
  geom_count() # point size shows how many observations share that (x, y) position
ggplot(mpg, aes(cty, hwy)) +
  geom_count() +
  scale_size_area()
ggplot(mpg, aes(cty, hwy)) +
  geom_count(aes(size = ..prop..)) # shows proportions instead of counts
```

The first chapter is worth reviewing carefully. All of these plots could be drawn by hand, but in practice the idea behind each one is what matters.

## stat_summary

> `mult` is the multiplier; for `smean.cl.normal` it is the multiplier of the standard error of the mean.

> `stat_summary` operates on unique x; `stat_summary_bin` operates on binned x. They are more flexible versions of `stat_bin`: instead of just counting, they can compute any aggregate.

`stat_function` also looks very useful, because it is highly customizable.
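To make the role of `mult` concrete, here is a hedged base R sketch of the two kinds of summary used in this section: mean plus or minus a multiple of the standard deviation (the idea behind `mean_sdl`) and a t-based confidence interval of the mean (the idea behind `mean_cl_normal`). The helpers `mean_sd_range` and `mean_ci_t` are hypothetical names written for this note, not functions from ggplot2 or Hmisc.

```r
# Hypothetical helper: mean -/+ mult standard deviations
mean_sd_range <- function(x, mult = 1) {
  data.frame(y = mean(x),
             ymin = mean(x) - mult * sd(x),
             ymax = mean(x) + mult * sd(x))
}

# Hypothetical helper: t-based confidence interval of the mean,
# where the multiplier applies to the standard error
mean_ci_t <- function(x, conf = 0.95) {
  se <- sd(x) / sqrt(length(x))
  mult <- qt((1 + conf) / 2, df = length(x) - 1)
  data.frame(y = mean(x),
             ymin = mean(x) - mult * se,
             ymax = mean(x) + mult * se)
}

wt8 <- mtcars$wt[mtcars$cyl == 8]
print(mean_sd_range(wt8))
print(mean_ci_t(wt8))
```

Both helpers return the `y`, `ymin` and `ymax` columns that a `fun.data` function is expected to provide.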
```{r}
# Display structure of mtcars
str(mtcars)

# Convert cyl and am to factors:
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

# Define positions:
posn.d <- position_dodge(width = 0.1)
posn.jd <- position_jitterdodge(jitter.width = 0.1, dodge.width = 0.2)
posn.j <- position_jitter(width = 0.2)

# Base layers:
wt.cyl.am <- mtcars %>%
  ggplot(aes(x = cyl, y = wt, col = am, fill = am, group = am))

# The jitter is mainly there to keep the points from overlapping.
wt.cyl.am +
  geom_point(position = posn.jd, alpha = 0.6)

# Loop over the function names so that the name can be shown in the subtitle
for (i in c("mean_sdl", "mean_cl_normal")) {
  wt.cyl.am.p <- wt.cyl.am +
    stat_summary(fun.data = i, fun.args = list(mult = 1),
                 position = posn.d) +
    labs(
      title = "Mean and SD",
      subtitle = paste(
        "This makes the groups easy to compare.\ngeom_pointrange() is used by default,",
        "with",
        i)
    ) +
    theme(text = element_text(family = "STKaiti"))
  print(wt.cyl.am.p)
}

wt.cyl.am +
  stat_summary(geom = "point", fun.y = mean,
               position = posn.d) +
  stat_summary(geom = "errorbar", fun.data = mean_sdl,
               position = posn.d, fun.args = list(mult = 1), width = 0.1)
```
```{r}
# Play vector xx is available
xx <- 1:100

# Function to save range for use in ggplot:
gg_range <- function(x) {
  data.frame(ymin = min(x), # Min
             ymax = max(x)) # Max
}

gg_range(xx)
# Required output:
#   ymin ymax
# 1    1  100

# Custom function for the median and quartiles:
med_IQR <- function(x) {
  data.frame(y = median(x),         # Median
             ymin = quantile(x)[2], # 1st quartile
             ymax = quantile(x)[4]) # 3rd quartile
}

med_IQR(xx)
# Required output:
#        y  ymin  ymax
# 25% 50.5 25.75 75.25
```

```{r}
1:100 %>%
  quantile()
```
```{r}
# The base ggplot command, you don't have to change this
wt.cyl.am <- ggplot(mtcars, aes(x = cyl, y = wt, col = am, fill = am, group = am))

# Add three stat_summary calls to wt.cyl.am
wt.cyl.am +
  stat_summary(geom = "linerange", fun.data = med_IQR,
               position = posn.d, size = 3) +
  stat_summary(geom = "linerange", fun.data = gg_range,
               position = posn.d, size = 3, alpha = 0.4) +
  stat_summary(geom = "point", fun.y = median,
               position = posn.d, size = 3, col = "black", shape = "X") +
  labs(
    subtitle = "The point in the middle is the median\nThe dark segment spans the quartiles\nThe light segment spans the extremes"
  ) +
  theme(text = element_text(family = "STKaiti"))
```

### errorbar
```{r}
# Base layers
m <- ggplot(mtcars, aes(x = cyl, y = wt, col = as.factor(am), fill = as.factor(am)))

# Plot 1: Draw dynamite plot
m +
  stat_summary(fun.y = mean, geom = "bar", alpha = 0.2) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1)

# Plot 2: Set position dodge in each stat function
m +
  stat_summary(fun.y = mean, geom = "bar", position = "dodge", alpha = 0.2) +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "errorbar",
               width = 0.1, position = "dodge")

# Set your dodge posn manually
posn.d <- position_dodge(0.9)

# Plot 3: Redraw dynamite plot
m +
  stat_summary(fun.y = mean, geom = "bar", position = posn.d, alpha = 0.2) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1, position = posn.d)
```

```{r}
# Layer on its own; add it to an existing plot with `+`
stat_summary(fun.data = "mean_cl_normal",
             geom = "crossbar",
             width = 0.2,
             col = "red")
```

This is the easiest way to add a 95% confidence interval.

```{r}
diamonds %>%
  ggplot(aes(x = color, y = depth)) +
  geom_point() +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "crossbar",
               width = 0.2,
               col = "red")
```

The 95% confidence interval sits right in the middle of the points and is far narrower than their spread, which shows that the individual observations are widely scattered.
## Zoom in

```{r}
# Basic ggplot() command, coded for you
p <- ggplot(mtcars, aes(x = wt, y = hp, col = am)) +
  geom_point() +
  geom_smooth()

# Add scale_x_continuous
p +
  scale_x_continuous(limits = c(3, 6), expand = c(0, 0)) +
  labs(
    caption = "Only two green points remain inside the limits, so no smooth line can be drawn for them."
  ) +
  theme(text = element_text(family = "STKaiti"))

# The proper way to zoom in:
p +
  coord_cartesian(xlim = c(3, 6)) +
  labs(
    caption = "The smooth lines are fitted on the full data and only the view is cropped, so they are still shown."
  ) +
  theme(text = element_text(family = "STKaiti"))
```

`expand`

> These constants ensure that the data is placed some distance away from the axes. The defaults are `c(0.05, 0)` for continuous variables, and `c(0, 0.6)` for discrete variables.

## aspect
```{r}
# Complete basic scatter plot function
base.plot <- ggplot(iris, aes(x = Sepal.Length,
                              y = Sepal.Width,
                              col = Species)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE)

# Plot base.plot with an explicit 1:1 ratio
base.plot + coord_fixed(ratio = 1 / 1) +
  labs(
    subtitle =
      "The x and y ranges differ; if they were equal, the panel would be square"
  ) +
  theme(text = element_text(family = "STKaiti"))

# Fix aspect ratio (1:1) of base.plot
base.plot + coord_equal()
```

The point is to actually take something away from each exercise.
## coord_polar()

> We can imagine two forms for pie charts - the typical filled circle, or a colored ring.

### Understanding polar coordinates

> As an example, consider the stacked bar chart shown in the viewer. Imagine that we just take the y axis on the left and bend it until it loops back on itself, while expanding the right side as we go along. We'd end up with a pie chart - it's simply a bar chart transformed onto a polar coordinate system.
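The "bending" of the stacked bar can be written out as arithmetic: each segment's share of the bar height becomes the same share of the full circle. The base R sketch below computes slice angles for the `cyl` counts in `mtcars`; the column names and the stacking order are choices made for this illustration and need not match the order ggplot2 draws.

```r
# Heights of the stacked segments: number of cars per cylinder class
counts <- table(mtcars$cyl)

# Each segment's share of the bar becomes its share of the full circle (2 * pi)
angles <- as.numeric(counts) / sum(counts) * 2 * pi

# Start and end angle of each slice, stacked one after another
end <- cumsum(angles)
start <- end - angles

slices <- data.frame(cyl = names(counts), count = as.numeric(counts),
                     start = start, end = end)
print(slices)
```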
```{r}
# Create stacked bar plot: thin.bar
thin.bar <- ggplot(mtcars, aes(x = 1, fill = as.factor(cyl))) +
  geom_bar() +
  labs(
    subtitle = "x is a constant and no y is mapped,\nso the bar is stacked at one x position and the colored segments are sized by count"
  ) +
  theme(text = element_text(family = "STKaiti"))

# Convert thin.bar to pie chart
thin.bar + coord_polar(theta = "y")
# y is mapped to the angle; the labels around the circle are counts

# Create stacked bar plot: wide.bar
wide.bar <- ggplot(mtcars, aes(x = 1, fill = as.factor(cyl))) +
  geom_bar(width = 1)

# Convert wide.bar to pie chart
wide.bar + coord_polar(theta = "y")
```

## facet
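```{r}
mtcars %>%
  tibble::rownames_to_column() %>% # replaces the deprecated add_rownames()
  ggplot(aes(x = mpg, y = rowname)) +
  geom_point() +
  # space = "free_y" only takes effect when the y scales are free as well
  facet_grid(cyl ~ ., scales = "free_y", space = "free_y") +
  labs(
    subtitle =
      "space lets the height of each panel follow its number of rows"
  ) +
  theme(text = element_text(family = "STKaiti"))
```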
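## theme

These are the places in a ggplot figure where text appears.

There are three kinds of element functions:

+ `element_text()`
+ `element_line()`
+ `element_rect()`

For rectangles, `fill` sets the fill and `col` sets the border.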
<!-- text.png -->
<!-- line.png -->
<!-- rect.png -->
<!-- summary_elment.png -->
<!-- element_blank.png -->
### panel.grid

```{r}
mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) +
  geom_point() +
  theme(
    panel.grid = element_blank(),             # removes the grid lines
    panel.background = element_blank(),       # removes the background color
    axis.line = element_line(color = "black") # looks much cleaner
  )
```

### strip.text

```{r}
mtcars %>%
  ggplot(aes(x = mpg, y = disp)) +
  geom_point() +
  facet_grid(. ~ as.factor(cyl)) +
  theme(
    panel.grid = element_blank(),             # removes the grid lines
    panel.background = element_blank(),       # removes the background color
    axis.line = element_line(color = "black") # looks much cleaner
  ) +
  theme(
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 12)
  )
```
### Legends

```{r}
z <- mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) +
  geom_point() +
  facet_grid(. ~ cyl)

# Move legend by position
z + theme(
  legend.position = c(0.85, 0.85)
)

# Change direction
z + theme(
  legend.position = c(0.85, 0.85),
  legend.direction = "horizontal"
)

# Change location by name
z + theme(
  legend.position = "bottom",
  legend.direction = "horizontal"
)

# Remove legend entirely
z + theme(
  legend.position = "none",
  legend.direction = "horizontal"
)
```

### margin
```{r}
z
z +
  theme(
    panel.spacing.x = grid::unit(2, "cm")
  )
z +
  theme(
    plot.margin = unit(c(0, 0, 0, 0), "cm") # page margin
  ) +
  labs(
    subtitle = "Setting the page margin"
  ) +
  theme(text = element_text(family = "STKaiti"))
```

### Get, set, and modify the active theme

> The current/active theme is automatically applied to every plot you draw. Use `theme_get` to get the current theme, and `theme_set` to completely override it. `theme_update` and `theme_replace` are shorthands for changing individual elements.

I do not fully understand this yet.
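A small sketch may help with the quoted description. It assumes ggplot2 is installed and only uses `theme_set()`, `theme_update()` and `theme_get()`; the choice of `theme_minimal()` and of `legend.position` is arbitrary and just for illustration.

```r
library(ggplot2)

# theme_set() replaces the active theme and invisibly returns the previous one
old <- theme_set(theme_minimal())

# theme_update() changes individual elements of the active theme
theme_update(legend.position = "bottom")

# theme_get() returns the theme that new plots will use
current_position <- theme_get()$legend.position
print(current_position)

# Restore the theme that was active before
theme_set(old)
```

Keeping the return value of `theme_set()` makes it easy to undo the change once the plots that need it have been drawn.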
<!-- ``` -->
<!-- mtcars %>% -->
<!-- ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) + -->
<!-- geom_point() + -->
<!-- theme( -->
<!-- panel.background = element_rect(fill = "red") -->
<!-- ) -->
<!-- ``` -->
## ggthemes

```{r}
library(ggthemes)
```

```{r}
mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) +
  geom_point() +
  facet_grid(. ~ cyl) +
  labs(
    # The Chinese title is kept on purpose: it tests whether theme_tufte() handles Chinese text
    title = "测试中文是否能够被tufte修改"
  ) +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti"))
```
```{r}
# Base layers
m <- ggplot(mtcars, aes(x = cyl, y = wt))

# Draw dynamite plot
m +
  stat_summary(fun.y = mean, geom = "bar", fill = "skyblue") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1)
```

## GGally

In the pairs plot:

+ continuous + continuous $\to$ `geom_point`
+ continuous + categorical $\to$ `geom_boxplot`
+ categorical + categorical $\to$ `geom_point`
+ categorical with itself $\to$ `geom_bar`
+ continuous with itself $\to$ `geom_freqpoly`
```{r}
# Pairs plot using GGally
library(GGally)
ggp <-
  mtcars %>%
  mutate_at(vars(cyl, am), as.factor) %>%
  ggpairs()
ggp +
  theme_tufte()
```

## heat map
```{r}
# Create color palette
library(RColorBrewer)
myColors <- brewer.pal(9, "Reds")

# Build the heat map from scratch
library(lattice)
data(barley)
ggplot(barley, aes(x = year, y = variety, fill = yield)) +
  geom_tile() +
  facet_wrap(~ site, ncol = 1) +
  scale_fill_gradientn(colors = myColors) +
  labs(
    caption = "A heat map uses color to show how a third, continuous variable\nchanges across two categorical variables."
  ) +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti"))
```

## ribbon
```{r}
barley %>%
  ggplot(aes(x = year, y = yield, col = site, fill = site, group = site)) +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "ribbon",
               alpha = 0.1,
               col = NA)
```

```{r}
# Reproduce the plot
ggplot(diamonds, aes(x = carat, y = price, col = color)) +
  geom_point(alpha = 0.5, size = 0.5, shape = 16) +
  scale_x_log10(expression(log[10](Carat)), limits = c(0.1, 10)) +
  scale_y_log10(expression(log[10](Price)), limits = c(100, 100000)) +
  scale_color_brewer(palette = "YlOrRd") +
  coord_equal() +
  theme_classic()
```

`expression(log[10](...))` sets the name of the scale, so the axis title is typeset with a subscript.

```{r}
diamonds %>%
  ggplot(aes(x = carat, y = price, col = color)) +
  geom_point(alpha = 0.5, size = 0.5, shape = 16) +
  scale_x_log10(expression(log[10](carat)), limits = c(0.1, 10)) +
  # expression(log[10](price)) is a neat way to write the axis title
  scale_y_log10(expression(log[10](price)), limits = c(1000, 10000)) +
  scale_color_brewer(palette = "YlOrRd") + # nicer colors
  coord_equal() +
  theme_classic()
```
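# Part 3

## Large dataset

alpha blending (see footnote 1)

In practice it is enough to work out the labels and then place the `cor` values on them; it is easy, so let's get started.

## Correlation matrix with ggplot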
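```{r}
library(reshape2) # provides melt()

cor_list <- function(x) {
  L <- M <- cor(x)

  # lower.tri() simply selects the entries whose row index i is larger than the column index j
  M[lower.tri(M, diag = TRUE)] <- NA
  M <- melt(M)
  names(M)[3] <- "points"

  L[upper.tri(L, diag = TRUE)] <- NA
  L <- melt(L)
  names(L)[3] <- "labels"

  merge(M, L)
}

cor_list(iris[1:4])
# The missing values come from three places:
# 1. the diagonal of the correlation matrix
# 2. the half blanked out by upper.tri
# 3. the half blanked out by lower.tri

iris1 <- iris %>%
  group_by(Species) %>%
  do(cor_list(.[1:4]))
# do() effectively unnests the result, which is more convenient than map here

iris1 %>%
  ggplot(aes(x = Var1, y = Var2)) +
  geom_point(aes(col = labels, size = abs(labels)), shape = 16) +
  # x and y are swapped here so that the text lands in the other triangle
  geom_text(aes(x = Var2, y = Var1,
                col = points,
                # size = abs(points), # do not map size here, or the text becomes unreadable
                # hjust = 2,
                label = round(labels, 2))) +
  scale_size(range = c(0, 6)) + # controls the point size
  scale_color_gradient2("r", limits = c(-1, 1)) +
  # rev() reverses the factor levels, which decides whether the points sit in the upper or lower triangle
  scale_y_discrete("", limits = rev(levels(iris1$Var1))) +
  scale_x_discrete("") +
  guides(size = FALSE) + # not much use
  geom_abline(slope = -1, intercept = nlevels(iris1$Var1) + 1) +
  coord_fixed() +
  facet_grid(. ~ Species) + # otherwise the panels overlap and look bad
  labs(
    caption = "Data source: iris",
    subtitle = "Building a correlation matrix is simple:\nset up the x and y variables and the computed correlations",
    title = "Correlation matrix with ggplot"
  ) +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti")) +
  theme(axis.text.y = element_text(angle = 45, hjust = 1),
        axis.text.x = element_text(angle = 45, hjust = 1),
        strip.background = element_blank())
```

## ggtern ternary plots

<!-- ternary.png -->

This plot can show three variables, $x$, $y$ and $z$. In the figure, the scale along the bottom belongs to $z$. Drop a perpendicular from the $z$ vertex, and draw lines parallel to the side opposite the $z$ vertex. The point where each parallel line meets the scale gives the $z$ value of the data points on that line. Clearly, the closer a parallel line is to the $z$ vertex, the higher the $z$ value of its points.

<!-- ternary2.png -->

Here is an example.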
```{r}
library(ggtern)
download.file(
  "https://assets.datacamp.com/production/course_862/datasets/africa.RData",
  "africa.RData"
)
load("africa.RData")
```
```{r}
# ggtern and ggplot2 are loaded
# Original plot:
ggtern(africa, aes(x = Sand, y = Silt, z = Clay)) +
  geom_point(shape = 16, alpha = 0.2)

# Plot 1
ggtern(africa, aes(x = Sand, y = Silt, z = Clay)) +
  geom_density_tern()

# Plot 2
ggtern(africa, aes(x = Sand, y = Silt, z = Clay)) +
  stat_density_tern(
    geom = "polygon",
    aes(fill = ..level.., alpha = ..level..)) +
  guides(fill = FALSE) # Suppress the legend
```

## geomnet
```{r}
# Load geomnet & examine structure of madmen
library(geomnet)
# str(madmen)

# Merge edges and vertices
mmnet <- merge(madmen$edges, madmen$vertices,
               by.x = "Name1", by.y = "label",
               all = TRUE)

# Examine structure of mmnet
# str(mmnet)
madmen$edges %>% head()
madmen$vertices %>% head()
mmnet %>% head()

# Finish the ggplot command
ggplot(data = mmnet, aes(from_id = Name1, to_id = Name2)) +
  geom_net(aes(col = Gender),
           size = 6, linewidth = 1,
           labelon = TRUE, # this switches the node labels on
           fontsize = 3,
           labelcolour = "black",
           directed = TRUE) + # marks the direction on the connecting lines
  scale_color_manual(values = c("#FF69B4", "#0099ff")) +
  xlim(c(-0.05, 1.05)) +
  # theme_nothing is a handy way to strip the plot;
  # legend = TRUE keeps the legend
  ggmap::theme_nothing(legend = TRUE) +
  # makes the background of the legend keys transparent
  theme(legend.key = element_blank())
```
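## shape of points

<!-- pointshape.png -->

## The ggfortify package

It can turn base-graphics plots into ggplot plots through the `autoplot` function, although I still do not understand what leverage is for. It even works for time series (`ts`) and multiple time series (`mts`).

I could not quite follow "Distance matrices and Multi-Dimensional Scaling (MDS) | R" or "cmdscale function | R Documentation"; neither gives the mathematical formulas.

### Visualizing clustering models

`cluster::clara()`, `cluster::fanny()` and `cluster::pam()` are clustering models, and `stats::prcomp()` is principal component analysis. ggfortify can visualize their results, which makes them easier to understand; here `stats::kmeans` is used as the example.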
```{r}
library(stats) # provides kmeans
library(ggfortify)

# Perform clustering
iris_k <- kmeans(iris[-5], centers = 3)

# Autoplot: color according to cluster
# frame = TRUE draws a polygon around each cluster
autoplot(iris_k, data = iris, frame = TRUE)

# Autoplot: above, plus shape according to species
autoplot(iris_k, data = iris, frame = TRUE, shape = "Species")
# Each frame clearly contains more than one species, so the clustering is not great.
```

ggfortify is hard to install, which is annoying.
## map

> A choropleth map (from Greek χώρο ("area/region") + πλήθος ("multitude")) is a thematic map.
```{r}
# maps, ggplot2, and ggmap are pre-loaded
# Use map_data() to create usa and inspect
library(ggmap)
usa <- map_data("usa")
str(usa)

# Build the map
ggplot(usa, aes(x = long, y = lat, group = group)) +
  geom_polygon() + # the key layer for drawing the map
  geom_point(aes(col = cut_number(lat, 3))) + # points are colored by latitude band
  coord_map() +
  theme_nothing() # ggmap::theme_nothing
```

```{r}
library(tidyverse)
library(ggmap)
get_map(location = "Shanghai") %>%
  ggmap()
```

__Because this calls Google Maps__, a VPN or proxy may be needed, and it can be slow (the request goes to Google's map database, so the download may be incomplete under network restrictions).

Otherwise it works very well.

## gganimate

The [gganimate](https://github.com/dgrtwo/gganimate) package is well suited to showing how a plot changes.

```{r}
library(ggthemes)
p <- mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = cyl)) +
  geom_point() +
  # labs() +
  theme_tufte()
gg_animate(p, filename = "mtcars.gif", interval = 1.0)
```
# References

1. "Blending" means mixing, similar to blending in modeling.