Techniques: Practical Tips

2018/05/30

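This post was last updated on `r format(Sys.Date(), "%Y-%m-%d")`. If you find a problem or have a suggestion, feel free to open an Issue.

@Scavettaggplot1, @Scavettaggplot2, and @Scavettaggplot3 give a fairly systematic set of tutorials; these are my own study notes on them, organized into one post.

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```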

# Part 1

## geom_histogram

```{r}
library(ggplot2)

# Custom color code
myBlue <- "#377EB8" # can be tuned with a color picker

# Change the fill color to myBlue
ggplot(mtcars, aes(x = mpg, y = ..count..)) +
  geom_histogram(binwidth = 1, aes(y = ..count..), fill = myBlue)
ggplot(mtcars, aes(x = mpg, y = ..density..)) +
  geom_histogram(binwidth = 1, aes(y = ..density..), fill = myBlue)
```

+ `..count..` is the number of observations in each bin.
+ `..density..` is the density, i.e. the count divided by (n × bin width); with `binwidth = 1` it equals the relative frequency.

The default bin width is `diff(range(dataset$x))/30`.
[@Scavettaggplot1, [Histograms | R](https://campus.datacamp.com/courses/data-visualization-with-ggplot2-1/chapter-4-geometries?ex=5)]

## position

```{r}
for (i in c("stack", "fill", "dodge")) {
  p1 <-
    ggplot(mtcars, aes(x = cyl, fill = factor(am))) +
    geom_bar(position = i) +
    theme_minimal() +
    # use a ColorBrewer palette instead of picking colors by hand
    # (the original palette name "Diamond\nclarity" is not a valid palette)
    scale_fill_brewer(palette = "Set1")
  print(p1)
}
```

+ stack: stacked, showing counts.
+ fill: stacked, showing proportions; here a pie chart is __sometimes__ the better choice.
+ dodge: side by side. [@Scavettaggplot1, Position | R]
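
What the three positions display can be reproduced with a plain contingency table. The sketch below uses only base R; the names `counts` and `props` are introduced here for illustration and do not come from the tutorial.

```r
# Counts of transmission type (am) within each cylinder group (cyl)
counts <- table(am = mtcars$am, cyl = mtcars$cyl)

# "stack" and "dodge" both draw these raw counts (stacked vs. side by side)
counts

# "fill" rescales every bar to height 1, i.e. column-wise proportions
props <- prop.table(counts, margin = 2)
round(props, 2)
```

Each column of `props` sums to 1, which is why every bar has the same height under `position = "fill"`.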

```{r}
# Convert bar chart to pie chart
ggplot(mtcars, aes(x = factor(1), fill = as.factor(am))) +
  geom_bar(position = "fill", width = 1) +
  facet_grid(. ~ cyl) +
  coord_polar(theta = "y")
```

## geom_rect

`geom_rect` draws rectangles; here it shades each `begin` to `end` period in `recess`.

```{r}
download.file("https://assets.datacamp.com/production/course_774/datasets/recess.RData", "recess.RData")
load("recess.RData")
ggplot(economics, aes(x = date, y = unemploy/pop)) +
  geom_line() +
  geom_rect(data = recess, inherit.aes = FALSE,
            aes(xmin = begin, xmax = end, ymin = -Inf, ymax = +Inf),
            fill = "red", alpha = 0.2)
```

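# Part 2

## Overriding aes

```{r}
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  stat_smooth(method = "lm", se = FALSE, aes(group = 123))
```

`group = 123` (any constant works) overrides the grouping implied by the earlier `aes()`, so the second smooth is a single line fitted to all the points.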

## Understanding geom_boxplot and geom_violin

`varwidth` makes the box widths reflect sample size. For violins the counterpart is `scale`:

> `scale`: If "count", areas are scaled proportionally to the number of observations.

```{r}
library(dplyr) # provides %>%

diamonds %>%
  ggplot(aes(x = color, y = depth)) +
  geom_boxplot(varwidth = TRUE)
diamonds %>%
  ggplot(aes(x = color, y = depth)) +
  geom_violin(scale = "count")
```

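## Understanding geom_density

Given a finite sample, each observation's value is taken as the mean of a small distribution with an assumed standard deviation; adding all of these distributions together gives the overall curve, and that is what `geom_density` draws.

The part of the curve that extends beyond the observed values is dropped, so the left and right ends of the plotted curve look cut off.

The bandwidth plays the role of the standard deviation: the smaller it is, the steeper each assumed distribution, and the more easily the overall curve shows two or more peaks.

> `bw`: The smoothing bandwidth to be used. If numeric, the standard deviation of the smoothing kernel. If character, a rule to choose the bandwidth, as listed in `bw.nrd`.

Use `weight` to bring sample size into the estimate: if the same value occurs many times, the curve should naturally be higher and steeper there.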
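
The "one small distribution per observation" description can be checked numerically. Below is a minimal base-R sketch, assuming a Gaussian kernel and the default `bw.nrd0` bandwidth rule; the sample `x` and the name `kde_by_hand` are made up for illustration. The bumps are averaged rather than summed so that the curve integrates to 1.

```r
set.seed(42)
x <- rnorm(50)            # a small illustrative sample
bw <- bw.nrd0(x)          # default bandwidth rule of density()

# Same evaluation range that density() uses by default (data range +/- 3 * bw)
grid <- seq(min(x) - 3 * bw, max(x) + 3 * bw, length.out = 512)

# One Gaussian bump per observation, averaged over all observations
kde_by_hand <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = bw)))

# Reference: the built-in estimator on the same grid
d <- density(x, bw = bw, from = min(grid), to = max(grid), n = 512)

# Small but not exactly zero, because density() uses a binned approximation
max(abs(kde_by_hand - d$y))
```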

For the `density()` function:

> `x`: the n coordinates of the points where the density is estimated.
>
> `y`: the estimated density values. These will be non-negative, but can be zero.

```{r}
# Calculating density: d
d <- density(diamonds$depth)

# Use which.max() to calculate mode
mode <- d$x[which.max(d$y)]

# Finish the ggplot call
ggplot(diamonds, aes(x = depth)) +
  geom_rug() +
  geom_density() +
  geom_vline(xintercept = mode, col = "red")
```

This way of getting the mode is simple and worth borrowing.

```{r}
# test_data is available
test_data <- data.frame(norm = rnorm(1000))
# Arguments you'll need later on
fun_args <- list(mean = mean(test_data$norm), sd = sd(test_data$norm))

# Finish the ggplot
ggplot(test_data, aes(x = norm)) +
  geom_histogram(aes(y = ..density..)) +
  geom_density(col = "red") +
  stat_function(fun = dnorm, args = fun_args, col = "blue")
```

Note that:

+ the histogram shows the sample data (1000 points);
+ the red density line is the sum of the kernels from the KDE, which is why it is cut off on the left;
+ the blue line is the normal distribution whose $\mu$ and $\sigma$ are computed directly from the sample data (1000 points).

## Adjusting density plots

> `adjust`: A multiplicate bandwidth adjustment. This makes it possible to adjust the bandwidth while still using a bandwidth estimator. For example, `adjust = 1/2` means use half of the default bandwidth.
[@Scavettaggplot2, Adjusting density plots | R]

Let's verify this.

```{r}
# Get the bandwidth
get_bw <- density(test_data$norm)$bw

# Basic plotting object
p <- ggplot(test_data, aes(x = norm)) +
  geom_rug() +
  coord_cartesian(ylim = c(0, 0.5))

# The two plots below come out exactly the same
p + geom_density(adjust = 0.25)
p + geom_density(bw = 0.25 * get_bw)
```

`kernel`: kernel used for density estimation, defined as

+ `"g"` = gaussian
+ `"r"` = rectangular
+ `"t"` = triangular
+ `"e"` = epanechnikov
+ `"b"` = biweight
+ `"c"` = cosine
+ `"o"` = optcosine

## stat_density_2d

```{r}
diamonds %>%
  ggplot(aes(x = depth, y = table)) +
  stat_density_2d(aes(col = ..level..), h = c(5, 0.5))
# h holds the bandwidths for the two dimensions
# ..level.. maps the contour level to color, which is very handy
```

## geom_rug()

```{r}
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p
p + geom_rug()
```

## stat_smooth

> `span`: Smaller numbers produce wigglier lines, larger numbers produce smoother lines.

The smaller `span` is, the smaller the window, and the fewer observations are averaged in each local fit. [@Scavettaggplot2, Modifying stat_smooth | R]
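
The effect of `span` can also be seen without plotting. The sketch below calls `stats::loess()` directly on the built-in `cars` dataset (chosen here only for illustration, it is not used in the tutorial) and compares the equivalent number of parameters of the two fits, which is one rough measure of how flexible the curve is.

```r
# cars is a built-in dataset (speed vs. stopping distance)
fit_small <- loess(dist ~ speed, data = cars, span = 0.3)
fit_large <- loess(dist ~ speed, data = cars, span = 0.9)

# Equivalent number of parameters: higher means a more flexible, wigglier curve
c(span_0.3 = fit_small$enp, span_0.9 = fit_large$enp)
```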

```{r}
# Plot 1: change the LOESS span
for (i in c(0.2, 0.7)) {
  p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    # Add span below
    geom_smooth(se = FALSE, span = i)
  print(p1)
}
```

```{r}
# Plot 2: Set the overall model to LOESS and use a span of 0.7
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  # Change method and add span below
  stat_smooth(method = "loess", aes(group = 1),
              se = FALSE, col = "black", span = 0.7)

# Plot 3: Set col to "All", inside the aes layer of stat_smooth()
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  stat_smooth(method = "loess",
              # col goes inside aes() so that the line shows up in the legend
              aes(group = 1, col = "All"),
              se = FALSE, span = 0.7)

# Plot 4: Add scale_color_manual to change the colors
library(RColorBrewer)
myColors <- c(brewer.pal(3, "Dark2"), "black")
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, span = 0.75) +
  stat_smooth(method = "loess", aes(group = 1, col = "All"),
              se = FALSE, span = 0.7) +
  # Add correct arguments to scale_color_manual
  scale_color_manual("Cylinders", values = myColors)
```

> `scale_color_brewer()` to use a default ColorBrewer. This should result in an error, since the default palette, "Blues", only has 9 colors, but we have 16 years here.
[@Scavettaggplot2, [Modifying stat_smooth (2) | R](https://campus.datacamp.com/courses/data-visualization-with-ggplot2-2/chapter-1-statistics?ex=5)]

```{r}
# Vocab is assumed here to be the dataset shipped in the carData package

# Plot 1: Jittered scatter plot, add a linear model (lm) smooth
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  geom_jitter(alpha = 0.2) +
  stat_smooth(method = "lm", se = FALSE)

# Plot 2: Only lm, colored by year
ggplot(Vocab, aes(x = education, y = vocabulary, col = factor(year))) +
  stat_smooth(method = "lm", se = FALSE)

# Plot 3: Set a color brewer palette
ggplot(Vocab, aes(x = education, y = vocabulary, col = factor(year))) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_color_brewer()

# Plot 4: Change col and group, specify alpha, size and geom, and add scale_color_gradient
ggplot(Vocab, aes(x = education, y = vocabulary, col = year, group = factor(year))) +
  stat_smooth(method = "lm", se = FALSE, alpha = 0.6, size = 2) +
  scale_color_gradientn(colors = brewer.pal(9, "YlOrRd"))
```

When there are many groups and the grouping variable is an integer, mapping it as a continuous variable works better than a factor.

## geom_quantile

> This fits a quantile regression to the data and draws the fitted quantiles with lines. This is as a __continuous analogue__ to `geom_boxplot`.
[@Scavettaggplot2, [Quantiles | R](https://campus.datacamp.com/courses/data-visualization-with-ggplot2-2/chapter-1-statistics?ex=6)]

In other words, the same x has several y values.

```{r}
m <- ggplot(mpg, aes(displ, 1 / hwy)) + geom_point()
m + geom_quantile()
m + geom_quantile(quantiles = 0.5)
q10 <- seq(0.05, 0.95, by = 0.05)
m + geom_quantile(quantiles = q10)
```

## sum

> Another useful stat function is `stat_sum()` which calculates the count for each group. [@Scavettaggplot2, Sum | R]

> `range`: a numeric vector of length 2 that specifies the minimum and maximum size of the plotting symbol after transformation.
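
To make "the count for each group" concrete, here is a small sketch that tabulates overlapping points by hand. It only borrows the `mpg` data from ggplot2; the data frame `pts` and its columns `n` and `prop` are illustrative names, and `prop` is computed over the whole dataset, which is an assumption that matches an ungrouped, unfaceted plot.

```r
library(ggplot2)  # only needed for the mpg dataset

pts <- data.frame(cty = mpg$cty, hwy = mpg$hwy, n = 1)

# n: how many observations share each (cty, hwy) position
counts <- aggregate(n ~ cty + hwy, data = pts, FUN = sum)

# prop: share of all observations sitting at that position
counts$prop <- counts$n / sum(counts$n)

# Positions with the most overlapping points
head(counts[order(-counts$n), ])
```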

```{r}
ggplot(mpg, aes(cty, hwy)) + geom_point()

# point size reflects how many observations sit at each (x, y) position
ggplot(mpg, aes(cty, hwy)) + geom_count()
ggplot(mpg, aes(cty, hwy)) + geom_count() + scale_size_area()
# show proportions instead of counts
ggplot(mpg, aes(cty, hwy)) + geom_count(aes(size = ..prop..))
```

The first chapter is worth reviewing carefully.
All of these plots could be built by hand, but what really matters is the idea behind them.

## stat_summary

> `mult` is the multiplier:
for `smean.cl.normal` is the multiplier of the standard error of the mean.

> `stat_summary` operates on unique x; `stat_summary_bin` operates on binned x. They are more flexible versions of `stat_bin`: instead of just counting, they can compute any aggregate.

`stat_function` looks really good: it is highly customizable.

```{r}
# Display structure of mtcars
str(mtcars)

# Convert cyl and am to factors:
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

# Define positions:
posn.d <- position_dodge(width = 0.1)
posn.jd <- position_jitterdodge(jitter.width = 0.1, dodge.width = 0.2)
posn.j <- position_jitter(width = 0.2)

# base layers:
wt.cyl.am <- mtcars %>%
  ggplot(aes(x = cyl, y = wt, col = am, fill = am, group = am))
wt.cyl.am +
  geom_point(position = posn.jd, alpha = 0.6)
# the jitter is mainly there to keep the points from overlapping

# Loop over function names so that the name can be pasted into the subtitle
for (i in c("mean_sdl", "mean_cl_normal")) {
  wt.cyl.am.p <- wt.cyl.am +
    stat_summary(fun.data = i, fun.args = list(mult = 1),
                 position = posn.d) +
    labs(
      title = "Mean and SD",
      # subtitle: "this makes comparison possible;
      # geom_pointrange() is the default geom here, using <function name>"
      subtitle = paste(
        "这个就可以比较分析了。\n这里默认使用了geom_pointrange(),",
        "使用",
        i)
    ) +
    theme(text = element_text(family = "STKaiti"))
  print(wt.cyl.am.p)
}
wt.cyl.am +
  stat_summary(geom = "point", fun.y = mean,
               position = posn.d) +
  stat_summary(geom = "errorbar", fun.data = mean_sdl,
               position = posn.d, fun.args = list(mult = 1), width = 0.1)
```
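
For reference, what a `fun.data` summary has to return can be written out by hand. The helper below, `mean_sd_by_hand`, is a hypothetical stand-in (not part of ggplot2 or Hmisc) that computes the mean plus or minus `mult` standard deviations and returns them in the `y` / `ymin` / `ymax` layout that `stat_summary(fun.data = ...)` expects.

```r
# Hypothetical helper: mean +/- mult standard deviations
mean_sd_by_hand <- function(x, mult = 1) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

mean_sd_by_hand(c(2, 4, 6))
#   y ymin ymax
# 1 4    2    6
```

Any function that returns a one-row data frame with these three columns can be plugged in the same way.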

```{r}
# Play vector xx is available
xx <- 1:100

# Function to save range for use in ggplot:
gg_range <- function(x) {
  data.frame(ymin = min(x), # Min
             ymax = max(x)) # Max
}

gg_range(xx)
# Required output:
#   ymin ymax
# 1    1  100

# Custom function:
med_IQR <- function(x) {
  data.frame(y = median(x),         # Median
             ymin = quantile(x)[2], # 1st quartile
             ymax = quantile(x)[4]) # 3rd quartile
}

med_IQR(xx)
# Required output:
#        y  ymin  ymax
# 25% 50.5 25.75 75.25
```

```{r}
1:100 %>%
  quantile()
```
```{r}
# The base ggplot command, you don't have to change this
wt.cyl.am <- ggplot(mtcars, aes(x = cyl, y = wt, col = am, fill = am, group = am))

# Add three stat_summary calls to wt.cyl.am
wt.cyl.am +
  stat_summary(geom = "linerange", fun.data = med_IQR,
               position = posn.d, size = 3) +
  stat_summary(geom = "linerange", fun.data = gg_range,
               position = posn.d, size = 3, alpha = 0.4) +
  stat_summary(geom = "point", fun.y = median,
               position = posn.d, size = 3, col = "black", shape = "X") +
  # subtitle: "the point in the middle is the median,
  # the dark bar spans the quartiles, the light bar spans the extremes"
  labs(subtitle = "中间的点是中位数\n深色的是四分位点\n浅色的是极值") +
  theme(text = element_text(family = "STKaiti"))
```

### errorbar

```{r}
# Base layers
m <- ggplot(mtcars, aes(x = cyl, y = wt, col = as.factor(am), fill = as.factor(am)))

# Plot 1: Draw dynamite plot
m +
  stat_summary(fun.y = mean, geom = "bar", alpha = 0.2) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1)

# Plot 2: Set position dodge in each stat function
m +
  stat_summary(fun.y = mean, geom = "bar", position = "dodge", alpha = 0.2) +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "errorbar",
               width = 0.1, position = "dodge")

# Set your dodge posn manually
posn.d <- position_dodge(0.9)

# Plot 3: Redraw dynamite plot
m +
  stat_summary(fun.y = mean, geom = "bar", position = posn.d, alpha = 0.2) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1, position = posn.d)
```

```{r}
# A layer to add to an existing plot:
stat_summary(fun.data = "mean_cl_normal",
             geom = "crossbar",
             width = 0.2,
             col = "red")
```

This is the easiest way to add a 95% confidence interval.

```{r}
diamonds %>%
  ggplot(aes(x = color, y = depth)) +
  geom_point() +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "crossbar",
               width = 0.2,
               col = "red")
```

The 95% confidence interval occupies only a narrow band in the middle of the points, which shows how widely the individual observations are spread around the mean.

## Zoom in

```{r}
# Basic ggplot() command, coded for you
p <- ggplot(mtcars, aes(x = wt, y = hp, col = am)) +
  geom_point() +
  geom_smooth()

# Add scale_x_continuous
p + scale_x_continuous(limits = c(3, 6), expand = c(0, 0)) +
  # caption: "only two green points are left, so no line can be drawn"
  labs(caption = "就是因为绿色的点只有两个，所以画不了线。") +
  theme(text = element_text(family = "STKaiti"))

# The proper way to zoom in:
p + coord_cartesian(xlim = c(3, 6)) +
  # caption: "the smooth is fitted on the full data,
  # so the line is still there after zooming"
  labs(caption = "就是因为用全集画图，也画了线，所以截图的时候，才有smooth line。") +
  theme(text = element_text(family = "STKaiti"))
```

> `expand`:
These constants ensure that the data is placed some distance away from the axes. The defaults are `c(0.05, 0)` for continuous variables, and `c(0, 0.6)` for discrete variables.

## aspect

```{r}
# Complete basic scatter plot function
base.plot <- ggplot(iris, aes(x = Sepal.Length,
                              y = Sepal.Width,
                              col = Species)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE)

# Plot base.plot: default aspect ratio
base.plot + coord_fixed(ratio = 1/1) +
  # subtitle: "the x and y ranges differ; if they were equal,
  # the panel would be a square"
  labs(subtitle = "因为minmax不一样，如果一样，就是正方形") +
  theme(text = element_text(family = "STKaiti"))

# Fix aspect ratio (1:1) of base.plot
base.plot + coord_equal()
```

The point, I feel, is to actually take something away from each exercise.

## coord_polar()

> We can imagine two forms for pie charts - the typical filled circle, or a colored ring.

Understanding polar coordinates:

> As an example, consider the stacked bar chart shown in the viewer. Imagine that we just take the y axis on the left and bend it until it loops back on itself, while expanding the right side as we go along. We'd end up with a pie chart - it's simply a bar chart transformed onto a polar coordinate system.

```{r}
# Create stacked bar plot: thin.bar
thin.bar <- ggplot(mtcars, aes(x = 1, fill = as.factor(cyl))) +
  geom_bar() +
  # subtitle: "x is a constant and no y is mapped;
  # the bars stack at that x and are split by color, i.e. by count"
  labs(subtitle = "x轴为常数，y轴不存在，\n在x轴上stack，颜色区分，那么就是count来区分") +
  theme(text = element_text(family = "STKaiti"))

# Convert thin.bar to pie chart
# theta = "y": the y axis becomes the angle;
# the labels around the circle are counts
thin.bar + coord_polar(theta = "y")

# Create stacked bar plot: wide.bar
wide.bar <- ggplot(mtcars, aes(x = 1, fill = as.factor(cyl))) +
  geom_bar(width = 1)

# Convert wide.bar to pie chart
wide.bar + coord_polar(theta = "y")
```

## facet

```{r}
mtcars %>%
  tibble::rownames_to_column("rowname") %>% # replaces the deprecated add_rownames()
  ggplot(aes(x = mpg, y = rowname)) +
  geom_point() +
  # space = "free_y" only takes effect together with scales = "free_y"
  facet_grid(cyl ~ ., scales = "free_y", space = "free_y") +
  # subtitle: "space lets the height of each panel vary with its sample size"
  labs(subtitle = "space是为了使得y轴的空间都随着样本量变化") +
  theme(text = element_text(family = "STKaiti"))
```

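## theme

These are all the places in a ggplot figure that hold text.

There are three kinds of element to adjust:

+ `element_text()`
+ `element_line()`
+ `element_rect()`: here `fill` sets the fill color and `col` sets the border.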

<!-- text.png -->

<!-- line.png -->

<!-- rect.png -->

<!-- summary_elment.png -->

<!-- element_blank.png -->

### panel.grid

```{r}
mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) +
  geom_point() +
  theme(
    panel.grid = element_blank(),             # removes the grid lines
    panel.background = element_blank(),       # removes the background fill
    axis.line = element_line(color = "black") # looks much cleaner
  )
```

### strip.text

```{r}
mtcars %>%
  ggplot(aes(x = mpg, y = disp)) +
  geom_point() +
  facet_grid(. ~ as.factor(cyl)) +
  theme(
    panel.grid = element_blank(),             # removes the grid lines
    panel.background = element_blank(),       # removes the background fill
    axis.line = element_line(color = "black") # looks much cleaner
  ) +
  theme(
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 12)
  )
```

### Legends

```{r}
z <- mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) +
  geom_point() +
  facet_grid(. ~ cyl)

# Move legend by position
z + theme(legend.position = c(0.85, 0.85))

# Change direction
z + theme(legend.position = c(0.85, 0.85),
          legend.direction = "horizontal")

# Change location by name
z + theme(legend.position = "bottom",
          legend.direction = "horizontal")

# Remove legend entirely
z + theme(legend.position = "none",
          legend.direction = "horizontal")
```

### margin

```{r}
z
z +
  theme(
    panel.spacing.x = grid::unit(2, "cm")
  )
z +
  theme(
    plot.margin = unit(c(0, 0, 0, 0), "cm")
    # page margins
  ) +
  # subtitle: "setting the page margins"
  labs(subtitle = "页边距设置") +
  theme(text = element_text(family = "STKaiti"))
```

### Get, set, and modify the active theme

> The current/active theme is automatically applied to every plot you draw. Use `theme_get` to get the current theme, and `theme_set` to completely override it. `theme_update` and `theme_replace` are shorthands for changing individual elements.

I don't fully understand this yet.
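
A small sketch may help here. It is only one possible way to use these functions: it saves the active theme, swaps in `theme_minimal()`, tweaks a single element, draws a plot, and then restores the saved theme. The choice of `theme_minimal()` and of `panel.grid.minor` is arbitrary.

```r
library(ggplot2)

old <- theme_get()                 # remember the currently active theme

theme_set(theme_minimal())         # replace the active theme completely
theme_update(panel.grid.minor = element_blank())  # tweak one element of it

# No theme() call is needed: the active theme is applied when the plot is drawn
p <- ggplot(mtcars, aes(x = mpg, y = disp)) + geom_point()
print(p)

theme_set(old)                     # restore the original theme
```

Restoring the saved theme at the end keeps the change from leaking into later plots in the same session.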

<!-- ``` -->

<!-- mtcars %>%  -->

<!--   ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) + -->

<!--     geom_point() + -->

<!--   theme( -->

<!--     panel.background = element_rect(fill = "red") -->

<!--   ) -->

<!-- ``` -->

## ggthemes

```{r}
library(ggthemes)
```

```{r}
mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) +
  geom_point() +
  facet_grid(. ~ cyl) +
  # title: "testing whether theme_tufte also restyles Chinese text"
  labs(title = "测试中文是否能够被tufte修改") +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti"))
```

```{r}
# Base layers
m <- ggplot(mtcars, aes(x = cyl, y = wt))

# Draw dynamite plot
m +
  stat_summary(fun.y = mean, geom = "bar", fill = "skyblue") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.1)
```

## GGally

Here the panel type depends on the pair of variables:

+ continuous + continuous $\to$ `geom_point`
+ continuous + categorical $\to$ `geom_boxplot`
+ categorical + categorical $\to$ `geom_point`
+ categorical + itself $\to$ `geom_bar`
+ continuous + itself $\to$ `geom_freqpoly`

```{r}
# Pairs plot using GGally
library(GGally)
ggp <-
  mtcars %>%
  mutate_at(vars(cyl, am), as.factor) %>%
  ggpairs()
ggp +
  theme_tufte()
```

## heat map

```{r}
# Create color palette
library(RColorBrewer)
myColors <- brewer.pal(9, "Reds")

# Build the heat map from scratch
library(lattice)
data(barley)
ggplot(barley, aes(x = year, y = variety, fill = yield)) +
  geom_tile() +
  facet_wrap(~ site, ncol = 1) +
  scale_fill_gradientn(colors = myColors) +
  # caption: "a heat map uses color to show how a third, continuous
  # variable changes across two categorical variables"
  labs(caption = "热力图，就是用颜色来表达两个分类变量之间的关系，\n第三个连续变量的变化。") +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti"))
```

## ribbon

```{r}
barley %>%
  ggplot(aes(x = year, y = yield, col = site, fill = site, group = site)) +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "ribbon",
               alpha = 0.1,
               col = NA)
```

```{r}
# Reproduce the plot
ggplot(diamonds, aes(x = carat, y = price, col = color)) +
  geom_point(alpha = 0.5, size = 0.5, shape = 16) +
  scale_x_log10(expression(log[10](Carat)), limits = c(0.1, 10)) +
  scale_y_log10(expression(log[10](Price)), limits = c(100, 100000)) +
  scale_color_brewer(palette = "YlOrRd") +
  coord_equal() +
  theme_classic()
```

`scale_*_log10()` controls the scale, and `expression(log[10](...))` supplies its axis title.

```{r}
diamonds %>%
  ggplot(aes(x = carat, y = price, col = color)) +
  geom_point(alpha = 0.5, size = 0.5, shape = 16) +
  scale_x_log10(expression(log[10](carat)), limits = c(0.1, 10)) +
  # expression(log[10](price)) is a neat way to write the axis title
  scale_y_log10(expression(log[10](price)), limits = c(1000, 10000)) +
  # a nicer color palette
  scale_color_brewer(palette = "YlOrRd") +
  coord_equal() +
  theme_classic()
```

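# Part 3

## Large dataset

alpha blending¹

In fact, once the labels are worked out, all that is left is to put the `cor` values on them. Easy, so let's do it.

## Correlation matrix with ggplot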
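
The core trick is masking one triangle of the correlation matrix before reshaping it to long format. Here is a minimal base-R sketch of just that step; it uses `as.data.frame(as.table(...))` as a stand-in for `melt()`, so the value column is called `Freq`, and the names `upper_only` and `long` are illustrative.

```r
M <- cor(iris[1:4])

upper_only <- M
upper_only[lower.tri(upper_only, diag = TRUE)] <- NA  # blank entries with i >= j

# Long format: one row per (Var1, Var2) pair
long <- as.data.frame(as.table(upper_only))
long <- long[!is.na(long$Freq), ]

nrow(long)  # 4 * 3 / 2 = 6 distinct variable pairs
```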

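```{r}
library(reshape2) # provides melt()

cor_list <- function(x) {
  L <- M <- cor(x)

  M[lower.tri(M, diag = TRUE)] <- NA
  M <- melt(M)
  names(M)[3] <- "points"
  # lower.tri simply selects the entries where i is greater than j
  L[upper.tri(L, diag = TRUE)] <- NA
  L <- melt(L)
  names(L)[3] <- "labels"

  merge(M, L)
}

cor_list(iris[1:4])
# The missing values here come from three places:
# 1. the diagonal of cor
# 2. the half masked by upper.tri
# 3. the half masked by lower.tri

iris1 <- iris %>%
  group_by(Species) %>%
  do(cor_list(.[1:4]))
# this effectively unnests the result, which is handier than map
iris1 %>%
  ggplot(aes(x = Var1, y = Var2)) +
  geom_point(aes(col = labels, size = abs(labels)), shape = 16) +
  geom_text(aes(x = Var2, y = Var1,
                # x and y are swapped so the text lands in the other triangle
                col = points,
                # size = abs(points),
                # do not map size here, or the text becomes invisible
                # hjust = 2,
                label = round(labels, 2))) +
  scale_size(range = c(0, 6)) + # controls the point size
  scale_color_gradient2("r", limits = c(-1, 1)) +
  # rev() reverses the factor order, which decides whether the
  # figure ends up in the upper or the lower triangle
  scale_y_discrete("", limits = rev(levels(iris1$Var1))) +
  scale_x_discrete("") +
  guides(size = "none") + # the size legend adds little
  geom_abline(slope = -1, intercept = nlevels(iris1$Var1) + 1) +
  coord_fixed() +
  facet_grid(. ~ Species) + # otherwise the panels overlap and look bad
  # caption: "data source: iris"
  # subtitle: "building a correlation matrix is simple: get the x and y
  # variables right and compute the correlation coefficients"
  # title: "correlation matrix with ggplot"
  labs(
    caption = "数据来源:iris",
    subtitle = "建立相关性矩阵很简单\n抓好x和y轴变量和计算的相关系数就好",
    title = "ggplot实现相关矩阵"
  ) +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti")) + # needed to display Chinese
  theme(axis.text.y = element_text(angle = 45, hjust = 1),
        axis.text.x = element_text(angle = 45, hjust = 1),
        strip.background = element_blank())
```

## ggtern ternary plots

<!-- ternary.png -->

This plot can show three variables, $x, y, z$.
Here the scale along the bottom belongs to $z$.
Drop a perpendicular from the $z$ vertex, and draw lines parallel to the side opposite the $z$ vertex.
The point where one of these parallel lines meets the scale gives the $z$ value of every data point on that line.
Clearly, points on parallel lines closer to the $z$ vertex have higher $z$ values.

<!-- ternary2.png -->

Here is an example.

```{r}
library(ggtern)
download.file("https://assets.datacamp.com/production/course_862/datasets/africa.RData", "africa.RData")
load("africa.RData")
```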

```{r}
# ggtern and ggplot2 are loaded
# Original plot:
ggtern(africa, aes(x = Sand, y = Silt, z = Clay)) +
  geom_point(shape = 16, alpha = 0.2)

# Plot 1
ggtern(africa, aes(x = Sand, y = Silt, z = Clay)) +
  geom_density_tern()

# Plot 2
ggtern(africa, aes(x = Sand, y = Silt, z = Clay)) +
  stat_density_tern(
    geom = "polygon",
    aes(fill = ..level.., alpha = ..level..)) +
  guides(fill = FALSE) # Suppress the legend
```

## geomnet

```{r}
# Load geomnet & examine structure of madmen
library(geomnet)
# str(madmen)

# Merge edges and vertices
mmnet <- merge(madmen$edges, madmen$vertices,
               by.x = "Name1", by.y = "label",
               all = TRUE)

# Examine structure of mmnet
# str(mmnet)
madmen$edges %>% head()
madmen$vertices %>% head()
mmnet %>% head()

# Finish the ggplot command
ggplot(data = mmnet, aes(from_id = Name1, to_id = Name2)) +
  geom_net(aes(col = Gender),
           size = 6, linewidth = 1,
           labelon = TRUE, # turns the labels on
           fontsize = 3,
           labelcolour = "black",
           directed = TRUE) +
  scale_color_manual(values = c("#FF69B4", "#0099ff")) +
  xlim(c(-0.05, 1.05)) +
  # theme_nothing is a nice approach; legend = TRUE keeps the legend
  ggmap::theme_nothing(legend = TRUE) +
  # make the legend key background transparent
  theme(legend.key = element_blank())
```

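## shape of points

<!-- pointshape.png -->

## The `ggfortify` package

It can turn base-plot figures into ggplot figures.

This is done with the `autoplot` function, although I still don't understand what leverage is for. It even works for time series (`ts`) and multiple time series (`mts`).

I did not really follow the `cmdscale` function (R Documentation) used in Distance matrices and Multi-Dimensional Scaling (MDS) | R; no mathematical formulas are given.

### Visualizing clustering models

`cluster::clara()`, `cluster::fanny()` and `cluster::pam()` are clustering models (`stats::prcomp()`, also supported, is principal component analysis rather than clustering). ggfortify can visualize their results, which makes them easier to understand; `stats::kmeans` is used as the example here.

```{r}
library(stats)
# use kmeans
library(ggfortify)
# Perform clustering
iris_k <- kmeans(iris[-5], centers = 3)

# Autoplot: color according to cluster
autoplot(iris_k, data = iris, frame = TRUE)
# frame = TRUE
# draw a polygon around each cluster.

# Autoplot: above, plus shape according to species
autoplot(iris_k, data = iris, frame = TRUE, shape = "Species")
# each frame clearly contains more than one species, so the result is not great
```

ggfortify is hard to install, which is annoying.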

## map

> A choropleth map (from Greek χώρο ("area/region") + πλήθος ("multitude")) is a thematic map.

```{r}
# maps, ggplot2, and ggmap are pre-loaded
# Use map_data() to create usa and inspect
library(ggmap)
usa <- map_data("usa")
str(usa)

# Build the map
ggplot(usa, aes(x = long, y = lat, group = group)) +
  geom_polygon() + # the key layer for drawing the map
  geom_point(aes(col = cut_number(lat, 3))) + # points split by latitude
  coord_map() +
  theme_nothing() # ggmap::theme_nothing
```

```{r}
library(tidyverse)
library(ggmap)
get_map(location = "Shanghai") %>% ggmap()
```

__Because this calls Google Maps__, you may need a VPN, and it is somewhat slow (it queries the Google Maps database, and network restrictions may leave the downloaded data incomplete).

Otherwise it works very well!

## gganimate

The [gganimate](https://github.com/dgrtwo/gganimate) package is well suited to showing how a plot changes.

```{r}
library(ggthemes)
p <- mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = cyl)) +
  geom_point() +
  # labs() +
  theme_tufte()
gg_animate(p, filename = "mtcars.gif", interval = 1.0)
```

## References

¹ Blending means mixing, similar to model blending.