16 min read

ggplot2使用技巧

本文于2020-10-10更新。 如发现问题或者有建议,欢迎提交 Issue

1 主要内容

  1. 数据(data)
  2. 映射(mapping)aesthetic attributes, 包括颜色、形状、大小等
  3. 几何对象(geom)geometric object, 包括点、线、条形等
  4. 统计变换(stat)statistical transformation
  5. 坐标系(coord)coordinate system
  6. 分面(facet)
  7. 标度(scale)
  8. 主题(theme)
  9. 位置(position)
  10. 存储和输出

结构参考 安建才 (2018)

2 新增

  • geom_segment可以在图中做很多箭头的改进
  • How to draw an arrow in the middle of the segment?

以数据mtcars为例。

library(skimr)
skim(mtcars)
head(mtcars)

3 data

%+%更换数据

library(tidyverse)
p <- 
  mtcars %>% 
    ggplot(aes(mpg, wt, col = cyl)) +
      geom_point()
p
p %+%
  (mtcars %>% mutate(mpg = mpg^2))

注意看x轴的比例尺子都变换了。

4 mapping

4.1 mapping 和 geom_*不同之处

mtcars %>% 
  ggplot(aes(mpg, wt, col = 'red')) +
      geom_point()
mtcars %>% 
  ggplot(aes(mpg, wt)) +
      geom_point(col = 'red')

注意看到第一幅图,因为是对data进行color,因此系统默认这是一个legend,然而在geom_*中的设定是不考虑的,因此没有legend

4.2 cut_*

  • cut_interval(x, n)分成n个bin,其中每个bin的长度相等。
  • cut_width(x, width)分bin,bin的长度为width
  • cut_number(x, n = 10)分10个bin,bin之间数量近似相等

4.2.1 cut_number的优化

cut_number makes n groups with (approximately) equal numbers of observations. 因此不好。 这里bin = round(fst_givenamount/500)*500给到的是,离值最近的500的bin。 这些都不好,而是把非常高的剔除来,\(n-2\)的分箱。

有时候某个值占比特别高,比如30%,切10bins时根本不可能靠cut_number提取。 可以把这个值单独做一个bin,然后,再取用8bins。

5 geom

5.1 误差线比较好的总结

参考 http://mp.weixin.qq.com/s/lGVgCsm1sMRdVOUJWiJQTg

离散型变量+区间: geom_errorbar(), geom_linerange() 离散型变量+区间+中间值: geom_crossbar(), geom_pointrange() 连续型变量+区间: geom_ribbon() 连续型变量+区间+中间值: geom_smooth(stat="identity")

5.1.1 geom_crossbar

data.frame(
  trt = factor(c(1, 1, 2, 2)),
  resp = c(1, 5, 3, 4),
  group = factor(c(1, 2, 1, 2)),
  upper = c(1.1, 5.3, 3.3, 4.2),
  lower = c(0.8, 4.6, 2.4, 3.6)
) %>% 
ggplot(aes(trt, resp, colour = group)) + 
  geom_crossbar(aes(ymin = lower, ymax = upper), width = 0.2)

也就是均值上下加上标准差之类的误差量。 非常直观,适合数据处理完后的展示。

5.1.2 geom_errorbarh

geom_ \(\to\) error \(\to\) bar \(\to\) h

df <- data.frame(
  trt = factor(c(1, 1, 2, 2)),
  resp = c(1, 5, 3, 4),
  group = factor(c(1, 2, 1, 2)),
  se = c(0.1, 0.3, 0.3, 0.2)
)

# Define the top and bottom of the errorbars

p <- ggplot(df, aes(resp, trt, colour = group))
p + geom_point() +
  geom_errorbarh(aes(xmax = resp + se, xmin = resp - se))

5.1.2.1 psych更好的选择

可以多变量展示数据,具体参考 因子分析 Factor Analysis

library(psych)
library(data.table)
gcbs <- 
    fread('../../../picbackup/gcbs.csv'
          ,encoding = 'UTF-8') %>% 
    as.data.frame()
    # error.dots 否则 error.dots 报错
    # Error in `[.data.frame`(x, i) : undefined columns selected
error.bars(gcbs)

error.dots(gcbs,var=names(gcbs))

5.1.3 geom_ribbon

LakeHuron
# Generate data
huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
h <- ggplot(huron, aes(year))

# Add aesthetic mappings
h +
  geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") +
  geom_line(aes(y = level))

这就是欧巴需要的误差曲线图。

5.1.4 加入虚线

geom_line(aes(linetype=supp))

5.1.5 geom_segment

library(tidyverse)
# 生成测试数据
df=data.frame(
  Phylum=c("Ruminococcaceae","Bacteroidaceae","Eubacteriaceae","Lachnospiraceae","Porphyromonadaceae"),
  GroupA=c(37.7397,31.34317,222.08827,5.08956,3.7393),
  GroupB=c(113.2191,94.02951,66.26481,15.26868,11.2179)
)

# 计算连线起始点Y轴坐标,即累计丰度的值
link_dat <- df %>% 
  arrange(by=desc(Phylum)) %>% 
  mutate(GroupA=cumsum(GroupA), GroupB=cumsum(GroupB)) 

# 数据格式转换,宽表格转换为ggplot2使用的长表格
df.long <- df %>% gather(group, abundance, -Phylum)
# 或者使用reshape2的melt函数
# df.long <- reshape2::melt(df, value.name='abundance', variable.name='group')

# 绘图,堆叠柱状图+组间连线
ggplot(df.long, aes(x=group, y=abundance, fill=Phylum)) + 
  geom_bar(stat = "identity", width=0.5, col='black')  + 
  geom_segment(data=link_dat, aes(x=1.25, xend=1.75, y=GroupA, yend=GroupB))

cumsum配合geom_bar()中的stat = "identity"使用,表达累积状态。 geom_segment作为连线,x的坐标跟geom_bar中的width设置有关, 第一个等于1+width/2, 第二个等于2-width/2。

参考: 堆叠柱状图各成分连线画法:突出展示组间物种丰度变化

# 画三个组间比较
library(reshape2)

# 读生一个测试数据宽表格
df=data.frame(
  Phylum=c("Ruminococcaceae","Bacteroidaceae","Eubacteriaceae","Lachnospiraceae","Porphyromonadaceae"),
  GroupA=c(37.7397,31.34317,222.08827,5.08956,3.7393),
  GroupB=c(113.2191,94.02951,66.26481,15.26868,11.2179),
  GroupC=c(123.2191,94.02951,46.26481,35.26868,1.2179)
)

# melt转换为长表格为ggplot2绘图通用格式
# geom_segment添加直线和曲线,arrange按门水平名称字母降序排列;cumsum先将数值累计,再用mutate取代;现在己有两组间的高度位置,再设置X轴位置1.25, 1.75, 和Y位置
ggplot(melt(df), aes(x=variable, y=value, fill=Phylum)) + 
  geom_bar(stat = "identity", width=0.5, col='black')  + theme_classic()+
  geom_segment(data=df %>% arrange(by=desc(Phylum)) %>% mutate(GroupA=cumsum(GroupA)) %>% mutate(GroupB=cumsum(GroupB)), aes(x=1.25, xend=1.75, y=GroupA, yend=GroupB))+ 
  geom_segment(data=df %>% arrange(by=desc(Phylum)) %>% mutate(GroupB=cumsum(GroupB)) %>% mutate(GroupC=cumsum(GroupC)), aes(x=2.25, xend=2.75, y=GroupB, yend=GroupC))
# 添加theme_classic()修改主题样式,这个经典主题我更喜欢
# x和xend分别为起始和终止,1,2组间X值起始分别为1.25和1.75,2,3组间则为2.25和2.75

多加一条geom_segment完成三组比较。

# 三组或更多组的画法,只需添加数据即可
library(tidyverse)

df <- data.frame(
  Phylum=c("Ruminococcaceae","Bacteroidaceae","Eubacteriaceae","Lachnospiraceae","Porphyromonadaceae"),
  GroupA=c(37.7397,31.34317,222.08827,5.08956,3.7393),
  GroupB=c(113.2191,94.02951,66.26481,15.26868,11.2179),
  GroupC=c(123.2191,94.02951,46.26481,35.26868,1.2179),
  GroupD=c(37.7397,31.34317,222.08827,5.08956,3.7393)
)

df.long <- df %>% gather(group, abundance, -Phylum)

# 组间连线数据:
# 假设第一列是Phylum
link_dat <- df %>% 
  arrange(by=desc(Phylum)) %>% 
  mutate_if(is.numeric, cumsum) 
bar.width <- 0.7
link_dat <- link_dat[, c(1,2,rep(3:(ncol(link_dat)-1),each=2), ncol(link_dat))]
link_dat <- data.frame(y=t(matrix(t(link_dat[,-1]), nrow=2)))
link_dat$x.1 <- 1:(ncol(df)-2)+bar.width/2
link_dat$x.2 <- 1:(ncol(df)-2)+(1-bar.width/2)

ggplot(df.long, aes(x=group, y=abundance, fill=Phylum)) + 
  geom_bar(stat = "identity", width=bar.width, col='black')  + 
  geom_segment(data=link_dat,
               aes(x=x.1, xend=x.2, y=y.1, yend=y.2), inherit.aes = F)

这是多组的方案。

5.1.6 geom_errorbar

ggplot中插入geom_errorbar 可以帮助我们,表现一次回归函数的,均值和方差。

library(tidyverse)

df <- data.frame(x = 1:10,
                 y = 1:10,
                 ymin = (1:10) - runif(10),
                 ymax = (1:10) + runif(10),
                 xmin = (1:10) - runif(10),
                 xmax = (1:10) + runif(10))

p <- ggplot(data = df,aes(x = x,y = y)) + 
    geom_point() + 
    geom_errorbar(aes(ymin = ymin,ymax = ymax)) + 
    geom_errorbarh(aes(xmin = xmin,xmax = xmax))

p

5.1.7 How to draw an arrow in the middle of the segment?

arrow_data <- data.frame(x1 = 2.62, x2 = 3.57, y1 = 21.0, y2 = 15.0)
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
    geom_point() +
    geom_curve(
      aes(x = x1, y = y1, xend = x2, yend = y2, col = "red"), data = arrow_data
      ) +
    geom_segment(
      aes(x = x1, y = y1, xend = x2, yend = y2, col = "green"), arrow = arrow(), data = arrow_data
      )

下面的方式是把箭头放在中间,原理就是把两条线分两半进行重建。

mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
    geom_point() +
    geom_curve(
      aes(x = x1, y = y1, xend = x2, yend = y2, col = "curve"), data = arrow_data
      ) +
    geom_segment(
      aes(x = (x1+x2)/2, y = (y1+y2)/2, xend = x2, yend = y2, col = "segment"), data = arrow_data
      ) +
    geom_segment(
      aes(x = x1, y = y1, xend = (x1+x2)/2, yend = (y1+y2)/2, col = "segment"), arrow = arrow(), data = arrow_data, show.legend=FALSE
      )

5.1.8 geom_segment可以在图中做很多箭头的改进

@online{Dancho2017Product,
  author = {Matt Dancho},
  title = {Sales Analytics: How To Use Machine Learning To Predict And Optimize Product Backorders},
  year = 2017,
  url = {http://www.business-science.io/business/2017/10/16/sales_backorder_prediction.html},
  urldate = {2018-05-02}
}
mtcars %>%
  ggplot(aes(x = mpg, y = disp, col = factor(cyl))) +
  geom_point() +
  geom_segment(aes(x= 12.5, y = 400, xend = 12.5+0.5, yend = 400+20), size = 1) +
  annotate("text", x = 15-1, y = 420, size = 3, label = "蓝\n色")

5.2 ggplot交互

鼠标点击图标,会有解释。 参考Make ‘ggplot2’ Graphics Interactive • ggiraph package

ggiraph(
(mtcars %>% 
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) + 
    geom_point_interactive(aes(tooltip = paste("cyl = ",cyl)), size = 2)
) %>% print()
)

tooltip鼠标点击的标签。

5.3 aes_stringnames在ggplot中的使用

mlp_p_add_15 <- mlp_p
for (i in 1:15){
  mlp_p_add_15 <- mlp_p_add_15 + 
    geom_line(data = mlp_model_pred_table, 
              aes_string(y = names(mlp_model_pred_table)[i]))
}
mlp_p_add_15

5.4 geom_ma()

tidyquant(Dancho and Vaughan 2018) 提供了geom_ma函数,方便画移动平滑线。

library(tidyquant)
AAPL <- tq_get("AAPL")
AAPL %>%
ggplot(aes(x = date, y```` = adjusted)) +
geom_line() + ## Plot stock price
geom_ma(n = 50) + ## Plot 50-day Moving Average
geom_ma(n = 200, color = "red") + ## Plot 200-day Moving Average
coord_x_date(xlim = c(today() - weeks(12), today()),
ylim = c(100, 130)) ## Zoom in

5.5 geom_scatterpie()

杜雨 (2018) 采用scatterpie包的geom_scatterpie函数构建多个饼状图。

library(tidyverse)
library(scatterpie)
library(Cairo)
pie_data <- data.table::data.table(
    Year = 2004:2011,
    Dummy = seq(5,40,5),
    Data = rnorm(n = 8,mean = 10,sd = 5),
    S1 = c(1,1,3,1,2,3,4,2),
    S2 = c(1,1,2,1,2,2,2,1),
    S3 = c(1,1,3,1,4,3,4,1),
    S4 = c(1,1,5,1,5,5,2,.5),
    S5 = c(1,2,5,1,1,5,1,.5)
)
color1<-c("#FF2D2D","#F79646","#4BACC6","#FFC000","#92D050")
color2<-c("#17375E","#23538D","#558ED5","#8EB4E3","#C6D9F1")
pie_data %>%
    ggplot() +
    geom_line(aes(x=Dummy,y=Data,group=1),col="#085264",size=.8) +
    geom_scatterpie(data=pie_data,aes(x=Dummy,y=Data,r=Dummy/10),cols=colnames(pie_data)[4:8],color=NA) +
    # ylim(0,25)+
    scale_fill_manual(values=color1) +
    scale_x_continuous(breaks=pie_data$Dummy,labels=c(2004:2011)) +
    guides( fill=guide_legend(label.position ="top"))+
    theme(
        axis.title=element_blank(),
        legend.title=element_blank(),
        panel.background=element_blank(),
        axis.line=element_line(),
        axis.ticks=element_line(),
        legend.direction="horizontal",
        legend.position=c(0.15,0.9),
    ) +
    coord_equal()
  • ylim(0,25)的设置保持了圆不变成椭圆。
  • coord_equal()也可以完成这个目的,但是可能存在图例遮挡的情况。
  • r \(\in [1,10]\)比较合理,不然太大太小,这里可以设置变量

6 stat

统计变换即对数据进行统计变化, 通常以某种方式对数据信息进行汇总

6.1 stat_unique

ggplot(mtcars, aes(vs, am)) +
  geom_point(alpha = 0.1)
ggplot(mtcars, aes(vs, am)) +
  geom_point(alpha = 0.1, stat = "unique")

注意看颜色。 后面一个图stat = "unique"加入后,明显颜色一直很浅,是因为alpha = 0.1。 然而第一幅图左上角很深,是因为有很多重复值。

6.2 stat_summary

d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
d + stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 2)
d + stat_summary(fun.y = "median", colour = "red", size = 2, geom = "point")

6.3 stat_qq_line

# devtools::install_github("tidyverse/ggplot2")
library("ggplot2")
library("caret")
df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()

6.3.1 QQ Plot 查看分类变量下不同分布 [Kassambara2016,pp.59]

library(tidyverse)
p <- ggplot(mtcars, aes(sample=mpg))
# Basic plot
p + stat_qq()
# Change point shapes by groups
# Use custom color palettes
p + stat_qq(aes(shape = as.factor(cyl), color = as.factor(cyl)))+
scale_color_manual(values=c("#00AFBB", "#E7B800", "#FC4E07"))

7 facet

7.1 横纵的选择

library(tidyverse)
mtcars %>% 
  ggplot(aes(x =  wt, y = qsec)) + 
  geom_point() +
  facet_wrap(~ cyl, scales = 'free', ncol = 1)

ncol = 1用于facet_wrap,可以限定横纵的选择。

8 scale

8.1 理解方式

scales 衡量数据的映射方式,举例方便理解 (宏基因组 2018)

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(col = class))
  # scale_x_continuous() +
  # scale_y_continuous() +
  # scale_color_discrete()

注意最后三个scale_*条件是默认的,一一解释,

  • scale_x_continuous(): 以连续变量展示,但是因为x本身是连续变量,因此转换为本身
  • scale_y_continuous(): 以连续变量展示,但是因为x本身是连续变量,因此转换为本身
  • scale_color_discrete(): col变量 class作为颜色,以离散变量展示

以下做一些变换,

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(col = class)) +
  scale_x_sqrt(quote(a + M ^ e)) +
  scale_color_brewer()
  • scale_x_sqrt(): x按照\(\sqrt{x}\)的方式展示。
  • scale_color_brewer(): 颜色按照这一规则映射,scale_colour_brewer(..., type = "seq", palette = 1, direction = 1),参数type = "seq"表示按照连续变量处理,因此颜色数连续的蓝色。
  • quote(a + M ^ e)可以输入简易公式。

8.2 调整比例尺

labels = comma算一种。

diamonds %>% 
  ggplot(aes(x =  carat, y = price)) + 
  geom_point() + 
  scale_x_log10() +
  scale_y_log10() +
  coord_trans(y = scales::exp_trans(10)) +
  theme(legend.position = 'none')

scales::exp_trans(10)相当于对\(y\)的比例尺进行转换

\[y \to 10^y\]

8.2.1 给时间序列自定义标签

使用ggplot2给时间序列作图时,有自定义时间标签的需求,比如希望时间标签更细致,举例。

8.2.1.1 模拟建立数据

library(tidyverse)
library(lubridate)
data_table <- data_frame(a = seq(1:300), b = today() + a)
head(data_table)
data_table %>% 
  ggplot(aes(x = b, y = a)) +
    geom_point()

8.2.1.2 想看更加细致的月份数据

使用scale_x_date, 参考 * Time Series 05: Plot Time Series with ggplot2 in R – NEON Data Skills

data_table %>% 
  ggplot(aes(x = b, y = a)) +
    geom_point() +
    scale_x_date(date_breaks = '1 month')

8.2.1.3 limits看最近时间

scale_x_date(limits = c(as.Date('2018-07-27'), Sys.Date()))

8.2.1.4 建立更加整洁的标签

转置标签方向。标签,转置标签方向。

data_table %>% 
  ggplot(aes(x = b, y = a)) +
    geom_point() +
    scale_x_date(date_breaks = '1 month') +
    theme(axis.text.x = element_text(angle = 70, hjust = 1))

8.2.2 xlim中写<符号

mutate(bin = fct_reorder(paste0("<=",as.factor(as.numeric(bin3))),as.numeric(bin3))) %>%中 使用paste0函数。

8.2.3 自定义轴坐标的bin和命名

axs + scale_x_continuous(breaks = c(2000, 4000), labels = c("2k", "4k")

8.3 修改legend

scale_colour_discrete(name = "图例名称", breaks = c("图例名称1","图例名称2"), labels = c("图例新名称1","图例新名称1")) 

8.3.1 stack图打标签的方法

关键是position=position_stack()。 但是会改变y轴的刻度。

8.4 横线加标签

9 theme

9.1 最好看的集成方案

theme_ilo <- function(){
    theme_minimal() + 
    theme(
    # text = element_text(family = "Bookman", color = "gray25"),
    plot.subtitle = element_text(size = 9),
    plot.caption = element_text(color = "gray30"),
    # plot.background = element_rect(fill = "gray95"),
    plot.margin = unit(c(5, 10, 5, 10), units = "mm"),
    axis.title.x = element_text(size=12,face = "bold"),
    axis.title.y = element_text(size=12,face = "bold"),
    # x和y的命名,要加粗,ppt才好看
    axis.text.x = element_text(size=7, angle = 70, hjust = 1),
    # 控制axis字体大小,7号大小最好
    axis.text.y = element_text(size=7),
    legend.title=element_blank()
    )
}
theme_ilo() +
scale_fill_brewer(palette = "Greens") +
# 为了分层好看
scale_color_brewer(palette = "Greens") +
# 为了分层好看
theme(text=element_text(family="STKaiti"))

9.2 中文乱码解决方案

theme(text=element_text(family="STKaiti")) + 加入这样一行代码就好了,因为限定了text中, element_text的属性为family="STKaiti"这样就可以处理中文乱码的问题了。

Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : Polygon edge not found
  1. 去font book 里面选择好字体,大多数原因是没有这个字体(Zdeb 2016; awags1 2018)

  2. 一般是RMarkdown 无法Preview,因此先保存图片,再knitr::include_*

    # 看来中文显示出现了问题。 # geom_text(family=“STKaiti”) + 另外指定 # theme(text=element_text(family=“STKaiti”)) 没有指定元素是中文。

参考 这篇CSDN博客

9.3 不显示legend

mtcars %>% 
  ggplot(aes(x =  wt, y = qsec, col = as.factor(cyl))) + 
  geom_point() + 
  theme(legend.position = 'none')

theme(legend.position = 'none')legend.position = 'none'使得不显示legend,因为有时候就是多余。

9.4 一个集成方案

mpg %>% 
  ggplot(aes(x = cty, y = hwy, col = as.factor(cyl))) +
  geom_jitter() +
  geom_abline(col = "grey50", size = 2) +
  # 添加labs
  labs(
    x = "City mileage/galleon",
    y = "Highway mileage/gallon",
    title = "Highway and city mileage are highed correlated",
    subtitle = "add some comments",
    color = "Change legend name",
    caption = "Data Source: mpg"
  ) +
  # 点的颜色真难看。
  scale_color_brewer(type = "seq", palette = "Spectral") +
  # adj. [物]光谱的;幽灵的;鬼怪的
  # /'spɛktrəl/
  # 下面开始修改theme,不能忍受背景是灰色的
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 12),
    # bold好看些
    plot.subtitle = element_text(size = 10)
  ) +
  theme(
    plot.title = element_text(hjust = 0.5),
    # Horizontal justification (in [0, 1])
    # 因此居中就是0.5
    plot.subtitle = element_text(hjust = 0.5),
    # Vertical justification (in [0, 1])
    aspect.ratio = 3/4,
    # 限定纵横比,放入ppt
  )

scale_color_brewer非常好的美化函数。

9.5 for loop with labs

for (i in unique(mtcars$cyl)){
  p1 <- 
  mtcars %>% 
    filter(cyl == i) %>% 
    ggplot(aes(x = mpg, y = disp)) +
    geom_point() +
    labs(
      title = paste("cyl = ", i)
    )
  ggsave(paste(i,"pic.png",sep = "_"), device = "png")
  print(p1)
}

ggsave(paste(i,"pic.png",sep = "_"), device = "png") 保存图片,方便复制粘贴。

Rodrigues (2017) 给出一个不用展示plot,但是输出的方法,很稳定,数据的处理都在nested table。

# devtools::install_github('amarjen/pwt9') 
# https://github.com/amarjen/pwt9
# data("pwt9")
pwt9.0 <- read_csv("https://raw.githubusercontent.com/amarjen/pwt9/master/data-raw/pwt90.csv")
country_list <- c("France", "Germany", "United States of America", "Luxembourg", "Switzerland", "Greece")
small_pwt <- pwt9.0 %>%
  filter(country %in% country_list)
small_pwt <- small_pwt %>%
  mutate(country = factor(country, levels = country_list, ordered = TRUE))
library(ggthemes)
plots <- small_pwt %>%
  group_by(country) %>%
  nest(.key = dat) %>%
  mutate(plot = map2(dat, country, ~ggplot(dat = .x) + theme_tufte() +
       geom_line(aes(y = avh, x = year)) +
       ggtitle(.y) +
       ylab("Year") +
       xlab("Average annual hours worked by persons engaged")))

# plots$plot
# map2(
#     file.path("test_dropout_folder",paste0(plots$country,".pdf")),
#     plots$plot, 
#     ggsave)

9.6 配色交互

CPCOLS <- c("#7A67EE", "#4EEE94", "#e31a1c")

ggplot(iris, aes(Sepal.Length, Petal.Length)) +
      geom_point(aes(col = Species)) +
      scale_colour_manual(values = CPCOLS)

9.7 交互调整theme

library(ggThemeAssist)
p1 <- 
mtcars %>% 
  ggplot(aes(x = mpg, y = disp, col = as.factor(cyl))) + 
  geom_point()
p1 + theme(plot.subtitle = element_text(vjust = 1), 
    plot.caption = element_text(colour = "darkblue", 
        vjust = 1), axis.title = element_text(colour = "darkblue"), 
    plot.title = element_text(face = "bold", 
        colour = "darkblue"), legend.position = "none") +labs(title = "mpg, disp 和 cyl之间的关系", 
    colour = "cyl", subtitle = "这是用ggplot Theme Assistant调整的,代码肯定很难看的,需要自己再删减一下的。", 
    caption = "数据来源: mtcars") + 
  theme_minimal() +
  theme(text=element_text(family="STKaiti"))

这个包是针对初学者的,实际上熟悉代码后,不需要这个的,太简单了。 除非是just这种非常小的调节差不多。

9.8 让图片的label加粗

例如geom_bar()中,加入fontface="bold"

9.9 quote

mtcars %>% 
  ggplot(aes(x = mpg, y = disp^2, col = cyl)) +
  geom_point() +
  labs(color = '') + # 不要legend的名字
  scale_y_continuous(quote(disp ^ two)) + # 平方体现
  theme(legend.position = "none")

ggplot 现在薄弱的环节是 ggthemetheme。 ggplot书上p.201。

9.10 How to annotate() ggplot with latex

latex2exp::TeX函数实现图中加latex,记得是\\号,目前不能解决斜体的问题(Gegzna 2016)

library(latex2exp)
library(ggplot2)
mtcars %>%
    ggplot(aes(x = mpg/disp,y=qsec)) +
    geom_point() +
    annotate("text", x = .4, y = 20,
      label = "paste(italic(R) ^ 2, \" = .75\")", parse = TRUE) +
    annotate("text", x = .2, y = 20,
      label = TeX("$\\alpha$"), parse = TRUE) +
    annotate("text", x = .1, y = 20,
      label = TeX("Formula: $\\frac{2hc^2}{\\lambda^\\beta}$"), parse = TRUE)

9.11 插入数学公式 (Slowikowski 2018)

d <- data.frame(
  x    = c(1, 2, 2, 1.75, 1.25),
  y    = c(1, 3, 1, 2.65, 1.25),
  math = c(
    NA,
    "integral(f(x) * dx, a, b)",
    NA,
    "lim(f(x), x %->% 0)",
    NA
  )
)
library(ggrepel)
ggplot(d, aes(x, y, label = math)) +
  geom_point() +
  geom_label_repel(
    parse       = TRUE, # Parse mathematical expressions.
    size        = 8,
    box.padding = 2
  )

9.12 annotate函数中的label参数表达绝对值符号

y == abs(x)

9.13 加入rectangle (Evans 2013)

annotate("rect", xmin=100, xmax=200, ymin=0, ymax=Inf, alpha=0.2, fill="red")

10 Combine

10.1 grid.arrange

library(gridExtra)
p1 <- mtcars %>% 
  ggplot(aes(x= mpg, y = disp)) + 
  geom_point()
p2 <- mtcars %>% 
  ggplot(aes(x= disp, y = mpg)) + 
  geom_point()
grid.arrange(p1,p2)

10.2 cowplot [Kassambara2016,pp.205-211]

library(cowplot)
library(graphics)
data(ToothGrowth)

p <- ggplot(ToothGrowth %>% mutate(dose = as.factor(dose)), aes(x = dose, y = len))
my3cols <- c("#E7B800", "#2E9FDF", "#FC4E07")

# Box plot (bp)
bxp <-
    p +
    geom_boxplot(aes(color = dose)) +
    scale_color_manual(values = my3cols)
bxp
# Dot plot (dp)
dp <-
    p +
    geom_dotplot(aes(color = dose, fill = dose)
                 ,binaxis='y', stackdir='center') +
    scale_color_manual(values = my3cols) +
    scale_fill_manual(values = my3cols)
dp
lp <-
    economics %>%
        ggplot(aes(x = date, y = psavert)) +
        geom_line(color = "#E46726")
lp
plot_grid(bxp, dp, lp
          ,labels = c("Figure 1", "Figure 2", "Figure 3")
          ,ncol = 2, nrow = 2)
plot2by2 <-
ggdraw() +
    draw_plot(bxp, x = 0, y = .5, width = .5, height = .5) +
    draw_plot(dp, x = .5, y = .5, width = .5, height = .5) +
    draw_plot(lp, x = 0, y = 0, width = 1, height = 0.5) +
    draw_plot_label(label = c("Figure 1", "Figure 2", "Figure 3")
                    ,x = c(0, 0.5, 0), y = c(1, 1, 0.5), size = 15)
plot2by2

使用cowplot::plot_grid函数进行合并。 draw_plot_label中的size描述label的大小。

如图,xy的位置是图形的左下角作为坐标,widthheight描述图片的形状。

save_plot(
    "plot2by2.png",
    plot2by2,
    ncol = 2, # we're saving a grid plot of 2 columns
    nrow = 2, # and 2 rows
    # each individual subplot should have an aspect ratio of 1.3
    base_aspect_ratio = 1.3
)

这样就可以保存pdf版本了,比较清晰。

# use save_plot() instead of ggsave() when using cowplot
save_plot(
    "mpg.png"
    ,bxp
    ,base_aspect_ratio = 1.3 # make room for figure legend
)

11 排序

11.1 因子变量排序

因此变量排序,不应该110放在一起,解决办法是,

fct_reorder(bin2, as.numeric(bin2)) 对于连续变量, 首先先转为fct,然后用as.numeric来排序。


12 for loop

12.1 循环跑ggplot

循环跑ggplot,其实就是套用for loop的参数就可以了。

for (i in ....) { 

}

13 实现相关矩阵

library(tidyverse)
library(reshape2)
library(ggthemes)
cor_list <- function(x) {
  L <- M <- cor(x)
  
  M[lower.tri(M, diag = TRUE)] <- NA
  M <- melt(M)
  names(M)[3] <- "points"
  # lower.tri就是i比j大,而已。
  L[upper.tri(L, diag = TRUE)] <- NA
  L <- melt(L)
  names(L)[3] <- "labels"
  
  merge(M, L)
}

cor_list(iris[1:4])
# 这里的缺失值有三种
# 1. cor对角线上的
# 2. upper.tri中的一半
# 3. lower.tri中的一半


iris1 <- 
iris %>%
  group_by(Species) %>%
  do(cor_list(.[1:4])) 
# 这里相当于unnest了,比map函数方便。
iris1 %>% 
  ggplot(aes(x = Var1, y = Var2)) +
    geom_point(aes(col = labels, 
                   size = abs(labels)), shape = 16) +
  geom_text(aes(x = Var2, y = Var1, 
                # 这里要交叉一下,
                # 这样文字就在下三角了。
                col = points,
                # size = abs(points),
                # size 不可以加,不然看不见
                # hjust = 2,
                label = round(labels, 2))) +
  scale_size(range = c(0, 6)) +
  # 控制点的大小
  scale_color_gradient2("r", limits = c(-1, 1)) +
  scale_y_discrete("", limits = rev(levels(iris1$Var1))) +
  # rev控制了factor反着走,这样可以控制图像在上三角还是下三角
  scale_x_discrete("") +
  guides(size = FALSE) +
  # 没什么用
  geom_abline(slope = -1, intercept = nlevels(iris1$Var1) + 1) +
  coord_fixed() +
  facet_grid(. ~ Species) +
  # 不然图像重合了很难看。
  labs(
    caption = "数据来源:iris",
    subtitle = "建立相关性矩阵很简单\n抓好x和y轴变量和计算的相关系数就好",
    title = "ggplot实现相关矩阵"
  ) +
  theme_tufte() +
  theme(text = element_text(family = "STKaiti")) +
  # 为了显示中文
  theme(axis.text.y = element_text(angle = 45, hjust = 1),
        axis.text.x = element_text(angle = 45, hjust = 1),
        strip.background = element_blank())

14 颜色盘

  • scale_fill_manual() for box plot, bar plot, violin plot, etc

  • scale_color_manual() for lines and points

    # Box plot bp + scale_fill_manual(values=c(“#999999”, “#E69F00”, “#56B4E9”)) # Scatter plot sp + scale_color_manual(values=c(“#999999”, “#E69F00”, “#56B4E9”))

颜色代码(Chang 2018; 常玉俊 2017)

颜色代码(Chang 2018; 常玉俊 2017)

以下以#3366FF作为例子,进行尝试。

mtcars %>% 
    ggplot() + 
    geom_point(aes(x = mpg, y = cyl),col = '#3366FF')

15 Calendar Plot (戴启立 2018)

library(openair)
# load example data from package
data(mydata)

# basic plot
calendarPlot(mydata, pollutant = "o3", year = 2003)

16 展示最优的方式

library(data.table)
best_score <- fread("Product,score1,score2,score3,score4
A,187.58,4.78,4.5,64.06
B,262,2.38,6,11.88
C,298.65,2.91,6,2.88
D,274.76,4.47,6,-6.42
E,293.48,2,5,61.12
")
market_level <- fread("MaxAve,Statistic,Aggregate
Maximum,score1,362.54
Average,score1,222.05
Maximum,score2,5.99
Average,score2,2.06
Maximum,score3,6
Average,score3,4.43
Maximum,score4,110.34
Average,score4,-7.72
")

这里注意,market_level表的设计保证了可以插入多条横线,带group就好。 (Perry 2018,Task 9)

best_score %>% 
    gather(Statistic,Aggregate,-Product) %>% 
    ggplot(aes(x=Product,y=Aggregate,fill=Product))+
        geom_bar(stat="identity") + 
    facet_wrap (~Statistic, nrow = 2, ncol = 2, scales="free_y") +
    geom_hline(data=market_level, aes(yintercept=Aggregate, group=Statistic, color=MaxAve), size=1) +
    labs(
        title="Best five products"
        ,color="Top or bottom"
        ,caption="Jiaxiang Li - jiaxiangli.netlify.com") +
    scale_fill_hue(l=70) +
    scale_color_hue(l=20) +
    theme_minimal() +
    theme(axis.text.x=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank())

17 Web-friendly plot

mtcars_p <- 
    mtcars %>% 
    ggplot(aes(mpg)) +
    geom_histogram()
mtcars_p %>% 
    plotly::ggplotly()

plotly::ggplotly可以提供交互的ggplot图 (McVey 2018 * Make a plot web-friendly | R)

18 制作博客背景图

受到 Swanson (2015) 的启发 和 Chisato (2018) 的R代码实现,为博客做了背景图。 相关的R包见 JiaxiangBU/add2blog

matrix_name <- 
    function(x,n=10) {
        set.seed(123)
        x %>% 
        str_to_upper() %>% 
        str_split('') %>% 
        .[[1]] %>% 
        sample(size=n*n,replace=T) %>% 
        matrix(
            nrow = 10
            ,byrow = T
        ) 
    }
  1. str_split切分字段参考 Stack Overflow
library(lubridate)
library(tidyverse)
library(data.table)
graph_fun <- function(matrix){
    matrix %>% 
        as.data.frame %>% 
        rownames_to_column('y') %>% 
        gather(x,text,-y) %>% 
        mutate_at(vars(x,y),as.integer) %>% 
        ggplot(aes(x=x, y=y, color=factor(text))) +
        geom_text(aes(label=text, color=factor(text)), family="Helvetica", size=2.5) +
        scale_y_reverse() +
        theme_void() +
        scale_color_viridis_d(option="magma", begin=0.2, guide="none") +
        scale_fill_viridis_d(option="magma", begin=0.2, guide="none") +
        theme(panel.background = element_rect(fill="#000000")) 
}
lixiaowu_mid_data <- 
    rbind(
        cbind(matrix_name('wangxiaoer'),matrix_name('wangxiaoer'),matrix_name('wangxiaoer'))
        ,cbind(matrix_name('wangxiaoer'),matrix_name('lixiaowu'),matrix_name('wangxiaoer'))
        ,cbind(matrix_name('wangxiaoer'),matrix_name('wangxiaoer'),matrix_name('wangxiaoer'))
    )
lixiaowu_mid_data %>% 
    `colnames<-`(1:30) %>% 
    `rownames<-`(1:30) %>% 
    graph_fun()
wangxiaoer_mid_data <- 
    rbind(
        cbind(matrix_name('lixiaowu'),matrix_name('lixiaowu'),matrix_name('lixiaowu'))
        ,cbind(matrix_name('lixiaowu'),matrix_name('wangxiaoer'),matrix_name('lixiaowu'))
        ,cbind(matrix_name('lixiaowu'),matrix_name('lixiaowu'),matrix_name('lixiaowu'))
    )
wangxiaoer_mid_data %>% 
    `colnames<-`(1:30) %>% 
    `rownames<-`(1:30) %>% 
    graph_fun()
ggsave(filename=
           paste0(
               now() %>% ymd_hms() %>% str_remove_all(' |-|[A-z]|:')
               ,'_blog_wechat_wallpic.png'
               
           )
           ,width=11*golden_ratio,height=11)
ggsave(filename=
           paste0(
               now() %>% ymd_hms() %>% str_remove_all(' |-|[A-z]|:')
               ,'_blog_wechat_wallpic.png'
               
           )
           ,width=11*golden_ratio,height=11)

参考

Chang, Winston. 2018. “Colors (Ggplot2).” 2018. http://www.cookbook-r.com/Graphs/Colors_%28ggplot2%29/.

Chisato. 2018. “VISUAL Art with Pi Using Ggplot2 & Circlize.” 2018. https://chichacha.netlify.com/2018/10/13/visual-art-with-pi-using-ggplot2-circlize/.

Dancho, Matt, and Davis Vaughan. 2018. Tidyquant: Tidy Quantitative Financial Analysis. https://CRAN.R-project.org/package=tidyquant.

Evans, Sam. 2013. “Reducing Dimensionality with Principal Component Analysis.” Stack Overflow. 2013. https://stackoverflow.com/questions/17521438/geom-rect-and-alpha-does-this-work-with-hard-coded-values.

Gegzna, Vilmantas. 2016. “How to Annotate() Ggplot with Latex.” 2016. https://stackoverflow.com/questions/12514612/how-to-annotate-ggplot-with-latex.

McVey, Elaine. 2018. “Building Dashboards with Flexdashboard.” DataCamp. 2018. https://www.datacamp.com/courses/building-dashboards-with-flexdashboard.

Perry, George. 2018. “Scout Your Athletics Fantasy Team.” datacamp. 2018. https://www.datacamp.com/projects/177.

Rodrigues, Bruno. 2017. “Make Ggplot2 Purrr.” 2017. http://www.brodrigues.co/blog/2017-03-29-make-ggplot2-purrr/.

Slowikowski, Kamil. 2018. “Ggrepel Examples.” 2018. https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html.

安建才. 2018. “入门 Ggplot2 的图形语法.” 2018. https://mp.weixin.qq.com/s/1nhh1g5FjvKci9IkidBGvw.

宏基因组. 2018. “Ggplot2笔记6:标度、轴和图例.” 2018. https://mp.weixin.qq.com/s/Za0Eoc5eN_N-VhjhI47Pgg.

常玉俊. 2017. “Ggplot2颜色设置.” 2017. https://blog.csdn.net/chang349276/article/details/77476848.

戴启立. 2018. “R语言在大气污染数据分析中的应用-时间序列分析(一).” 2018. https://mp.weixin.qq.com/s/AYEiJDbSOTd3ou17IDhRCw.

杜雨. 2018. “Ggplot 构造连环饼图.” 2018. https://mp.weixin.qq.com/s/U5RWvCxGvnBMktyzEsipcQ.