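ggpubr makes plotting more efficient. This R package cuts out many of the steps normally needed to build a plot, much as Postman does for web scraping.
Our main work is statistical analysis, so I am reluctant to spend too much time on drawing figures.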
library(ggpubr)
library(tidyverse)
Distribution plot + mean line
# Count the distinct values in each column to spot the variables that are effectively categorical
mtcars %>%
  gather(key, value, everything()) %>%
  group_by(key) %>%
  summarise(n = n_distinct(value)) %>%
  arrange(desc(n))
# Convert am to a factor so it can be used for grouping and coloring
mtcars1 <- mtcars %>%
  mutate(am = as.factor(am))
mtcars1 %>%
  ggdensity(x = "qsec",      # variable names are passed as strings
            color = "am",
            add = "mean",    # add a mean line
            rug = TRUE,      # logical; if TRUE, add a marginal rug
            fill = "am",     # I often forget this for distribution plots; it is there for looks
            palette = c("#00AFBB", "#E7B800"))  # I can never recall these codes from memory
mtcars1 %>%
  gghistogram(x = "qsec",    # variable names are passed as strings
              color = "am",
              add = "mean",  # add a mean line
              rug = TRUE,    # logical; if TRUE, add a marginal rug
              fill = "am",   # I often forget this for distribution plots; it is there for looks
              palette = c("#00AFBB", "#E7B800"))  # I can never recall these codes from memory
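Note that variables are passed as strings, so any grouping variable has to be converted to a factor beforehand with mutate.
The only real advantage of this plot is that it can add a mean line.
rug is the row of short vertical ticks under the plot; each tick marks the position of one observation.
palette = c("#00AFBB", "#E7B800") is the recommended color pair for a two-group comparison.
Whether to use ggdensity or gghistogram depends on how discrete the data are.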
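To make the mean line and the rug less mysterious, here is a minimal base-R sketch of the same ingredients. It is not how ggpubr draws the plot internally, only an illustration; the name group_means is introduced here for the example.

```r
# Base-R sketch: a density curve with a mean line and a rug (illustration only)
x <- mtcars$qsec

d <- density(x)                 # kernel density estimate of qsec
plot(d, main = "qsec")          # the density curve
abline(v = mean(x), lty = 2)    # dashed vertical line at the overall mean
rug(x)                          # one tick per observation along the x axis

# One mean per level of am, i.e. one mean line per group
group_means <- tapply(mtcars$qsec, mtcars$am, mean)
print(group_means)
```

Each tick of the rug sits at one observed qsec value, so dense clusters of ticks line up with the peaks of the curve.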
ECDF
mtcars1 %>%
  ggplot(aes(qsec)) +
  stat_ecdf(aes(color = am, linetype = am),
            geom = "step", size = 1.5) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  labs(y = "F(qsec)")  # the plotted variable is qsec, not weight
Not very pretty.
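For reference, the quantity on the y axis of an ECDF plot is just the share of observations at or below each value. A small base-R sketch, where manual_ecdf is a hypothetical helper written only for this comparison:

```r
# Base-R sketch: the empirical CDF of qsec, computed two ways
x <- mtcars$qsec

F_hat <- ecdf(x)                            # step function built by base R
manual_ecdf <- function(q) mean(x <= q)     # share of observations <= q

# Both give the same value at any point, for example at q = 18
print(c(builtin = F_hat(18), manual = manual_ecdf(18)))
```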
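Box plots, violin plots, and p-values
As explained in the earlier EDA notes, a box plot is mainly used to analyse one categorical variable against one continuous variable.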
# Convert cyl to a factor as well, for use as the grouping variable
mtcars2 <- mtcars1 %>%
  mutate(cyl = as.factor(cyl))
mtcars2 %>%
  ggboxplot(
    x = "cyl",
    y = "qsec",
    color = "cyl",
    palette = c("#00AFBB", "#E7B800", "#FC4E07"),
    add = "jitter") +           # overlay the raw points
  stat_compare_means(
    comparisons =               # the pairs of groups to compare
      list(
        c("4", "6"),
        c("4", "8"),
        c("6", "8")
      ))
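After the earlier work on resampling, the p-value for a difference between samples should be well understood by now. The groups can therefore be compared directly on the plot, and printing the p-values keeps the reader from being confused by the figure.
stat_compare_means does this: just pass in the pairs to compare.
palette = c("#00AFBB", "#E7B800", "#FC4E07") is the color set for a three-group comparison.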
mtcars2 %>%
  ggviolin(
    x = "cyl",
    y = "qsec",
    fill = "cyl",
    palette = c("#00AFBB", "#E7B800", "#FC4E07"),
    add = "boxplot",                        # draw a box plot inside each violin
    add.params = list(fill = "white")) +    # fill color of the inner box plot
  stat_compare_means(
    comparisons =
      list(
        c("4", "6"),
        c("4", "8"),
        c("6", "8")
      ),
    label = "p.signif",                     # significance symbols instead of numbers
    label.y = 50                            # vertical position of the label
  )
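Here add is changed to draw a box plot inside each violin.
add.params changes the color of that inner box plot.
label = "p.signif": I also find the raw numbers tiresome, and this is more concise. The symbols mean:
ns: p > 0.05
*: p <= 0.05
**: p <= 0.01
***: p <= 0.001
****: p <= 0.0001
label.y = 50 sets where the p-value label is displayed on the plot.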
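To see roughly where the labels come from, the pairwise p-values can be computed by hand and mapped to the symbols listed above. This is a sketch under one assumption: that the pairwise comparison is a Wilcoxon rank-sum test, which I believe is the stat_compare_means default but have not checked against the package source. p_to_symbol is a hypothetical helper, not a ggpubr function.

```r
# Base-R sketch: pairwise p-values for qsec by cyl, then significance symbols.
# Assumption: a Wilcoxon rank-sum test per pair (believed to be the default).
pairs <- list(c("4", "6"), c("4", "8"), c("6", "8"))

p_values <- sapply(pairs, function(g) {
  d <- mtcars[mtcars$cyl %in% as.numeric(g), ]            # keep the two groups
  wilcox.test(qsec ~ cyl, data = d, exact = FALSE)$p.value
})
names(p_values) <- sapply(pairs, paste, collapse = " vs ")

# Hypothetical helper: map p-values to the symbols from the list above
p_to_symbol <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, Inf),
                   labels = c("****", "***", "**", "*", "ns"),
                   include.lowest = TRUE))
}

print(data.frame(p = p_values, label = p_to_symbol(p_values)))
```

exact = FALSE is set only to avoid the warning about ties; the values may differ slightly from what the plot prints if the package uses other settings.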
Bar plot + sorting without grouping
I never managed to do this well in plain ggplot, so this is worth borrowing.
for (i in c(TRUE, FALSE)) {
  print(
    mtcars %>%
      mutate(cyl = as.factor(cyl)) %>%
      rownames_to_column(var = "name") %>%
      ggbarplot(x = "name",
                y = "mpg",
                fill = "cyl",
                color = "white",
                palette = "jco",        # palette of the journal JCO
                sort.val = "desc",      # sort in descending order
                sort.by.groups = i,     # TRUE sorts within groups, FALSE sorts across all bars
                x.text.angle = 60)
  )
}
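Before this I had no easy way to put everything on one plot for comparison.
x.text.angle = 60 is also much simpler. The ggplot counterpart, theme(axis.text.x = element_text(angle = 70, hjust = 1)), is more cumbersome.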
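The difference between the two sort modes is easier to see on the data itself. The sketch below reproduces the two orderings with order() in base R; it mirrors what sort.val = "desc" with sort.by.groups appears to do and is not ggpubr's own implementation. The names df, global_order and grouped_order are made up for the example.

```r
# Base-R sketch: the two bar orders, computed directly on the data
df <- data.frame(name = rownames(mtcars),
                 mpg  = mtcars$mpg,
                 cyl  = factor(mtcars$cyl))

# Like sort.by.groups = FALSE: one descending order across all cars
global_order <- df[order(-df$mpg), ]

# Like sort.by.groups = TRUE: group by cyl first, descending mpg within each group
grouped_order <- df[order(df$cyl, -df$mpg), ]

print(head(global_order, 3))
print(head(grouped_order, 3))
```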
Deviation plot
All of this polishing is preparation that I have to do on the data in advance.
mtcars %>%
  rownames_to_column(var = "name") %>%
  mutate(cyl = as.factor(cyl),
         mpg_z = (mpg - mean(mpg)) / sd(mpg),    # standardize mpg
         mpg_grp = case_when(
           mpg_z < 0 ~ "low",
           TRUE ~ "high"
         ),
         mpg_grp = as.factor(mpg_grp),
         mpg_grp = fct_relevel(mpg_grp, "low", "high")  # put "low" first
  ) %>%
  ggbarplot(
    x = "name",
    y = "mpg_z",
    fill = "mpg_grp",
    color = "white",
    palette = "jco",
    sort.val = "asc",
    sort.by.groups = FALSE,
    x.text.angle = 60,
    ylab = "MPG z-score",
    xlab = FALSE,
    legend.title = "MPG Group"
  )
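This actually applies standardization, $\frac{x-\mu}{\sigma}$, and then splits the scores at zero into two groups:
mpg_z = (mpg - mean(mpg)) / sd(mpg),
mpg_grp = case_when(
  mpg_z < 0 ~ "low",
  TRUE ~ "high"
),
mpg_grp = as.factor(mpg_grp),
mpg_grp = fct_relevel(mpg_grp, "low", "high")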
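As a quick check of the standardization step, the same z-scores can be computed in base R and compared with scale(). This is only a sketch to confirm the formula; factor(..., levels = ...) stands in for the as.factor plus fct_relevel pair used in the pipeline.

```r
# Base-R sketch: z-scores of mpg and the low/high grouping
mpg <- mtcars$mpg

mpg_z <- (mpg - mean(mpg)) / sd(mpg)            # (x - mu) / sigma
mpg_grp <- factor(ifelse(mpg_z < 0, "low", "high"),
                  levels = c("low", "high"))    # "low" first, as in the plot legend

# After standardization the mean is 0 and the standard deviation is 1
print(c(mean = mean(mpg_z), sd = sd(mpg_z)))
print(table(mpg_grp))
```

Note that sd() uses the sample standard deviation (n - 1 in the denominator), which is also what scale() does.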
Lollipop chart
This one took me a long time back then. Finally there is a simple way to do it.
for (i in c(TRUE, FALSE)) {
  print(  # plots are not displayed inside a loop unless printed explicitly
    mtcars %>%
      rownames_to_column(var = "name") %>%
      mutate(cyl = as.factor(cyl)) %>%
      ggdotchart(
        x = "name",
        y = "mpg",
        color = "cyl",
        palette = c("#00AFBB", "#E7B800", "#FC4E07"),
        sorting = "ascending",
        add = "segments",       # draw the lollipop stems
        rotate = i,             # TRUE flips the axes
        ggtheme = theme_minimal()
      )
  )
}
rotate controls whether the plot is flipped, which is much simpler than adding coord_flip in ggplot.
mtcars %>%
  rownames_to_column(var = "name") %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggdotchart(
    x = "name",
    y = "mpg",
    color = "cyl",
    palette = c("#00AFBB", "#E7B800", "#FC4E07"),
    sorting = "ascending",
    add = "segments",
    rotate = TRUE,
    group = "cyl",
    dot.size = 6,
    label = round(mtcars$mpg),   # values printed inside the dots
    font.label = list(color = "white", size = 9, vjust = 0.5),
    ggtheme = theme_minimal()
  )
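label = round(mtcars$mpg): the labels have to be passed as a vector like this because of what is currently a bug in the package. Even so, the result is already excellent.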