Study Notes: Communicating with Data in the Tidyverse

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

### Communicating with Data in the Tidyverse

This course even covers CSS, so I am following it right away; it is also a good chance to review ggplot2.

  • 4 hours
  • 15 Videos
  • 53 Exercises
  • 265 Participants
  • 4,350 XP

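So few people have taken this course that it is bound to be worth something.

Timo Grossenbacher is a journalist? Even better: he should have good design sense and make great charts.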

```{r message=FALSE, warning=FALSE, cache=TRUE, include=FALSE}
library(tidyverse)
library(knitr)
download.file(
  "https://assets.datacamp.com/production/course_5807/datasets/ilo_hourly_compensation.RData",
  "ilo_hourly_compensation.RData"
)
download.file(
  "https://assets.datacamp.com/production/course_5807/datasets/ilo_working_hours.RData",
  "ilo_working_hours.RData"
)
```

```{r message=FALSE, warning=FALSE, include=FALSE}
load("ilo_hourly_compensation.RData")
load("ilo_working_hours.RData")
```

```{r message=FALSE, warning=FALSE, include=FALSE}
library(tidyverse)
# Join compensation and working hours, then turn the keys into factors
ilo_data <- ilo_hourly_compensation %>%
  inner_join(ilo_working_hours, by = c("country", "year")) %>%
  mutate(year = as.factor(as.numeric(year))) %>%
  mutate(country = as.factor(country))
  # filter(country %in% european_countries) %>%
# Keep a single year for the first scatter plot
plot_data <- ilo_data %>%
  filter(year == "2006")
```

### Add labels to the plot | R

```{r}
# Create the plot
ilo_plot <- ggplot(plot_data) +
  geom_point(aes(x = working_hours, y = hourly_compensation)) +
  # Add labels
  labs(
    x = "Working hours per week",
    y = "Hourly compensation",
    subtitle = "The more people work, the less compensation they seem to receive",
    title = "Working hours and hourly compensation in European countries, 2006",
    caption = "Data source: ILO, 2017"
  )
ilo_plot
```

The caption sits in the bottom-right corner and states the data source. The subtitle carries the point the chart is making.

### Custom ggplot2 themes | R

The default theme is rather ugly. (The video would not load.)

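### Apply a default theme | R

```{r}
ilo_plot +
  theme_minimal()
```

This looks better than the original plot. `theme_minimal()` is one of several built-in themes, and more are available in the ggthemes package; for example, `theme_wsj()` imitates the Wall Street Journal style.

```{r}
library(ggthemes)
ilo_plot +
  theme_wsj()
```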

See how quickly you can change the overall appearance of a ggplot2 plot?

### Change the appearance of titles | R

```{r}
ilo_plot +
  theme_minimal() +
  # Customize the "minimal" theme with another custom "theme" call
  theme(
    # text = element_text(family = "Bookman"),
    title = element_text(color = "gray25"),       # slightly gray text
    plot.subtitle = element_text(size = 12),      # larger, so it is readable
    plot.caption = element_text(color = "gray30") # slightly gray text
  )
```

It still feels less powerful than `labs()`, though. Ha.

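### Alter background color and add margins | R

Note that `theme()` calls can be stacked repeatedly without interfering with each other. `plot.margin = unit(c(5, 10, 5, 10), units = "mm")` states the unit explicitly; the four values are the top, right, bottom, and left margins.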
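The stacking behaviour can be checked without drawing anything. The following is a minimal sketch, assuming only that ggplot2 is installed; the object names `titles`, `surface`, and `stacked` are made up for illustration.

```r
library(ggplot2)

# Two partial themes, each setting different elements
titles <- theme(plot.subtitle = element_text(size = 12))
surface <- theme(
  plot.background = element_rect(fill = "gray95"),
  # margin order is top, right, bottom, left; the unit is stated explicitly
  plot.margin = unit(c(5, 10, 5, 10), units = "mm")
)

# Adding the two keeps the settings from both calls
stacked <- titles + surface
names(stacked)
```

Because each `theme()` call only touches the elements it names, the order of stacked calls matters only when two calls set the same element.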

```{r}
ilo_plot +
  # "theme" calls can be stacked upon each other, so this is already the third call of "theme"
  theme(
    plot.background = element_rect(fill = "gray95"),
    plot.margin = unit(c(5, 10, 5, 10), units = "mm")
  )
```

The background is now the gray `"gray95"`, which looks pretty ugly.

> Now your plot really stands out from the rest.

### [Visualizing aspects of data with facets | R](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/creating-a-custom-and-unique-visualization?ex=1)

Dot plot.

For `facet_grid`, the facet strips can be styled with `strip.background` and `strip.text`.

__Defining your own theme function__

```
theme_green <- function(){
  theme(
    plot.background =
      element_rect(fill = "green"),
    panel.background =
      element_rect(fill = "lightgreen")
  )
}
```

`plot.background` was already changed earlier; here `panel.background` is changed as well.

```{r}
ilo_plot +
  theme_green()
```

```{r}
# Filter ilo_data to retain the years 1996 and 2006
ilo_data1 <- ilo_data %>%
  filter(year %in% c(1996,2006))
ilo_plot1 <- ilo_data1 %>%
  ggplot(aes(x = working_hours, y = hourly_compensation)) +
  geom_point() +
  labs(
    x = "Working hours per week",
    y = "Hourly compensation",
    title = "The more people work, the less compensation they seem to receive",
    subtitle = "Working hours and hourly compensation in European countries, 2006",
    caption = "Data source: ILO, 2017"
  ) +
  # Add facets here
  facet_grid(facets = . ~ year) # the argument name `facets` can be omitted
ilo_plot1
```

It looks ugly because everything is still the default. A good-looking plot starts with `labs()` and then continues with `theme()` and `theme_*()`.

### Define your own theme function | R

```{r}
# Define your own theme function below
theme_ilo <- function(){
  theme(
    # text = element_text(family = "Bookman", color = "gray25"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(color = "gray30"),
    plot.background = element_rect(fill = "gray95"),
    plot.margin = unit(c(5, 10, 5, 10), units = "mm")
  )
}

# For a starter, let's look at what you did before: adding various theme calls to your plot object
ilo_plot +
  theme_minimal() +
  theme_ilo()
```

To summarize, there are five settings.

`text`, `plot.subtitle`, and `plot.caption` control the font (`family`), color (`color`), and size (`size`), and are built with `element_text`.
The other two change the background and the plot margins:
`plot.background = element_rect(fill = "gray95")` and
`plot.margin = unit(c(5, 10, 5, 10), units = "mm")`.

```
# Apply your theme function
ilo_plot1 +
  theme_ilo()

# Examine ilo_plot1
ilo_plot1

ilo_plot1 +
  # Add another theme call
  theme(
    # Change the background fill to make it a bit darker
    strip.background = element_rect(fill = "gray60", color = "gray95")
  ) +
  theme(
    # Make text a bit bigger and change its color to white
    strip.text = element_text(size = 11, color = "white")
  )
```

  • `strip.background` changes the background color of the facet strips (the level labels).
  • `strip.text` changes the size and color of the text in the facet strips.
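The strip settings can be tried without downloading the ILO files. This is a minimal sketch, assuming ggplot2 is installed: the built-in `mtcars` data stands in for `ilo_data`, `cyl` plays the role of `year`, and `theme_strips()` is a hypothetical helper name.

```r
library(ggplot2)

# Reusable theme function for the facet strips (hypothetical name)
theme_strips <- function() {
  theme(
    strip.background = element_rect(fill = "gray60", color = "gray95"),
    strip.text = element_text(size = 11, color = "white")
  )
}

# mtcars stands in for the ILO data; one facet per number of cylinders
strip_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme_minimal() +
  theme_strips()

strip_demo
```

Wrapping the two `theme()` settings in a function mirrors the `theme_ilo()` approach, so the same strip styling can be added to any faceted plot.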

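### A custom plot to emphasize change | R

This dot plot is not what I understood a dot plot to be; it is really closer to a lollipop chart.

I could use it to present model comparison results and show off a little.

```
ggplot() +
  geom_path(aes(x = numeric_variable, y = numeric_variable))
ggplot() +
  geom_path(aes(x = numeric_variable, y = factor_variable))
ggplot() +
  geom_path(aes(x = numeric_variable, y = factor_variable),
            arrow = arrow(___))
```

Now on to `geom_path()`. Keep in mind that x must be a continuous variable, for example $R^2$.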
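As a sketch of the model comparison idea, the snippet below draws one arrow per model from a baseline score to a tuned score. All numbers and the names `model_scores`, `stage`, and `r_squared` are invented for illustration; only the `geom_path()` and `arrow()` calls come from the course material.

```r
library(ggplot2)

# Hypothetical R^2 values for three models, before and after tuning
model_scores <- data.frame(
  model = factor(rep(c("linear", "tree", "boosting"), each = 2)),
  stage = rep(c("baseline", "tuned"), times = 3),
  r_squared = c(0.52, 0.55, 0.48, 0.61, 0.63, 0.71)
)

# One path per model: rows are connected in data order (baseline, then tuned)
model_plot <- ggplot(model_scores) +
  geom_path(
    aes(x = r_squared, y = model),
    arrow = arrow(length = unit(1.5, "mm"), type = "closed")
  ) +
  labs(x = "R squared", y = NULL)

model_plot
```

The factor on the y axis defines the groups, so each model gets its own line, while the continuous x axis carries the metric.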

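### A basic dot plot | R

```{r}
# Create the dot plot
ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  ggplot() +
  geom_path(aes(x = working_hours, y = country))
```

Don't be surprised by the result. To see why it looks like this, check the structure of the data.

```{r}
ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  arrange(country) %>%
  head()
```

Each country has two values, one per year, which become the maximum and minimum ends of its line. The direction is not visible, though: you cannot tell whether the value rose or fell over time.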
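The point about direction can be made concrete with a tiny table. This is a minimal sketch with made-up values, assuming dplyr is installed; `toy` and `changes` are illustrative names and not part of the course data.

```r
library(dplyr)

# Toy stand-in for ilo_data: two rows (years) per country, values made up
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(1996, 2006, 1996, 2006),
  working_hours = c(38.0, 36.5, 33.0, 34.2)
)

# geom_path() simply connects the two rows of each country;
# the sign of the change is the direction that a plain line hides
changes <- toy %>%
  arrange(country, year) %>%
  group_by(country) %>%
  summarise(change = last(working_hours) - first(working_hours))

changes
```

Country A has a negative change and country B a positive one, yet both would be drawn as a plain horizontal segment without an arrow.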

### Add arrows to the lines in the plot | R

```{r}
ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            # Add an arrow to each path
            arrow = arrow(length = unit(1.5, "mm"), type = "closed"))
```

Now the downward trend is finally visible. Without concrete numbers the plot is still tiring to read, so the values need to be shown as well.

### Add some labels to each country | R

The numbers are added with `geom_text()` or `geom_label()`; the latter draws a background box behind the text, so pick whichever fits.

```{r}
ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  # Add a geom_text() geometry
  geom_text(
    aes(x = working_hours,
        y = country,
        label = round(working_hours, 1))
  )
```

The labels overlap the lines a bit, which is annoying.

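### Polishing the dot plot | R

`forcats::fct_rev` is great, and simple. Its more powerful relative is `fct_reorder`: `fct_reorder(country, working_hours, mean)` reorders the levels of `country` according to the result of

```
group_by(country) %>%
  summarise(mean(working_hours))
```

that is, by each country's mean `working_hours`.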
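The equivalence between `fct_reorder()` and a grouped summary can be checked on a toy table. This is a minimal sketch with made-up values, assuming dplyr and forcats are installed; `toy`, `means`, and `reordered` are illustrative names.

```r
library(dplyr)
library(forcats)

# Toy data (made-up values): two observations per country
toy <- tibble(
  country = factor(c("A", "A", "B", "B", "C", "C")),
  working_hours = c(40, 38, 30, 32, 36, 34)
)

# Per-country means, sorted in ascending order
means <- toy %>%
  group_by(country) %>%
  summarise(mean_hours = mean(working_hours)) %>%
  arrange(mean_hours)

# fct_reorder() sorts the factor levels by the same summary
reordered <- fct_reorder(toy$country, toy$working_hours, mean)

levels(reordered)           # "B" "C" "A"
levels(fct_rev(reordered))  # "A" "C" "B"
```

The level order matches the sorted summary table, and `fct_rev()` simply flips it, which is handy because ggplot2 draws the first level at the bottom of the y axis.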

`hjust` and `vjust` can even be set conditionally like this! Worth understanding properly.

```
ggplot(ilo_data) +
  geom_path(aes(...)) +
  geom_text(
    aes(...,
        hjust = ifelse(year == "2006",
                       1.4,
                       -0.4)
    )
  )
```

### Reordering elements in the plot | R

```{r}
library(forcats)
ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>%
  # Plot again
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  geom_text(
    aes(x = working_hours,
        y = country,
        label = round(working_hours, 1))
  )
```

The countries are now sorted in increasing order of their 2006 `working_hours`.

### [Correct ugly label positions | R](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/creating-a-custom-and-unique-visualization?ex=13)

```
# Save plot into an object for reuse
ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>% 
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  # Specify the hjust aesthetic with a conditional value
  geom_text(
        aes(x = working_hours,
            y = country,
            label = round(working_hours, 1),
            hjust = ifelse(year == "2006", 1.4, -0.4)
          ),
        # Change the appearance of the text
        size = 3,
        # family = "Bookman",
        col = "gray25"
        )
```

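`hjust` only shifts the label horizontally: for `year == "2006"` the value `1.4` places the label to the left of its point, and for all other years `-0.4` places it to the right.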
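The conditional `hjust` can be tried on a two-row example. This is a minimal sketch with made-up values, assuming ggplot2 is installed; `toy` and `label_plot` are illustrative names.

```r
library(ggplot2)

# Toy data (made-up values): one country, two years
toy <- data.frame(
  country = "A",
  year = factor(c(1996, 2006)),
  working_hours = c(38.0, 36.5)
)

label_plot <- ggplot(toy, aes(x = working_hours, y = country)) +
  geom_path(arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  geom_text(
    aes(
      label = working_hours,
      # hjust > 1 puts the label left of its point, hjust < 0 puts it right
      hjust = ifelse(year == "2006", 1.4, -0.4)
    )
  )

label_plot
```

Since working hours fall from 1996 to 2006 in this toy data, the 2006 label ends up beyond the arrow tip on the left and the 1996 label beyond the start of the line on the right, so neither overlaps the line.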

However, some labels now get cut off at the edge of the plot.

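### Finalizing the plot for different audiences and devices | R

coord_cartesian vs. xlim / ylim

```
ggplot_object +
  coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))
ggplot_object +
  xlim(0, 100) +
  ylim(10, 20)
```

The difference between the two is whether data are dropped: `xlim()` and `ylim()` discard observations outside the limits, whereas `coord_cartesian()` only zooms the view. The former, `coord_cartesian()`, is therefore recommended.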
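The difference can be seen by inspecting the data ggplot2 actually builds for each version. This is a minimal sketch on toy data, assuming ggplot2 is installed; `toy`, `base_plot`, `with_xlim`, and `with_coord` are illustrative names.

```r
library(ggplot2)

# Toy data: x runs from 1 to 10
toy <- data.frame(x = 1:10, y = 1:10)
base_plot <- ggplot(toy, aes(x = x, y = y)) +
  geom_point()

# xlim() sets scale limits: values outside them are replaced by NA
with_xlim <- base_plot + xlim(3, 8)
# coord_cartesian() only zooms: all values are kept
with_coord <- base_plot + coord_cartesian(xlim = c(3, 8))

sum(is.na(layer_data(with_xlim)$x))   # points lost to the scale limits
sum(is.na(layer_data(with_coord)$x))  # nothing is lost when zooming
```

For a plain scatter plot the lost points merely disappear, but for anything computed from the data (smoothers, summaries, paths) dropping rows changes the result, which is why zooming is the safer default.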

So all that is needed is to add `coord_cartesian()` with x limits slightly wider than the data; the chunk below uses `xlim = c(25, 41)`.

```{r}
# Save plot into an object for reuse
ilo_plot2 <- ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>%
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  labs(
    x = "Working hours per week",
    y = "Hourly compensation",
    subtitle = "The more people work, the less compensation they seem to receive",
    title = "Working hours and hourly compensation in European countries, 2006",
    caption = "Data source: ILO, 2017"
  ) +
  # Specify the hjust aesthetic with a conditional value
  geom_text(
    aes(x = working_hours,
        y = country,
        label = round(working_hours, 1),
        hjust = ifelse(year == "2006", 1.4, -0.4)
    ),
    # Change the appearance of the text
    size = 3,
    # family = "Bookman",
    col = "gray25"
  ) +
  # Zoom without dropping data so the labels are not cut off
  coord_cartesian(xlim = c(25,41))
ilo_plot2
```

Desktop vs. mobile audiences: impressive that the course even considers this and distinguishes a desktop version from a mobile version (narrow and tall).

### Optimizing the plot for mobile devices | R

```{r fig.height=8, fig.width=4.5, fig.align='center'}
# Compute temporary data set for optimal label placement
median_working_hours <- ilo_data %>%
  filter(year %in% c(1996,2006)) %>%
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>%
  group_by(country) %>%
  summarize(median_working_hours_per_country = median(working_hours)) %>%
  ungroup()

# Have a look at the structure of this data set
str(median_working_hours)

ilo_plot2 +
  # Add label for country
  geom_text(data = median_working_hours,
            aes(y = country,
                x = median_working_hours_per_country,
                label = country),
            vjust = 2,
            # family = "Bookman",
            color = "gray25") +
  # Remove axes and grids
  theme(
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    axis.text = element_blank(),
    panel.grid = element_blank(),
    # Also, let's reduce the font size of the subtitle
    plot.subtitle = element_text(size = 3)
  )
```

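The main takeaway is the design sense of placing each country name right next to its line.
Note the chunk options
`{r fig.height=8, fig.width=4.5, fig.align='center'}`: they size the figure for a phone screen, with a height-to-width ratio of $8:4.5$, and center it.

<!-- I still don't really get the mobile part; leaving it for later. -->

To summarize, I have now learned
the various arguments of `labs` and `theme`,
the good-looking `theme_*` templates,
and a few debugging skills.
Not bad; I'd call that progress.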

[HTML manual by RStudio](http://rmarkdown.rstudio.com/html_document_format.htm)
This is very useful and worth studying properly.

![](../../../picbackup/yaml.png?imageView2/2/w/600)

Add the following to the `yaml` header:

```
output:
  html_document:
    theme: united
    highlight: monochrome
```

### [Add a table of contents | R](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/customizing-your-rmarkdown-report?ex=3)

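In `toc: true`,
`toc` stands for
__table of contents__.
`toc_float` controls whether the table of contents floats alongside the page as you scroll.
`toc_depth` sets how many heading levels the table of contents includes.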

![](../../../picbackup/yaml_toc_floated.png?imageView2/2/w/600)

Here is an example with a floating table of contents.

```
output:
  html_document:
    theme: cosmo
    highlight: monochrome
    toc: true
    toc_float: true
```

* [More YAML hacks | R](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/customizing-your-rmarkdown-report?ex=4)

![](../../../picbackup/code_folding.png?imageView2/2/w/600)

`code_folding: hide`
folds away all code in the document by default, which makes the report much cleaner.

### [Cascading Style Sheets (CSS)](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/customizing-your-rmarkdown-report?ex=5)

[CSS selectors - CSS | MDN](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors)
This is the reference.

* [Change style attributes of text elements | R](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/customizing-your-rmarkdown-report?ex=7)

The CSS to apply has to be wrapped in a `<style>` block:

```
<style>
...
</style>
```

```
body, h1, h2, h3, h4 {
  font-family: "Times new roman", serif;
}
```

> [They are styles of font. Serif includes small lines, Sans Serif (sans means without) doesn't include them.](https://stackoverflow.com/questions/32569696/what-do-serif-and-sans-serif-mean)

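`serif` denotes fonts with serifs and
`sans-serif` denotes fonts without them;
I don't fully understand the details myself.

```
pre {
  font-size: 10px;
}
```

This sets the font size of code blocks.

```
/* Selects any <a> element when "hovered" */
a:hover {
  color: orange;
}
```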

> [The `:hover` CSS](https://developer.mozilla.org/en-US/docs/Web/CSS/:hover) pseudo-class matches when the user interacts with an element with a pointing device, but does not necessarily activate it. It is generally triggered when the user hovers over an element with the cursor (mouse pointer).

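`a:hover` colors a hyperlink while the pointer hovers over it.
`a` is the anchor (hyperlink) element selector; it does not mean "any".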

* [Reference the style sheet | R](https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/customizing-your-rmarkdown-report?ex=8)

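`css: styles.css` references an external style sheet, similar to how a `.bib` file is referenced.

Take the rules that would otherwise sit between `<style>` and `</style>`, save them in a separate file, set the path correctly (ideally in the same folder as the report), and then simply reference that file.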

On printing tables: adding `knitr::kable()` every single time is tedious. Setting `df_print: kable` once in the yaml header takes care of it.
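The YAML options collected in these notes can also be passed from R when rendering. This is a minimal sketch, assuming the rmarkdown package is installed; `report.Rmd` is a hypothetical file name and the `toc_depth` value is an arbitrary choice, neither comes from the course.

```r
library(rmarkdown)

# The same options as in the YAML header, passed as function arguments
report_format <- html_document(
  theme = "cosmo",
  highlight = "monochrome",
  toc = TRUE,
  toc_float = TRUE,
  toc_depth = 2,          # assumed depth, not taken from the notes
  code_folding = "hide",
  df_print = "kable",
  css = "styles.css"      # must exist next to the report when rendering
)

# "report.Rmd" is a hypothetical file name; uncomment to render
# render("report.Rmd", output_format = report_format)
```

Keeping the options in the YAML header is usually simpler; passing them from R is mainly useful when one report has to be rendered with several different settings.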

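This guy is a detail-obsessed aesthete. For CSS, getting a first taste is enough for now; I need more practice before I can really use it. The `labs` arguments and the dot plot, on the other hand, are very useful, so overall the course was well worth it.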