
Communicating with Data in the Tidyverse: Study Notes

Communicating with Data in the Tidyverse

The course even covers CSS, so I decided to follow it right away. It is also a good chance to review ggplot2.

  • 4 hours
  • 15 Videos
  • 53 Exercises
  • 265 Participants
  • 4,350 XP

So few people have taken this course; that usually means it is worth the effort.

Timo Grossenbacher, the instructor, is a journalist? Even better: he must have a good eye for design and draw great charts.

Add labels to the plot | R

# Create the plot
ilo_plot <- 
ggplot(plot_data) +
  geom_point(aes(x = working_hours, y = hourly_compensation)) +
  # Add labels
  labs(
    x = "Working hours per week",
    y = "Hourly compensation",
    subtitle = "The more people work, the less compensation they seem to receive",
    title = "Working hours and hourly compensation in European countries, 2006",
    caption = "Data source: ILO, 2017"
  )
ilo_plot

The caption sits in the bottom-right corner and states the data source. The subtitle carries the message of the plot.

Custom ggplot2 themes | R

The default theme is rather ugly. The video would not load.

Apply a default theme | R

ilo_plot +
  theme_minimal()

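With theme_minimal() the plot looks better than the original. More themes are available; for example, theme_wsj() from the ggthemes package follows the style of The Wall Street Journal.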

library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.3
ilo_plot +
  theme_wsj()

See how quickly you can change the overall appearance of a ggplot2 plot?

Change the appearance of titles | R

ilo_plot +
  theme_minimal() +
  # Customize the "minimal" theme with another custom "theme" call
  theme(
    # text = element_text(family = "Bookman"),
    title = element_text(color = "gray25"), # make the title text a bit grayer
    plot.subtitle = element_text(size = 12), # slightly larger so it is legible
    plot.caption = element_text(color = "gray30") # make the caption a bit grayer
  )

It still feels less powerful than labs(), haha.

Alter background color and add margins | R

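Note that theme() can be called repeatedly; the calls stack, and a later call leaves settings it does not mention untouched. plot.margin = unit(c(5, 10, 5, 10), units = "mm") sets the margins (top, right, bottom, left) and states their unit explicitly.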

ilo_plot +
  # "theme" calls can be stacked upon each other, so this is already the third call of "theme"
  theme(
    plot.background = element_rect(fill = "gray95"),
    plot.margin = unit(c(5, 10, 5, 10), units = "mm")
  )

The background is now the gray shade "gray95", which looks pretty ugly.

Now your plot really stands out from the rest.

Visualizing aspects of data with facets | R

dotplot.

For facet_grid(), the facet strips can be styled with strip.background and strip.text.

Defining your own theme function

theme_green <- function(){
  theme(
    plot.background = 
      element_rect(fill = "green"),
    panel.background = 
      element_rect(fill = 
        "lightgreen")
  )
}

plot.background was already changed earlier; here we also change panel.background.

ilo_plot +
  theme_green()

# Filter ilo_data to retain the years 1996 and 2006
ilo_data1 <- 
  ilo_data %>%
  filter(year %in% c(1996,2006)) 
ilo_plot1 <- 
ilo_data1 %>% 
  ggplot(aes(x = working_hours, y = hourly_compensation)) +
  geom_point() +
   labs(
    x = "Working hours per week",
    y = "Hourly compensation",
    title = "The more people work, the less compensation they seem to receive",
    subtitle = "Working hours and hourly compensation in European countries, 2006",
    caption = "Data source: ILO, 2017"
  ) +
  # Add facets here
  facet_grid(facets = . ~ year) # the argument name "facets" can be omitted
ilo_plot1

It is ugly because it uses the defaults. Every good-looking plot starts with labs() and is then refined with theme() and theme_*().

Define your own theme function | R

# Define your own theme function below
theme_ilo <- function(){
    theme(
    # text = element_text(family = "Bookman", color = "gray25"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(color = "gray30"),
    plot.background = element_rect(fill = "gray95"),
    plot.margin = unit(c(5, 10, 5, 10), units = "mm")
  )
}

# For a starter, let's look at what you did before: adding various theme calls to your plot object
ilo_plot +
  theme_minimal() +
  theme_ilo()

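To summarize, there are just five elements:

  • text, plot.subtitle, plot.caption: font family (family), color (color), and size (size), all built with element_text().
  • plot.background = element_rect(fill = "gray95") and plot.margin = unit(c(5, 10, 5, 10), units = "mm"): these change the background and the plot margins.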

# Apply your theme function
ilo_plot1 + 
  theme_ilo()

# Examine ilo_plot
ilo_plot1

ilo_plot1 +
  # Add another theme call
  theme(
    # Change the background fill to make it a bit darker
    strip.background = element_rect(fill = "gray60", color = "gray95")
  ) +
  theme(
    # Make text a bit bigger and change its color to white
    strip.text = element_text(size = 11, color = "white")
  )

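  • strip.background changes the background of the facet strips (the boxes that show each facet level).
  • strip.text changes the text in those strips, here its size and color.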

A custom plot to emphasize change | R

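This "dot plot" is not what I understood a dot plot to be; it is really a lollipop chart.

I could use it to show off a comparison of model performance.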

ggplot() +
    geom_path(aes(x = numeric_variable, y = numeric_variable))
ggplot() +
  geom_path(aes(x = numeric_variable, y = factor_variable))
ggplot() +
  geom_path(aes(x = numeric_variable, y = factor_variable),
              arrow = arrow(___))

Now on to geom_path(). Keep in mind that x must be a continuous variable, for example \(R^2\).

A basic dot plot | R

# Create the dot plot
ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  ggplot() +
    geom_path(aes(x = working_hours, y = country))

Do not be surprised by the result. To see why the plot looks like this, look at the structure of the data.

ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  arrange(country) %>% 
  head()
## # A tibble: 6 x 4
##   country   year  hourly_compensation working_hours
##   <fct>     <fct>               <dbl>         <dbl>
## 1 Australia 1996                 17.0          34.6
## 2 Australia 2006                 26.1          33.1
## 3 Austria   1996                 24.8          32.0
## 4 Austria   2006                 30.5          31.8
## 5 Belgium   1996                 25.2          31.7
## 6 Belgium   2006                 31.9          30.2

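So each country has two values, a minimum and a maximum, and the path connects them. But the direction cannot be read from the line: you cannot tell whether the value increased or decreased over time.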
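One way to check the direction numerically is to compute the change per country. The sketch below is only an illustration: it uses a small hand-typed tibble with the (rounded) rows printed above instead of the full ilo_data, and the names ilo_small, working_hours_change and change are made up here.

```r
library(dplyr)

# Hand-typed copy of the six rows printed above (rounded values, illustration only)
ilo_small <- tibble(
  country = c("Australia", "Australia", "Austria", "Austria", "Belgium", "Belgium"),
  year = c("1996", "2006", "1996", "2006", "1996", "2006"),
  working_hours = c(34.6, 33.1, 32.0, 31.8, 31.7, 30.2)
)

# Change from 1996 to 2006 per country; a negative value means a decrease
working_hours_change <- ilo_small %>%
  arrange(country, year) %>%
  group_by(country) %>%
  summarise(change = last(working_hours) - first(working_hours))

working_hours_change
```

For these three countries every change is negative, which is what the arrows in the next step make visible.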

Add arrows to the lines in the plot | R

ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
ggplot() +
  geom_path(aes(x = working_hours, y = country),
  # Add an arrow to each path
            arrow = arrow(length = unit(1.5, "mm"), type = "closed"))

Now the downward trend is finally visible. But without concrete numbers the plot is tiring to read, so the values should be shown as well.

Add some labels to each country | R

The numbers can be added with geom_text() or geom_label(); the latter draws a background box behind the text, so choose whichever fits.
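To compare the two geoms side by side, here is a minimal sketch on toy data. The data frame labels_demo and its values are invented for illustration and are not part of the course data.

```r
library(ggplot2)

# Toy data (hypothetical values)
labels_demo <- data.frame(x = c(1, 2), y = c(1, 2), value = c(31.8, 33.1))

# geom_text(): plain text drawn at the given position
text_plot <- ggplot(labels_demo, aes(x = x, y = y, label = value)) +
  geom_text()

# geom_label(): the same text, drawn on top of a filled rectangle
label_plot <- ggplot(labels_demo, aes(x = x, y = y, label = value)) +
  geom_label()

text_plot
label_plot
```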

ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
  # Add a geom_text() geometry
  geom_text(
          aes(x = working_hours,
              y = country,
              label = round(working_hours, 1))
        )

But the labels overlap a bit, which is annoying.

Polishing the dot plot | R

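forcats::fct_rev() is great, and simple: it reverses the order of the factor levels. Its more powerful sibling is fct_reorder(). fct_reorder(country, working_hours, mean) reorders the levels of country according to the result of

group_by(country) %>% 
summarise(mean(working_hours))

that is, by the mean working hours per country, haha.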
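A small sketch of both functions on toy data may help. The vectors below are invented for illustration (they are not the ILO data), and the name by_mean is arbitrary.

```r
library(forcats)

# Toy data: two observations per country
country <- factor(c("A", "A", "B", "B", "C", "C"))
working_hours <- c(40, 38, 30, 28, 35, 33)

# By default the levels are in alphabetical order
levels(country)

# Reorder the levels by the mean of working_hours within each country (ascending)
by_mean <- fct_reorder(country, working_hours, mean)
levels(by_mean)

# fct_rev() reverses whatever order the levels currently have
levels(fct_rev(by_mean))
```

The per-country means are 39 (A), 29 (B) and 34 (C), so the reordered levels run B, C, A, and fct_rev() turns that into A, C, B.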

hjust and vjust can even be set conditionally inside aes()! This is worth understanding properly.

ggplot(ilo_data) +
  geom_path(aes(...)) +
  geom_text(
        aes(...,
            hjust = ifelse(year == "2006", 
              1.4, 
              -0.4)
        )
    )

Reordering elements in the plot | R

library(forcats)
ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>% 

# Plot again
ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
    geom_text(
          aes(x = working_hours,
              y = country,
              label = round(working_hours, 1))
          )

The only difference is that the countries now appear in increasing order, sorted by their working_hours in 2006.

Correct ugly label positions | R

# Save plot into an object for reuse
ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>% 
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
    # Specify the hjust aesthetic with a conditional value
  geom_text(
        aes(x = working_hours,
            y = country,
            label = round(working_hours, 1),
            hjust = ifelse(year == "2006", 1.4, -0.4)
          ),
        # Change the appearance of the text
        size = 3,
        # family = "Bookman",
        col = "gray25"
        ) 

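hjust only shifts the labels horizontally: with hjust = 1.4 the labels for year == "2006" end up to the left of their point, and with hjust = -0.4 the labels for the other year end up to the right.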
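The effect is easiest to see on a single path. The following is a minimal sketch with made-up values; the data frame hjust_toy and the object hjust_demo are hypothetical names used only here.

```r
library(ggplot2)

# Toy data (hypothetical values): one country with two end points
hjust_toy <- data.frame(
  year = c("1996", "2006"),
  working_hours = c(34, 30),
  country = c("X", "X")
)

hjust_demo <- ggplot(hjust_toy, aes(x = working_hours, y = country)) +
  geom_point() +
  geom_text(
    aes(
      label = working_hours,
      # hjust > 1 pushes the label left of its point,
      # hjust < 0 pushes it right of its point
      hjust = ifelse(year == "2006", 1.4, -0.4)
    )
  )

hjust_demo
```

Because the 2006 value is the smaller one here, its label lands outside the left end of the path and the 1996 label outside the right end, so neither sits on top of a point.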

But some labels are now cut off at the edge of the plot.

Finalizing the plot for different audiences and devices | R

coord_cartesian vs. xlim / ylim

ggplot_object +
    coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))
ggplot_object +
  xlim(0, 100) +
  ylim(10, 20)

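The difference between the two is whether data are dropped: xlim() and ylim() remove observations outside the limits, whereas coord_cartesian() only zooms in on the plot. The former, coord_cartesian(), is therefore recommended.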
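A boxplot makes the difference visible, because its statistics are computed from whatever data survive the limits. This is a sketch on invented data (limits_toy and the plot names are hypothetical), not part of the course exercises.

```r
library(ggplot2)

# Toy data (hypothetical): four ordinary values and one extreme value
limits_toy <- data.frame(group = "a", value = c(1, 2, 3, 4, 100))

base_plot <- ggplot(limits_toy, aes(x = group, y = value)) +
  geom_boxplot()

# Zooming: all five values still enter the boxplot statistics
zoomed <- base_plot + coord_cartesian(ylim = c(0, 10))

# Scale limits: the value 100 falls outside the limits and is dropped
# before the statistics are computed (ggplot2 warns about the removed row)
clipped <- base_plot + ylim(0, 10)

# Compare the medians that each plot actually draws
layer_data(zoomed)$middle
layer_data(clipped)$middle
```

As far as I can tell, the first median is 3 (all five values) and the second is 2.5 (only the four values inside the limits), so the two plots do not show the same summary even though the visible range is identical.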

So all that is needed is coord_cartesian() with an xlim that is slightly wider than the data, here xlim = c(25, 41).

# Save plot into an object for reuse
ilo_plot2 <- 
ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>% 
  ggplot() +
  geom_path(aes(x = working_hours, y = country),
            arrow = arrow(length = unit(1.5, "mm"), type = "closed")) +
    # Specify the hjust aesthetic with a conditional value
  labs(
    x = "Working hours per week",
    y = "Hourly compensation",
    subtitle = "The more people work, the less compensation they seem to receive",
    title = "Working hours and hourly compensation in European countries, 2006",
    caption = "Data source: ILO, 2017"
  ) + 
  geom_text(
        aes(x = working_hours,
            y = country,
            label = round(working_hours, 1),
            hjust = ifelse(year == "2006", 1.4, -0.4)
          ),
        # Change the appearance of the text
        size = 3,
        # family = "Bookman",
        col = "gray25"
        ) +
  coord_cartesian(xlim = c(25,41))
ilo_plot2

Desktop vs. mobile audiences: impressive that even this is taken into account. The plot gets a desktop version and a mobile version (narrow and tall).

Optimizing the plot for mobile devices | R

# Compute temporary data set for optimal label placement
median_working_hours <- 
  ilo_data %>% 
  filter(year %in% c(1996,2006)) %>% 
  # Arrange data frame
  arrange(country) %>%
  # Reorder countries by working hours in 2006
  mutate(country = fct_reorder(country,
                               working_hours,
                               last
                               )) %>%
  group_by(country) %>%
  summarize(median_working_hours_per_country = median(working_hours)) %>%
  ungroup()
## `summarise()` ungrouping output (override with `.groups` argument)
# Have a look at the structure of this data set
str(median_working_hours)
## tibble [27 x 2] (S3: tbl_df/tbl/data.frame)
##  $ country                         : Factor w/ 30 levels "Netherlands",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ median_working_hours_per_country: num [1:27] 27 27.8 28.4 31 30.9 ...
ilo_plot2 +
  # Add label for country
  geom_text(data = median_working_hours,
            aes(y = country,
                x = median_working_hours_per_country,
                label = country),
                vjust = 2,
                # family = "Bookman",
                color = "gray25") +
  # Remove axes and grids
  theme(
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    axis.text = element_blank(),
    panel.grid = element_blank(),
    # Also, let's reduce the font size of the subtitle
    plot.subtitle = element_text(size = 3)
  )

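The key idea is to place each country name next to its line, which makes sense once the axis text has been removed. Note the chunk options {r fig.height=8, fig.width=4.5, fig.align='center'}: they give the figure a mobile-friendly format, with a height-to-width ratio of \(8:4.5\), centered on the page.

To sum up, so far I have learned the various arguments of labs() and theme(), the ready-made theme_*() templates, and a few debugging skills. Not bad; that counts as progress.

HTML manual by RStudio: very useful and worth studying carefully.

Add the following to the YAML header: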

output: 
  html_document:
    theme: united
    highlight: monochrome

Add a table of contents | R

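In toc: true, toc stands for table of contents. toc_float controls whether the table of contents stays visible and follows along while the page is scrolled. toc_depth sets how many heading levels the table of contents includes.

Here is an example with a floating table of contents.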

output: 
  html_document:
    theme: cosmo
    highlight: monochrome
    toc: true
    toc_float: true

code_folding: hide folds all code in the document by default (readers can still expand it), which makes the page much cleaner.

Cascading Style Sheets (CSS)

CSS selectors - CSS | MDN is the reference.

<style>
...
</style>

The CSS rules to be applied must be wrapped in a <style> element.

body, h1, h2, h3, h4 {
    font-family: "Times new roman", serif;
}

They are styles of font. Serif includes small lines, Sans Serif (sans means without) doesn’t include them.

serif denotes fonts with serifs and sans-serif denotes fonts without them; I do not fully understand the details myself.

pre {
    font-size: 10px;
}

This sets the font size of code blocks.

/* Selects any <a> element when "hovered" */
a:hover {
  color: orange;
}

The :hover CSS pseudo-class matches when the user interacts with an element with a pointing device, but does not necessarily activate it. It is generally triggered when the user hovers over an element with the cursor (mouse pointer).

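Here :hover colors a hyperlink while the cursor is over it. a is the selector for <a> (anchor) elements, that is, links; it does not mean "any".

With css: styles.css in the YAML header, an external stylesheet can be referenced, similar to a .bib file.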

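Take rules like the ones wrapped in

<style>
...
</style>

and save them in a separate file (without the <style> tags), set the path correctly, ideally keeping the file in the same folder, and then simply reference it.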

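On printing tables: adding knitr::kable() every single time is tedious. Setting df_print: kable in the YAML header takes care of it.

The instructor is a detail-obsessed aesthete. For CSS, a first introduction is enough for now; I need more practice before I can really use it. The labs() arguments and the related design settings are very useful, though, and so is the dot plot, so the course was well worth it.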