How to Follow R Packages and Their Authors

This post was last updated on 2020-10-10. If you find a problem or have a suggestion, feel free to open an Issue.

1 pkgsearch

pkgsearch v2.0.1: Allows users to search CRAN R packages using the METACRAN search server.

https://github.com/metacran/pkgsearch#readme I think this package is good; as long as it is fast, it is worth using.

library(pkgsearch)
## Warning: package 'pkgsearch' was built under R version 3.6.3
pkg_search("markdown")
## - "markdown" ----------------------------------- 297 packages in 0.01 seconds -
##   #     package       version by               @ title                         
##   1 100 markdown      1.1     Yihui Xie       1y Render Markdown with the C ...
##   2  91 rmarkdown     2.4     Yihui Xie      10d Dynamic Documents for R       
##   3  36 prettydoc     0.4.0   Yixuan Qiu      2M Creating Pretty Documents f...
##   4  36 bookdown      0.20    Yihui Xie       4M Authoring Books and Technic...
##   5  23 knitcitations 1.0.10  Carl Boettiger  1y Citations for 'Knitr' Markd...
##   6  21 htmlTable     2.1.0   Max Gordon     24d Advanced Tables for Markdow...
##   7  20 rticles       0.16    Yihui Xie      18d Article Formats for R Markdown
##   8  18 htmlwidgets   1.5.2   Carson Sievert  7d HTML Widgets for R            
##   9  15 tufte         0.7     Yihui Xie      15d Tufte's Styles for R Markdo...
##  10  15 spelling      2.1     Jeroen Ooms     2y Tools for Spell Checking in R

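It is fast.

For example, I had never come across commonmark before, so it is worth keeping an eye on. Because it is adapted specifically to GitHub, though, it is not very useful to me, since it does not solve the problem of LaTeX on GitHub.

Still, the search itself is quite fast.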

pkg_search("github")
## - "github" ------------------------------------ 117 packages in 0.009 seconds -
##   #     package       version by                  @ title                      
##   1 100 remotes       2.2.0   Jim Hester         3M R Package Installation f...
##   2  98 gh            1.1.0   Gábor Csárdi       9M 'GitHub' 'API'             
##   3  63 gistr         0.9.0   Scott Chamberlain  2M Work with 'GitHub' 'Gists' 
##   4  55 commonmark    1.7     Jeroen Ooms        2y High Performance CommonM...
##   5  48 whoami        1.3.0   Gábor Csárdi       2y Username, Full Name, Ema...
##   6  29 usethis       1.6.3   Jennifer Bryan    23d Automate Package and Pro...
##   7  20 piggyback     0.0.11  Carl Boettiger     8M Managing Larger Data on ...
##   8  19 ThankYouStars 0.2.0   Naoto Koshimizu    3y Give your Dependencies S...
##   9  19 projmgr       0.1.0   Emily Riederer     1y Task Tracking and Projec...
##  10  19 switchrGist   0.2.4   Gabriel Becker     2y Publish Package Manifest...

2 packagefinder

Searching for R packages is a vexing problem for both new and experienced R users. With over 13,000 packages already on CRAN, and new packages arriving at a rate of almost 200 per month, it is impossible to keep up. Package names can be almost anything, and they are rarely informative, so searching by name is of little help. I make it a point to look at all of the new packages arriving on CRAN each month, but after a month or so, when asked about packages related to some particular topic, more often than not, I have little more to offer than a vague memory that I saw something that might be useful. (Rickert 2018)

Likewise, knowing something of an author’s background, his or her experience writing other R packages, and prominent R developers he or she may have collaborated with is also helpful in assessing whether a newly found package is worth a try. (Rickert 2018)

  1. So search for packages by topic rather than by name.
  2. Analyze packages based on the relationships between their authors.
library(tidyverse)
library(packagefinder)
library(dlstats)
library(cranly)
  1. packagefinder::findPackage fetches its data from CRAN.
sem_pkg <- 
    bind_rows(
    'factor analysis' %>% 
        findPackage() %>% 
        as_tibble()
    ,'structural equation model' %>% 
        findPackage() %>% 
        as_tibble()
    ,'latent variable' %>% 
        findPackage() %>% 
        as_tibble()
    )
  1. dlstats::cran_stats fetches monthly download data from the RStudio download page; this function runs rather slowly.
sem_pkg_download <- 
    sem_pkg %>% 
    rename_all(tolower) %>% 
    arrange(desc(score)) %>% 
    distinct(name) %>% 
    # head(100) %>% 
    .$name %>% 
    # cran_stats accepts a vector, so map is not needed
    cran_stats()
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : factor level [65] is duplicated
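  1. cran_stats accepts a vector as input, but the vector must not contain duplicate values; duplicates trigger the error above, hence the distinct(name) step.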
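
The same package can match more than one search phrase, so the combined results can repeat a name. A minimal sketch of the de-duplication step in base R follows; the `hits` data frame, its package names, and its scores are made up for illustration and are not real search output.

```r
# Toy stand-in for the combined search results; names and scores are invented.
hits <- data.frame(
  name  = c("lavaan", "psych", "lavaan", "lava", "psych"),
  score = c(90, 80, 75, 60, 55),
  stringsAsFactors = FALSE
)

# Sort by score (highest first), then keep the first occurrence of each name.
hits <- hits[order(-hits$score), ]
pkgs <- unique(hits$name)

# Guard: fail early if any duplicate name survived.
stopifnot(anyDuplicated(pkgs) == 0)
print(pkgs)
```

A vector like `pkgs` is the kind of input that can then be handed to a function that rejects duplicates.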
sem_pkg_mostdownload <- 
    sem_pkg_download %>% 
    group_by(package) %>% 
    summarize(downloads = mean(downloads)) %>%
    arrange(desc(downloads))
library(lubridate)
sem_pkg_download %>% 
    filter(year(end) >= 2017) %>% 
    filter(package %in% sem_pkg_mostdownload[1:10,]$package) %>% 
    mutate(is_high = package %in% sem_pkg_mostdownload[1:5,]$package) %>% 
    ggplot(aes(x=end,y=downloads,col=package)) + 
    geom_line() + 
    facet_wrap(~is_high,scales = 'free_y')
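  1. The plot shows that demand for the psych package is declining.
  2. Demand for lavaan is increasing.
  3. I had not paid attention to the lava and factoextra packages before.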
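
The ranking step that feeds the plot (mean monthly downloads per package, then the top few packages) can be sketched offline in base R. The `monthly` data frame below is invented toy data, not real download counts, and the cutoff of two packages is an arbitrary choice for the example.

```r
# Toy monthly download data; the numbers are invented for illustration.
monthly <- data.frame(
  package   = rep(c("psych", "lavaan", "lava"), each = 3),
  end       = rep(as.Date(c("2020-01-31", "2020-02-29", "2020-03-31")), times = 3),
  downloads = c(300, 280, 260, 100, 150, 200, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Mean monthly downloads per package, sorted from highest to lowest.
ranked <- aggregate(downloads ~ package, data = monthly, FUN = mean)
ranked <- ranked[order(-ranked$downloads), ]

# Keep the top two packages and subset the monthly rows to them.
top <- head(ranked$package, 2)
top_monthly <- monthly[monthly$package %in% top, ]
print(ranked)
```

Ranking by the mean hides the direction of the trend (in the toy data, one package falls while another rises), which is why the monthly series is still plotted afterwards.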
p_db <- tools::CRAN_package_db()
clean_p_db <- clean_CRAN_db(p_db)
author_net <- build_network(object = clean_p_db, perspective = "author")
plot(author_net, author = "Hadley Wickham", exact = FALSE)
  1. cranly::build_network is used to analyze authors.
author_summary <- summary(author_net)
plot(author_summary)
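  1. The summary above shows the most prominent R authors at the moment, the ones worth following. This approach is an alternative to hunting for lists of authors on GitHub, and it is much more reliable.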
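
To make the idea of an author-perspective network concrete, here is a toy co-authorship sketch in base R. It is not cranly's implementation; the package names, author names, and the choice of degree (number of distinct collaborators) as the ranking measure are all assumptions made for illustration.

```r
# Toy package-to-author table; packages and authors are hypothetical.
pkg_authors <- list(
  pkgA = c("Ann", "Bob"),
  pkgB = c("Ann", "Cat"),
  pkgC = c("Bob", "Cat", "Dan"),
  pkgD = c("Eve")
)

# One edge per pair of authors who share a package.
multi <- Filter(function(a) length(a) >= 2, pkg_authors)
pairs <- lapply(unname(multi), function(a) t(combn(sort(a), 2)))
edges <- unique(as.data.frame(do.call(rbind, pairs), stringsAsFactors = FALSE))
names(edges) <- c("from", "to")

# Degree: the number of distinct collaborators of each author.
degree <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)
print(degree)
```

Authors with no co-authors, such as the solo author of `pkgD`, do not appear in the edge list at all, so a degree-based ranking favors well-connected authors over prolific solo ones.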

Rickert, Joseph. 2018. “Searching for R Packages.” 2018. https://rviews.rstudio.com/2018/10/22/searching-for-r-packages/.