Basic Algorithms Series: A Guide to Handling Imbalanced Data: Methods and Practical Tips

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

This post uses the `oil` data set from the caret package for the analysis.

```{r message=FALSE, warning=FALSE, include=FALSE}
library(caret)
library(tidyverse)

data(oil)       # loads fattyAcids (predictors) and oilType (class labels)
table(oilType)  # class counts before resampling

# Class counts after down-sampling and after up-sampling
downSample(fattyAcids, oilType) %>% pull(Class) %>% table()
upSample(fattyAcids, oilType) %>% pull(Class) %>% table()
```

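Suppose that in a data set's `Class` variable the least frequent level is A and the most frequent level is B. Under random sampling:

- `downSample` makes the count of every level equal to the count of A.
- `upSample` makes the count of every level equal to the count of B.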

downSample will randomly sample a data set so that all classes have the same frequency as the minority class.
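The idea behind both functions can be sketched in base R. The helper `resample_classes()` below is hypothetical and is not caret's implementation. It assumes that down-sampling draws without replacement, and that up-sampling keeps the original rows and adds rows drawn with replacement. The toy data (`x`, `y`) is made up for illustration.

```r
# Hypothetical helper, not part of caret: bring every class to a common size.
resample_classes <- function(x, y, target = c("down", "up")) {
  target <- match.arg(target)
  counts <- table(y)
  # Target size: minority class count for "down", majority class count for "up"
  n_target <- if (target == "down") min(counts) else max(counts)
  idx_by_class <- split(seq_along(y), y)
  picked <- unlist(lapply(idx_by_class, function(idx) {
    if (target == "down") {
      # Keep n_target rows of this class, sampled without replacement
      idx[sample.int(length(idx), n_target)]
    } else {
      # Keep all rows and add extra rows sampled with replacement
      extra <- sample.int(length(idx), n_target - length(idx), replace = TRUE)
      c(idx, idx[extra])
    }
  }), use.names = FALSE)
  data.frame(x[picked, , drop = FALSE], Class = y[picked])
}

# Toy data (assumed): A is the minority class (3 rows), B the majority (10 rows)
set.seed(1)
y <- factor(rep(c("A", "B", "C"), times = c(3, 10, 5)))
x <- data.frame(v = rnorm(length(y)))

down <- resample_classes(x, y, "down")
up <- resample_classes(x, y, "up")

table(down$Class)  # every class has 3 rows
table(up$Class)    # every class has 10 rows
```

Down-sampling discards data from the larger classes, while up-sampling duplicates rows from the smaller ones, so which one is preferable depends on how much data the minority class has.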

```{r eval=FALSE}
# Perform logistic regression with upsampling and no resampling
vote_glm <- train(turnout16_2016 ~ .,
                  method = "glm", family = "binomial",
                  data = training,
                  trControl = trainControl(method = "none",
                                           sampling = "up"))
```

In addition, the sampling method can be set directly through the `trControl` argument of `caret::train` [@Silge2018].
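As a further illustration, the same `sampling` option can be combined with cross-validation. The sketch below is not from the cited source: the data frame `toy`, its columns, and the choice of 5 folds are assumptions made so the example runs on its own.

```r
library(caret)

# Toy imbalanced two-class data (assumed for illustration only)
set.seed(42)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n) > 1.5, "yes", "no"),
                levels = c("no", "yes"))
table(toy$y)  # "yes" is the minority class

# 5-fold cross-validation with down-sampling requested via trainControl
ctrl <- trainControl(method = "cv", number = 5, sampling = "down")

fit <- train(y ~ ., data = toy,
             method = "glm", family = "binomial",
             trControl = ctrl)
fit
```

Requesting the sampling through `trainControl` rather than resampling the data beforehand should keep the held-out folds at their original class proportions, which tends to give a more honest performance estimate.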

References