Handling imbalanced data

This post uses the oil data set from the caret package.

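Suppose that in a data set's Class variable the least frequent level is A and the most frequent level is B. Under random sampling:

downSample randomly drops rows so that every level has as many rows as A. upSample samples rows with replacement so that every level has as many rows as B.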

downSample will randomly sample a data set so that all classes have the same frequency as the minority class.
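The behaviour described above can be checked on the oil data. The following is a minimal sketch, assuming that `data(oil)` provides the predictors as `fattyAcids` and the class labels as the factor `oilType`; the object names `down` and `up` and the seed are chosen only for illustration.

```r
library(caret)

# Load the oil data: fattyAcids (predictors) and oilType (class factor)
data(oil)
table(oilType)           # class counts before sampling

set.seed(1)              # both functions sample at random

# Both functions return a data frame with the outcome in a column named "Class"
down <- downSample(x = fattyAcids, y = oilType)
up   <- upSample(x = fattyAcids, y = oilType)

table(down$Class)        # every level now matches the smallest class
table(up$Class)          # every level now matches the largest class
```

Comparing the three tables shows the trade-off: downSample discards rows from the larger classes, while upSample repeats rows from the smaller ones.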

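library(caret)

# Logistic regression with upsampling of the minority class and no resampling
# (method = "none" fits a single model on the training data).
# `training` is assumed to be an existing data frame containing turnout16_2016.
vote_glm <- train(turnout16_2016 ~ ., method = "glm", family = "binomial",
                  data = training,
                  trControl = trainControl(method = "none",
                                           sampling = "up"))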

As the code above shows, the sampling method can also be set directly through the trControl argument of caret::train (Silge 2018).

References

Silge, Julia. 2018. “Supervised Learning in R: Case Studies.” 2018. https://campus.datacamp.com/courses/supervised-learning-in-r-case-studies/get-out-the-vote?ex=9.