The analysis uses the oil data from the caret package.
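Suppose that in a dataset's Class variable the least frequent level has n_min observations and the most frequent level has n_max observations.
Then, with random sampling,
downSample makes every level have n_min observations;
upSample makes every level have n_max observations (it samples with replacement).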
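The two rules above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions and not caret's own implementation. balance_classes() is a hypothetical helper written only for this example. It shrinks every level to n_min rows (sampling without replacement) or grows every level to n_max rows (sampling with replacement).

```r
# Hypothetical helper (not part of caret): balance the levels of one class column.
balance_classes <- function(df, class_col, direction = c("down", "up")) {
  direction <- match.arg(direction)
  counts <- table(as.character(df[[class_col]]))   # observed levels only
  target <- if (direction == "down") min(counts) else max(counts)
  idx <- unlist(lapply(names(counts), function(lvl) {
    rows <- which(df[[class_col]] == lvl)
    # "down": draw without replacement; "up": draw with replacement
    rows[sample.int(length(rows), target, replace = direction == "up")]
  }))
  df[idx, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(x = 1:10, Class = factor(c(rep("a", 7), rep("b", 3))))
table(balance_classes(toy, "Class", "down")$Class)  # a: 3, b: 3
table(balance_classes(toy, "Class", "up")$Class)    # a: 7, b: 7
```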
downSample will randomly sample a data set so that all classes have the same frequency as the minority class.
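On the oil data, the class factor is oilType and the predictors are in the fattyAcids data frame, both loaded by data(oil). The sketch below tabulates the classes and then calls caret's downSample() and upSample(). The check is that every level ends up with min or max of the original counts. The seed value is arbitrary.

```r
library(caret)
data(oil)  # loads fattyAcids (predictors) and oilType (class factor)

set.seed(2018)
orig_counts <- table(oilType)
orig_counts

# Every level is reduced to the size of the rarest level
down <- downSample(x = fattyAcids, y = oilType, yname = "Class")
table(down$Class)

# Every level is increased (with replacement) to the size of the most common level
up <- upSample(x = fattyAcids, y = oilType, yname = "Class")
table(up$Class)
```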
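```r
library(caret)

# Perform logistic regression with upsampling and no resampling
# (`training` is assumed to exist: the course's training set with outcome turnout16_2016)
vote_glm <- train(turnout16_2016 ~ ., method = "glm", family = "binomial",
                  data = training,
                  trControl = trainControl(method = "none",
                                           sampling = "up"))
```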
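In other words, the data does not have to be resampled beforehand. The sampling method can be set directly through the sampling argument of trainControl(), which is passed to caret::train() via trControl.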
(Silge 2018)
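The same argument also works together with cross-validation. The sketch below assumes a hypothetical two-class subset of the oil data (levels A and B only) and uses an rpart tree instead of the logistic regression above. With sampling = "down" in trainControl(), caret applies the down-sampling to the data each model is fit on rather than to the whole dataset up front. The subset, the model choice, and the fold count are illustrative only.

```r
library(caret)
data(oil)

# Hypothetical two-class problem: oil types A and B only
keep <- oilType %in% c("A", "B")
oil_ab <- data.frame(fattyAcids[keep, ], Class = droplevels(oilType[keep]))

set.seed(2018)
# 5-fold CV, with down-sampling requested through trainControl()
ctrl <- trainControl(method = "cv", number = 5, sampling = "down")
fit <- train(Class ~ ., data = oil_ab, method = "rpart", trControl = ctrl)
fit
```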
References
Silge, Julia. 2018. “Supervised Learning in R: Case Studies.” 2018. https://campus.datacamp.com/courses/supervised-learning-in-r-case-studies/get-out-the-vote?ex=9.