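The analysis uses the oil data from the caret package.
Suppose that, among the levels of a data set's Class variable, the least frequent level is A and the most frequent is B.
Then, under random sampling:
downSample makes the number of observations in every level equal to that of A;
upSample makes the number of observations in every level equal to that of B.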
In the words of the caret documentation, downSample will randomly sample a data set so that all classes have the same frequency as the minority class.
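The behavior described above can be checked directly on the oil data. The following is a minimal sketch, assuming the caret package is installed; the object names down and up and the seed value are illustrative choices, not part of the original analysis.

```r
library(caret)

# data(oil) loads two objects: fattyAcids (predictors) and oilType (class factor)
data(oil)

set.seed(1)  # both functions sample at random, so fix the seed for repeatability

# By default, both functions return a data frame whose outcome column is named "Class"
down <- downSample(x = fattyAcids, y = oilType)
up   <- upSample(x = fattyAcids, y = oilType)

table(oilType)     # original, imbalanced class counts
table(down$Class)  # every level reduced to the size of the rarest level
table(up$Class)    # every level raised to the size of the most frequent level
```

Comparing the three tables shows the effect: after downSample every level has the count of the rarest level, and after upSample every level has the count of the most frequent one.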
library(caret)

# Perform logistic regression with upsampling and no resampling.
# `training` is the training data frame prepared earlier (not shown here).
vote_glm <- train(turnout16_2016 ~ ., method = "glm", family = "binomial",
                  data = training,
                  trControl = trainControl(method = "none",
                                           sampling = "up"))
The sampling method can also be specified directly through the trControl argument of caret::train, as in the call above.
(Silge 2018)
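For a self-contained variant of the same idea, here is a hedged sketch that sets sampling = "down" inside trainControl and combines it with cross-validation. The data are simulated with caret's twoClassSim purely for illustration; the names imbal, ctrl and fit, and the values of n, intercept, linearVars and the seed, are assumptions and do not come from the original analysis.

```r
library(caret)

set.seed(42)

# Simulated two-class data; a strongly negative intercept is used here
# to make the classes imbalanced (illustrative values only)
imbal <- twoClassSim(n = 1000, intercept = -12, linearVars = 5)
table(imbal$Class)

# Down-sample inside each resampling iteration rather than once up front
ctrl <- trainControl(method = "cv", number = 5, sampling = "down")

# Logistic regression; caret fits a binomial GLM for a two-level factor outcome
fit <- train(Class ~ ., data = imbal, method = "glm", trControl = ctrl)
fit
```

Setting the sampling option in trainControl means the subsampling is applied within each resample, so the held-out fold keeps its original class distribution when performance is estimated.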
References
Silge, Julia. 2018. “Understanding PCA Using Stack Overflow Data.” Julia Silge’s blog. 2018. https://juliasilge.com/blog/stack-overflow-pca/.