```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

This analysis uses the `oil` data set from the caret package.
```{r message=FALSE, warning=FALSE, include=FALSE}
library(forecast)
library(caret)
library(tidyverse)

# Loads fattyAcids (predictors) and oilType (class labels)
data(oil)

# Class counts before and after resampling
table(oilType)
downSample(fattyAcids, oilType) %>% as.data.frame() %>% .$Class %>% table()
upSample(fattyAcids, oilType) %>% as.data.frame() %>% .$Class %>% table()
```
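Suppose that, in the `Class` variable of a data set, the least frequent level is A and the most frequent level is B. Under random sampling:

`downSample` reduces every level to the number of observations in A.
`upSample` raises every level to the number of observations in B, sampling with replacement.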
`downSample` will randomly sample a data set so that all classes have the same frequency as the minority class.
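As a minimal sketch of the behaviour described above, the following uses a small made-up data set rather than `oil`. The names `x`, `y`, `down`, and `up`, and the class sizes 5 and 20, are assumptions chosen for illustration only.

```r
library(caret)

set.seed(42)

# Hypothetical toy data: level A is the minority (5 rows),
# level B is the majority (20 rows)
x <- data.frame(value = rnorm(25))
y <- factor(c(rep("A", 5), rep("B", 20)))

# Both functions return a data frame whose outcome column
# is named "Class" by default
down <- downSample(x, y)
up   <- upSample(x, y)

table(down$Class)  # every level reduced to the size of A
table(up$Class)    # every level raised to the size of B
```

Here `down` should contain 5 rows per level and `up` 20 rows per level, matching the counts of A and B respectively.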
```{r eval=F}
# Perform logistic regression with upsampling and no resampling
vote_glm <- train(turnout16_2016 ~ .,
                  method = "glm", family = "binomial",
                  data = training,
                  trControl = trainControl(method = "none", sampling = "up"))
```
The sampling method can also be set directly in the `trControl` argument of `caret::train`, through the `sampling` argument of `trainControl` [@Silge2018].