Question
I need to apply the SMOTE algorithm to a data set, but I can't get it to work.
Example:
x <- c(12,13,14,16,20,25,30,50,75,71)
y <- c(0,0,1,1,1,1,1,1,1,1)
frame <- data.frame(x,y)
library(DMwR)
smotedobs <- SMOTE(y~ ., frame, perc.over=300)
This gives the following error:
Error in scale.default(T, T[i, ], ranges) : subscript out of bounds
In addition: Warning messages:
1: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
I would appreciate any kind of help or hints.
Answer 1:
I don't have the full answer. I can provide another clue though:
If you convert 'y' to a factor, SMOTE returns without an error, but the synthesized observations have NA values for x.
Answer 2:
SMOTE has a bug on Windows 7 32-bit: it assumes that the target variable in the parameter 'form' is the last column of the data set. The following code demonstrates this:
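A plausible reason for the NA values, offered as an assumption rather than a confirmed diagnosis: SMOTE interpolates each minority example towards one of its k nearest minority neighbours (DMwR's `SMOTE()` defaults to `k = 5`), but the minority class in the question (y == 0) has only 2 rows. The base-R sketch below mimics such a neighbour lookup; the helper `neighbour_indices` is hypothetical and not part of DMwR.

```r
# Hypothetical helper (not DMwR code): indices of the k nearest neighbours
# of element i, skipping element i itself, using 1-D absolute distance.
neighbour_indices <- function(values, i, k) {
  dists <- abs(values - values[i])  # distance from element i to every element
  order(dists)[2:(k + 1)]           # position 1 is element i itself
}

x <- c(12, 13, 14, 16, 20, 25, 30, 50, 75, 71)
y <- factor(c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1))
minority_x <- x[y == "0"]           # only 2 minority rows: 12 and 13

# Asking for 5 neighbours when only 1 exists pads the result with NA.
nn <- neighbour_indices(minority_x, i = 1, k = 5)
print(nn)                           # 2 followed by four NAs

# Indexing with those NA positions yields NA values, which would then
# propagate into any synthetic x interpolated from them.
print(minority_x[nn])
```

If this explanation holds, passing a smaller `k` (for example `k = 1`) to `SMOTE()` should avoid the NA values; that has not been verified against DMwR here.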
library(DMwR)
data(iris)
# data <- iris[, c(1, 2, 5)] # SMOTE works (target is the last column)
data <- iris[, c(2, 5, 1)] # triggers the SMOTE bug (target is not the last column)
data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))
head(data)
table(data$Species)
newData <- SMOTE(Species ~., data, perc.over=600, perc.under=100)
table(newData$Species)
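Assuming the failure really comes from the target not being the last column, one workaround is to reorder the columns before calling SMOTE. The helper `move_target_last` below is hypothetical (not part of DMwR) and uses only base R; the `SMOTE()` call is wrapped in `requireNamespace()` so the sketch also runs where DMwR is not installed.

```r
# Hypothetical helper: return `df` with the column named `target` moved to the end.
move_target_last <- function(df, target) {
  df[, c(setdiff(names(df), target), target), drop = FALSE]
}

data(iris)
data <- iris[, c(2, 5, 1)]  # column order that triggers the reported bug
data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))

# Put the target last so SMOTE's column-order assumption is satisfied.
data <- move_target_last(data, "Species")
print(names(data))  # "Sepal.Width" "Sepal.Length" "Species"

if (requireNamespace("DMwR", quietly = TRUE)) {
  newData <- DMwR::SMOTE(Species ~ ., data, perc.over = 600, perc.under = 100)
  print(table(newData$Species))
}
```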
It shows the following message:
Error in `colnames<-`(`*tmp*`, value = c("Sepal.Width", "Species", "Sepal.Length")) :
  'names' attribute [3] must be the same length as the vector [2]
On Windows 7 64-bit, the column-order problem does not occur.
Answer 3:
There is a bug in the SMOTE code. It assumes that the target variable y it is given is already a factor; it currently does not handle the edge case of a non-factor target. Make sure to convert the target to a factor before calling the function.
Source: https://stackoverflow.com/questions/15881358/r-dmwr-package-smote-function-wont-work
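A minimal sketch of that fix applied to the question's data. It assumes DMwR's `SMOTE()` identifies the minority class through `levels()` of the target column: for a numeric `y`, `levels()` returns `NULL`, so no minority examples would be found, which is consistent with the "no non-missing arguments to max/min" warnings. The `SMOTE()` call is guarded with `requireNamespace()` so the base-R part runs without DMwR; `k = 1` is an assumption, chosen because the minority class has only two rows (fewer than the default `k = 5`).

```r
x <- c(12, 13, 14, 16, 20, 25, 30, 50, 75, 71)
y <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1)

# A numeric target has no levels, so a levels()-based minority lookup finds nothing.
print(levels(y))        # NULL

# Convert the target to a factor before building the data frame passed to SMOTE.
frame <- data.frame(x, y = factor(y))
print(levels(frame$y))  # "0" "1"
print(table(frame$y))   # 2 zeros, 8 ones

if (requireNamespace("DMwR", quietly = TRUE)) {
  # k = 1 is an assumption: only 2 minority rows exist, fewer than the default k = 5.
  smotedobs <- DMwR::SMOTE(y ~ ., frame, perc.over = 300, k = 1)
  print(table(smotedobs$y))
}
```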