R, DMwR-package, SMOTE-function won't work

给你一囗甜甜゛ submitted on 2020-01-13 10:08:16

Question


I need to apply the SMOTE algorithm to a data set, but I can't get it to work.

Example:

x <- c(12,13,14,16,20,25,30,50,75,71)
y <- c(0,0,1,1,1,1,1,1,1,1)

frame <- data.frame(x,y)

library(DMwR)

smotedobs <- SMOTE(y~ ., frame, perc.over=300)

This gives the following error:

Error in scale.default(T, T[i, ], ranges) : subscript out of bounds
In addition: Warning messages:
1: In FUN(newX[, i], ...) :
  no non-missing arguments to max; returning -Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf

I would appreciate any help or hints.


Answer 1:


I don't have the full answer. I can provide another clue though:

If you convert 'y' to a factor, SMOTE returns without error, but the synthesized observations have NA values for x.




Answer 2:


SMOTE has a bug on 32-bit Windows 7: it assumes that the target variable named in the 'form' parameter is the last column of the data set. The following code demonstrates this:

library(DMwR)
data(iris)
# data <- iris[, c(1, 2, 5)]  # target is the last column: SMOTE works
data <- iris[, c(2, 5, 1)]  # target is not the last column: SMOTE fails
data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))
head(data)
table(data$Species)
newData <- SMOTE(Species ~., data, perc.over=600, perc.under=100)
table(newData$Species)

It produces the following error message:

Error in colnames<-(*tmp*, value = c("Sepal.Width", "Species", "Sepal.Length" : 'names' attribute [3] must be the same length as the vector [2]

On 64-bit Windows 7, the column-order problem does not occur.
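
If the column order is what triggers the error on your setup, one possible workaround is to move the target column to the end before calling SMOTE. The snippet below is only a sketch: `move_target_last` is a hypothetical helper written for this illustration, not a DMwR function, and the SMOTE call is attempted only if DMwR happens to be installed.

```r
# Hypothetical helper (not part of DMwR): put the target column last.
move_target_last <- function(df, target) {
  stopifnot(target %in% names(df))
  df[, c(setdiff(names(df), target), target), drop = FALSE]
}

data(iris)
dat <- iris[, c(2, 5, 1)]  # same column order as the failing case above
dat$Species <- factor(ifelse(dat$Species == "setosa", "rare", "common"))

dat <- move_target_last(dat, "Species")
print(names(dat))  # Species is now the last column

# Only attempt the SMOTE call when DMwR is available.
if (requireNamespace("DMwR", quietly = TRUE)) {
  newData <- DMwR::SMOTE(Species ~ ., dat, perc.over = 600, perc.under = 100)
  print(table(newData$Species))
}
```

The reordering uses base R only, so it is harmless on installations where the order problem does not occur.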




Answer 3:


There is a bug in the SMOTE code: it assumes the target variable (y) it is given is already a factor, and it does not currently handle the edge case of a non-factor target. Make sure to cast the target to a factor before calling the function.
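
Applied to the example from the question, the cast might look like the sketch below. The choice of `k = 1` is an assumption that does not come from the thread: the minority class has only two rows, so the default of `k = 5` nearest neighbours may be asking for more neighbours than exist, which is one plausible explanation for the NA values mentioned in answer 1. The SMOTE call is attempted only if DMwR is installed.

```r
x <- c(12, 13, 14, 16, 20, 25, 30, 50, 75, 71)
y <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1)

# Cast the target to a factor before handing it to SMOTE.
frame <- data.frame(x, y = factor(y))

# Size of the smallest class; here the minority class "0" has two rows.
minority_count <- min(table(frame$y))
print(table(frame$y))

# Only attempt the SMOTE call when DMwR is available.
if (requireNamespace("DMwR", quietly = TRUE)) {
  # k = 1 is an assumption: each minority row has just one possible neighbour.
  smotedobs <- DMwR::SMOTE(y ~ ., frame, perc.over = 300, k = 1)
  print(table(smotedobs$y))
  print(anyNA(smotedobs$x))  # check whether any synthesized x values are NA
}
```

With so few minority cases the synthetic points can only lie between the two existing ones, so the result is of limited use for modelling even when the call succeeds.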



Source: https://stackoverflow.com/questions/15881358/r-dmwr-package-smote-function-wont-work
