R rfe function “caret” Package error: there should be the same number of samples in x and y

Submitted by ぐ巨炮叔叔 on 2019-12-04 04:16:56

Question


While trying the rfe example from the "caret" package (taken from here), I keep getting this error:

  Error in rfe.default(d[1:2901, ], c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3,  : 
  there should be the same number of samples in x and y

This question has been asked before, but its solution doesn't apply in this case.

Here's the code:

set.seed(7)
# load the library
library(mlbench)
library(caret)

# load the data
d <- read.table("d.dat")

# define the control using a random forest selection function
control <- rfeControl(functions=rfFuncs, method="cv", number=10)

# run the RFE algorithm
results <- rfe(d[1:2901, ],   c(1,1,1,1, 1, 1,2,2,2, 3 ,3,3,4, 4, 4),   sizes=c(1:2901), rfeControl=control)

# summarize the results
print(results)

The dataset is a data frame of 2901 rows (features) and 15 columns. The vector c(1,1,1,1,1,1,2,2,2,3,3,3,4,4,4) is the predictor for the features.

What parameter am I setting wrong?


Answer 1:


By convention, rows are observations and columns are features. Passing d[1:2901, ] as the x argument tells rfe that you have 2901 observations, which does not match the 15 outcomes in y. Transpose your data with t() (assuming it really has 15 columns) so that x has 15 rows and 2901 columns.

The y = c(1,1,1...) vector shouldn't be called a predictor. It is the dependent variable, or outcome. The first argument, x, is a data.frame of predictor variables.
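The transposition described above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins for d.dat, the feature names F1, F2, ... are hypothetical, and treating y as a factor (classification) is an assumption about what the asker intended.

```r
# Simulated stand-in for the asker's data:
# 2901 rows (features) x 15 columns (samples).
set.seed(7)
d <- as.data.frame(matrix(rnorm(2901 * 15), nrow = 2901, ncol = 15))

# Outcome: one label per sample (15 values).
# Using a factor is an assumption; it makes this a classification problem.
y <- factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4))

# t() returns a matrix, so convert back to a data frame.
# After transposing, each row is one sample and each column is one feature.
x <- as.data.frame(t(d))

# Hypothetical syntactic feature names (the transposed columns would
# otherwise be named "1", "2", ...).
names(x) <- paste0("F", seq_len(ncol(x)))

dim(x)                           # 15 rows, 2901 columns
stopifnot(nrow(x) == length(y))  # x and y now have matching sample counts
```

With the shapes aligned, x and y can be passed to rfe as in the question. Note that sizes refers to numbers of features (columns of x), and that 15 samples is very little data for 10-fold cross-validation.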




Answer 2:


We don't know your data, but this works with simulated data:

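library(caret)  # provides rfe, rfeControl and rfFuncs

set.seed(7)
# simulated predictors: 2901 observations (rows) x 15 features (columns)
d <- data.frame(matrix(rnorm(2901 * 15, 1, .5), ncol = 15))
# simulated dependent variable: one class label per row of d
dp <- factor(sample(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4), 2901, replace = TRUE))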

# define the control using a random forest selection function
control <- rfeControl(functions=rfFuncs, method="cv", number=10)

# run the RFE algorithm
sz=50 # Change sz to 2901 for full sample
results <- rfe(d[1:sz, ],   dp[1:sz],   sizes=c(1:15), rfeControl=control)

# summarize the results
print(results)
## End of the printed results
## The top 5 variables (out of 6):
##   X5, X6, X15, X14, X3



Answer 3:


rfe(x, y, sizes = subsets, rfeControl = ctrl)

Your problem is that the number of rows of x is not the same as the length of the vector y.
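A quick way to diagnose this before calling rfe is to compare the two sizes directly. The sketch below uses a hypothetical helper, check_xy, which is not part of caret, and placeholder data with the same shape as in the question.

```r
# check_xy() is a hypothetical helper for illustration, not a caret function.
# It reports whether x and y agree, and whether transposing x would fix it.
check_xy <- function(x, y) {
  if (nrow(x) == length(y)) {
    "ok"
  } else if (ncol(x) == length(y)) {
    "transpose x"
  } else {
    "mismatch"
  }
}

# Placeholder data shaped like the question: 2901 rows, 15 columns.
x <- matrix(0, nrow = 2901, ncol = 15)
# Same 15 labels as in the question: six 1s, then three each of 2, 3, 4.
y <- rep(1:4, times = c(6, 3, 3, 3))

check_xy(x, y)     # "transpose x"
check_xy(t(x), y)  # "ok"
```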



Source: https://stackoverflow.com/questions/30441657/r-rfe-function-caret-package-error-there-should-be-the-same-number-of-samples
