R caret / rfe variable selection for factors() AND NAs


我在风中等你

I have a data set with NAs sprinkled generously throughout.

In addition it has columns that need to be factors().

I am using th

1 answer

轮回少年

2021-01-06 12:08
Because of inconsistent behavior on these points between packages, not to mention the extra trickiness when going to more "meta" packages like caret, I always find it easier to deal with NAs and factor variables up front, before I do any machine learning.
- For NAs, either omit the affected rows or impute (median, kNN, etc.).
- For factor features, you were on the right track with model.matrix(). It will let you generate a series of "dummy" features for the different levels of the factor. The typical usage is something like this:
```
> dat = data.frame(x=factor(rep(1:3, each=5)))
> dat$x
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> model.matrix(~ x - 1, data=dat)
   x1 x2 x3
1   1  0  0
2   1  0  0
3   1  0  0
4   1  0  0
5   1  0  0
6   0  1  0
7   0  1  0
8   0  1  0
9   0  1  0
10  0  1  0
11  0  0  1
12  0  0  1
13  0  0  1
14  0  0  1
15  0  0  1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
```
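To tie the two bullets together, here is a minimal base-R sketch of doing both steps up front. The data frame and its column names (`age`, `group`) are made up for illustration, and the choice of median imputation plus an explicit "missing" level is just one reasonable option, not the only way to do it. One thing to watch for: with the default `na.action`, `model.matrix()` silently drops rows that contain NAs, so handle the NAs first.

```r
# Toy data (hypothetical): one numeric and one factor column, both with NAs
dat <- data.frame(
  age   = c(23, NA, 31, 40, NA, 52),
  group = factor(c("a", "b", NA, "a", "c", "b"))
)

# Numeric NAs: replace with the column median (computed on observed values)
dat$age[is.na(dat$age)] <- median(dat$age, na.rm = TRUE)

# Factor NAs: recode to an explicit "missing" level so no rows are dropped
g <- as.character(dat$group)
g[is.na(g)] <- "missing"
dat$group <- factor(g)

# Expand the factor into one 0/1 column per level (no intercept)
dummies <- model.matrix(~ group - 1, data = dat)

# All-numeric predictor matrix with no NAs left
x <- cbind(age = dat$age, dummies)
```

The resulting `x` has one row per original observation and should be usable as the predictor input to functions that cannot cope with factors or NAs. If you would rather stay inside caret, it ships `dummyVars()` and `preProcess()` (with methods such as `"medianImpute"` and `"knnImpute"`) for the same purpose.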
Also, just in case you haven't (although it sounds like you have), the caret vignettes on CRAN are very nice and touch on some of these points. http://cran.r-project.org/web/packages/caret/index.html