Quickly remove zero variance variables from a data.frame

独厮守ぢ 2020-12-13 01:07

I have a large data.frame that was generated by a process outside my control, which may or may not contain variables with zero variance (i.e. all the observations are the sa

8 Answers
  • 2020-12-13 01:33

    You may also want to look into the nearZeroVar() function in the caret package.

    If a variable has, say, only one event in 1000 observations, it might be a good idea to discard it (though this depends on the model). nearZeroVar() can flag such near-constant variables as well as strictly constant ones.
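
    A minimal sketch of that approach, assuming the caret package is installed. The data frame `dat` and its column names are hypothetical stand-ins, and the default cutoffs of nearZeroVar() are used unchanged:

```r
library(caret)  # assumption: caret is installed

# Hypothetical data: a constant column, a near-constant column, a varying one
dat <- data.frame(
  const = rep(1, 1000),
  rare  = c(rep(0, 999), 1),   # one event out of 1000
  x     = seq_len(1000)
)

# nearZeroVar() returns the indices of zero- and near-zero-variance columns
idx <- nearZeroVar(dat)

# Guard against an empty result: dat[, -integer(0)] would drop every column
kept <- if (length(idx) > 0) dat[, -idx, drop = FALSE] else dat

names(kept)
```

    The length check matters because negative indexing with an empty vector selects nothing rather than everything.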

  • 2020-12-13 01:35

    Well, save yourself some coding time:

    Rgames: foo
          [,1]  [,2] [,3]
     [1,]    1 1e+00    1
     [2,]    1 2e+00    1
     [3,]    1 3e+00    1
     [4,]    1 4e+00    1
     [5,]    1 5e+00    1
     [6,]    1 6e+00    2
     [7,]    1 7e+00    3
     [8,]    1 8e+00    1
     [9,]    1 9e+00    1
    [10,]    1 1e+01    1
    Rgames: sd(foo)
    [1] 0.000000e+00 3.027650e+00 6.749486e-01
    Warning message:
    sd(<matrix>) is deprecated.
     Use apply(*, 2, sd) instead.   
    

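    To avoid nasty floating-point roundoff, take that output vector, which I'll call bar, and zero out anything negligible: bar[bar < 2*.Machine$double.eps] <- 0. Then dat[, as.logical(bar), drop = FALSE] keeps only the columns of your data frame dat whose standard deviation is non-zero (drop = FALSE stops a single surviving column from collapsing to a vector). Since sd() on a matrix is deprecated, compute bar with apply(foo, 2, sd), as the warning suggests.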
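
    Put together, the steps might look like the sketch below. It assumes every column is numeric and free of NA values; `dat` and `dat_clean` are hypothetical names, with `dat` built to mirror the matrix foo above:

```r
# Hypothetical data frame mirroring the matrix foo shown above
dat <- data.frame(
  a = rep(1, 10),
  b = 1:10,
  c = c(1, 1, 1, 1, 1, 2, 3, 1, 1, 1)
)

# Column-wise standard deviations (assumes numeric columns without NAs)
bar <- vapply(dat, sd, numeric(1))

# Treat anything below the tolerance as exactly zero
bar[bar < 2 * .Machine$double.eps] <- 0

# Keep only columns with non-zero spread; drop = FALSE keeps a data frame
dat_clean <- dat[, as.logical(bar), drop = FALSE]

names(dat_clean)
```

    vapply() is used here instead of apply() so the data frame is not first coerced to a matrix; for an all-numeric input either should give the same standard deviations.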
