I have a large data.frame that was generated by a process outside my control, which may or may not contain variables with zero variance (i.e. all the observations are the sa
You may also want to look into the nearZeroVar()
function in the caret package.
If you have one event out of 1000, it might be a good idea to discard these data (but this depends on the model). nearZeroVar()
can do that.
Well, save yourself some coding time:
Rgames: foo
[,1] [,2] [,3]
[1,] 1 1e+00 1
[2,] 1 2e+00 1
[3,] 1 3e+00 1
[4,] 1 4e+00 1
[5,] 1 5e+00 1
[6,] 1 6e+00 2
[7,] 1 7e+00 3
[8,] 1 8e+00 1
[9,] 1 9e+00 1
[10,] 1 1e+01 1
Rgames: sd(foo)
[1] 0.000000e+00 3.027650e+00 6.749486e-01
Warning message:
sd(<matrix>) is deprecated.
Use apply(*, 2, sd) instead.
To avoid nasty floating-point roundoffs, take that output vector, which I'll call "bar," and do something like bar[bar< 2*.Machine$double.eps] <- 0
and then finally your data frame dat[,as.logical(bar)]
should do the trick.