Remove constant columns with or without NAs

前端未结


I am trying to get many lm models to work in a function, and I need to automatically drop constant columns from my data.table. Thus, I want to keep only columns wit

7 Answers

青春惊慌失措

2021-01-17 18:02
Because you have a data.table, you may use uniqueN and its na.rm argument:
```
df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
#     x
# 1:  1
# 2:  2
# 3:  3
# 4: NA
# 5:  5
```
A base alternative could be Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
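To make the base approach easier to reuse, here is a minimal sketch that wraps the same "more than one distinct non-NA value" test in a function. The helper name `keep_varying` and the sample data (including the character columns `g` and `h`) are assumptions for illustration, not part of the original question. It is written for a plain data.frame; for a data.table the column selection would need `with = FALSE`.

```r
# Hypothetical helper (not from the thread): keep only the columns that
# have more than one distinct non-NA value.
keep_varying <- function(data) {
  varies <- vapply(
    data,
    function(col) length(unique(col[!is.na(col)])) > 1,
    logical(1)
  )
  data[, varies, drop = FALSE]
}

# Sample data assumed for illustration
df <- data.frame(
  x = c(1, 2, 3, NA, 5),          # varies
  y = c(1, 1, NA, NA, NA),        # constant apart from NA
  z = c(NA, NA, NA, NA, NA),      # all NA
  d = c(2, 2, 2, 2, 2),           # constant
  g = c("a", "a", NA, "a", "a"),  # constant character column
  h = c("a", "b", NA, "a", "b"),  # varying character column
  stringsAsFactors = FALSE
)

result <- keep_varying(df)
print(names(result))
```

Because the test counts distinct values rather than computing a variance, it also handles character columns, which a check based on `sd()` or `var()` is not meant for.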

猫巷女王i

2021-01-17 18:06

If you really mean dropping those columns from the table in place, here is a solution:

library(data.table)
dt <- data.table(x=c(1,2,3,NA,5), 
                 y=c(1,1,NA,NA,NA),
                 z=c(NA,NA,NA,NA,NA), 
                 d=c(2,2,2,2,2))

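# copy(names(dt)) snapshots the names, since := modifies dt by reference
for (col in copy(names(dt))) {
    v <- var(dt[[col]], na.rm = TRUE)
    # v is NA when fewer than two non-NA values remain, 0 when they are all equal
    if (is.na(v) || v == 0) dt[, (col) := NULL]
}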


余生分开走

2021-01-17 18:11
Just change

all(is.na(.col)) || all(.col[1L] == .col)

to

all(is.na(.col) | .col[1L] == .col)

Final code:
```
same <- sapply(df, function(.col) all(is.na(.col) | .col[1L] == .col))
df1 <- df[, !same, with = FALSE]
```
Result:
```
    x
1:  1
2:  2
3:  3
4: NA
5:  5
```
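One caveat, as far as I can tell: the comparison against `.col[1L]` assumes the first element is not NA. If a column starts with NA but has other values, the test returns NA rather than TRUE or FALSE. The sketch below shows this on made-up data and a possible variant that compares against the first non-NA value instead. The helper name `is_constant` and the sample columns are assumptions, not part of the original answer.

```r
# Hypothetical helper: TRUE if a column has at most one distinct non-NA value
is_constant <- function(col) {
  vals <- col[!is.na(col)]
  length(vals) == 0L || all(vals == vals[1L])
}

# Sample data assumed for illustration: every column starts with NA
df <- data.frame(
  a = c(NA, 1, 2),   # varies
  b = c(NA, 3, 3),   # constant apart from NA
  c = c(NA, NA, NA)  # all NA
)

# Test from the answer: NA for columns a and b, because .col[1L] is NA
original <- sapply(df, function(.col) all(is.na(.col) | .col[1L] == .col))

# Variant that ignores NAs before comparing
robust <- sapply(df, is_constant)

print(original)
print(robust)
```

With the data from the question this makes no difference, since none of the columns with values starts with NA.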

小蘑菇

2021-01-17 18:15

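For removing constant columns with pandas (Python, rather than R):

Numeric columns:

# std() is only defined for numeric data, so restrict to numeric columns
numeric_cols = df.select_dtypes(include="number").columns
constant_col = [col for col in numeric_cols if df[col].std() == 0]
print(len(constant_col))
print(constant_col)

Categorical columns:

# nunique() ignores NaN by default, so NaN does not count as a second value
constant_col = [col for col in df.columns if df[col].nunique() <= 1]
print(len(constant_col))
print(constant_col)

Then drop the columns with the drop method, for example df = df.drop(columns=constant_col).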


野性不改

2021-01-17 18:25

There is a simple solution using the Filter function from base R.

library(data.table)
df <- data.table(x=c(1,2,3,NA,5), y=c(1,1,NA,NA,NA),z=c(NA,NA,NA,NA,NA), 
                 d=c(2,2,2,2,2))
# Select only columns for which SD is not 0
> Filter(function(x) sd(x, na.rm = TRUE) != 0, df)
    x
1:  1
2:  2
3:  3
4: NA
5:  5

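Note: Don't forget to use na.rm = TRUE. Without it, sd() returns NA for any column that contains an NA, and Filter() then drops that column too (x in this example).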
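To see why the note matters, here is a small sketch on a plain data.frame (assumed here only to keep the example free of package dependencies). It compares the result of Filter with and without na.rm = TRUE.

```r
x <- c(1, 2, 3, NA, 5)

sd_default <- sd(x)             # NA, because x contains a missing value
sd_narm <- sd(x, na.rm = TRUE)  # standard deviation of 1, 2, 3 and 5

df <- data.frame(x = x, d = c(2, 2, 2, 2, 2))

# Without na.rm the predicate is NA for x, and Filter() drops the column
without_narm <- Filter(function(col) sd(col) != 0, df)

# With na.rm = TRUE only the truly constant column d is dropped
with_narm <- Filter(function(col) sd(col, na.rm = TRUE) != 0, df)

print(names(without_narm))
print(names(with_narm))
```

So leaving out na.rm does not raise an error. It silently removes every column that has a missing value, which is easy to overlook.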


予麋鹿

2021-01-17 18:25
Here is an option:
```
df[,which(df[,
       unlist(
        sapply(.SD,function(x) length(unique(x[!is.na(x)])) >1))]),
   with=FALSE]

    x
1:  1
2:  2
3:  3
4: NA
5:  5
```
For each column of the data.table, we count the number of unique non-NA values and keep only the columns that have more than one.