Remove constant columns with or without NAs

深忆病人 2021-01-17 17:30

I am trying to get many lm models to work in a function, and I need to automatically drop constant columns from my data.table. Thus, I want to keep only columns wit…

7 answers
  • 2021-01-17 18:02

    Because you have a data.table, you may use uniqueN and its na.rm argument:

    df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
    #     x
    # 1:  1
    # 2:  2
    # 3:  3
    # 4: NA
    # 5:  5
    

    A base R alternative is Filter(function(x) length(unique(x[!is.na(x)])) > 1, df), which keeps only the columns with more than one distinct non-NA value.

  • 2021-01-17 18:06

    If you really mean dropping those columns from dt itself (by reference), here is a solution:

    library(data.table)
    dt <- data.table(x=c(1,2,3,NA,5), 
                     y=c(1,1,NA,NA,NA),
                     z=c(NA,NA,NA,NA,NA), 
                     d=c(2,2,2,2,2))
    
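    # Loop over the names of a copy so the loop vector is unaffected
    # when columns are deleted from dt by reference
    for (col in names(copy(dt))) {
        # var() is 0 for a constant column and NA when fewer than two non-NA values remain
        v <- var(dt[[col]], na.rm = TRUE)
        if (is.na(v) || v == 0) dt[, (col) := NULL]
    }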
    
  • 2021-01-17 18:11

    Just change

    all(is.na(.col)) || all(.col[1L] == .col)

    to

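    all(is.na(.col) | .col == .col[!is.na(.col)][1L])

    Comparing against the first non-NA value, rather than .col[1L], keeps the test from returning NA when a column starts with NA.

    Final code:

    # A column is constant if every non-NA value equals its first non-NA value;
    # an all-NA column also counts as constant
    same <- sapply(df, function(.col) all(is.na(.col) | .col == .col[!is.na(.col)][1L]))
    df1 <- df[, !same, with = FALSE]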
    

    Result:

        x
    1:  1
    2:  2
    3:  3
    4: NA
    5:  5
    
  • 2021-01-17 18:15

    For removing constant columns in pandas:

    1. Numeric columns:

      # columns whose standard deviation is 0
      constant_col = [const for const in df.columns if df[const].std() == 0]
      print(len(constant_col))
      print(constant_col)

    2. Categorical columns:

      # columns with exactly one distinct value
      constant_col = [const for const in df.columns if len(df[const].unique()) == 1]
      print(len(constant_col))
      print(constant_col)

    Then drop the columns with df.drop(columns=constant_col).
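    The std() and unique() tests above do not treat missing values the way the question asks. pandas skips NaN in std(), so an all-NaN column yields NaN rather than 0 and is not flagged. unique() also counts NaN as a value, so a column like y (1, 1, NaN, NaN, NaN) has two unique entries. The following is a minimal sketch that mirrors the uniqueN(v, na.rm = TRUE) > 1 idea from the first answer. It assumes pandas and NumPy are available and rebuilds the x, y, z, d example data from the R answers as a pandas DataFrame:

```python
import numpy as np
import pandas as pd

# Same example data as the R answers in this thread
df = pd.DataFrame({
    "x": [1, 2, 3, np.nan, 5],
    "y": [1, 1, np.nan, np.nan, np.nan],
    "z": [np.nan] * 5,
    "d": [2, 2, 2, 2, 2],
})

# nunique() ignores NaN by default (dropna=True), so a column is "constant"
# when it has at most one distinct non-missing value; this also catches
# all-NaN columns, which have zero distinct values
constant_col = [col for col in df.columns if df[col].nunique() <= 1]
print(constant_col)  # ['y', 'z', 'd']

df = df.drop(columns=constant_col)
print(df)
```

    An equivalent one-liner is df.loc[:, df.nunique() > 1], which keeps the non-constant columns instead of listing the constant ones.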

  • 2021-01-17 18:25

    There is a simple solution using the base R function Filter:

    library(data.table)
    df <- data.table(x=c(1,2,3,NA,5), y=c(1,1,NA,NA,NA),z=c(NA,NA,NA,NA,NA), 
                     d=c(2,2,2,2,2))
    # Select only columns for which SD is not 0
    Filter(function(x) sd(x, na.rm = TRUE) != 0, df)
        x
    1:  1
    2:  2
    3:  3
    4: NA
    5:  5
    

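    Note: Don't forget na.rm = TRUE. Without it, sd() returns NA for any column containing an NA (such as x), and Filter() drops columns whose predicate is NA, so x would be removed too. With na.rm = TRUE, the all-NA column z still gives NA and is dropped for the same reason.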

  • 2021-01-17 18:25

    Here is an option:

    df[,which(df[,
           unlist(
            sapply(.SD,function(x) length(unique(x[!is.na(x)])) >1))]),
       with=FALSE]
    
        x
    1:  1
    2:  2
    3:  3
    4: NA
    5:  5
    

    For each column of the data.table, this counts the unique non-NA values and keeps only the columns that have more than one.
