Dropping factors whose levels have fewer observations than a specific value (R)

臣服心动 2021-01-16 19:49

Suppose I have a data frame (df1) with factors like this:

factor1  factor2  factor3
-------  -------  -------
d        a         x
d        a         x
b        a                


        
4 answers
  • 2021-01-16 20:35

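    We could use Filter, which keeps only the columns for which the function returns TRUE. This keeps the columns with more than two levels:

    Filter(function(x) nlevels(x) > 2, df1)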
    

    (based on the results in one of the upvoted posts)

    Or, to keep only the columns whose rarest level has more than two observations:

    Filter(function(x) min(tabulate(x))>2, df1)
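
    To see how the two conditions differ, here is a minimal sketch on a made-up data frame. The name demo and its columns f1 and f2 are assumptions, since df1 is not shown in full in the question. Counting levels and counting observations per level can select different columns:

    ```r
    # Hypothetical data; the question's df1 is only partially shown.
    demo <- data.frame(
      f1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
      f2 = factor(c(rep("x", 4), rep("y", 5)))
    )

    # Condition 1: the column has more than two distinct levels.
    by_levels <- names(Filter(function(x) nlevels(x) > 2, demo))

    # Condition 2: the rarest level has more than two observations.
    by_counts <- names(Filter(function(x) min(tabulate(x)) > 2, demo))

    print(by_levels)
    print(by_counts)
    ```

    Here f1 has three levels, but its level d occurs only twice, so it passes the first test and fails the second. f2 has two levels with four and five observations, so the opposite happens.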
    
  • 2021-01-16 20:38

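    Is this what you want?

    df <- data.frame(col1 = rep(letters[1:4], each = 3),
                     col2 = rep(letters[1:2], each = 6),
                     col3 = rep(letters[1:3], each = 4),
                     stringsAsFactors = TRUE)  # needed since R 4.0 to get factor columns
    
    # Keep the columns that have more than two levels
    df[, sapply(df, function(x) nlevels(x) > 2)]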
    
  • 2021-01-16 20:41

    You could try using lapply and table:

    df1[, lapply(c(1,2,3), FUN = function(x) min(table(df1[,x]))) >= 3]
    

    And, a little more generically, for any number of columns:

    df1[, lapply(1:ncol(df1), FUN = function(x) min(table(df1[,x]))) >= 3]
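
    The comparison above works on the list returned by lapply because every element is a single number. A slightly stricter sketch of the same idea uses vapply to get a plain logical vector, and drop = FALSE so the result stays a data frame even when a single column survives. The contents of df1 below are an assumption, since the question shows only its first rows:

    ```r
    # Hypothetical data standing in for df1, which the question shows only partially.
    df1 <- data.frame(
      factor1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
      factor2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
      factor3 = factor(c(rep("x", 4), rep("y", 5)))
    )

    # One TRUE/FALSE per column: does the rarest level occur at least 3 times?
    keep <- vapply(df1, function(col) min(table(col)) >= 3, logical(1))

    # drop = FALSE keeps the result a data frame even if only one column is kept.
    result <- df1[, keep, drop = FALSE]
    print(names(result))
    ```

    With this data, factor1 is dropped because its level d has only two observations.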
    
  • 2021-01-16 20:44

    I would create a quick helper function that uses table() to count how many observations each level has; look at table(df$fac1) to see how this works. Note that this isn't very robust, but it should get you started:

    df <- data.frame(fac1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
                     fac2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
                     fac3 = factor(c(rep("x", 4), rep("y", 5))),
                     other = 1:9)
    
    at_least_three_instances <- function(column) {
      # Non-factor columns are always kept
      if (!is.factor(column)) {
        return(TRUE)
      }
      # Keep a factor only if its rarest level has more than two observations
      min(table(column)) > 2
    }
    
    df[unlist(lapply(df, at_least_three_instances))]
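
    As one possible way to make this a bit more robust, the threshold can become an argument and unused levels can be ignored. The sketch below is only an illustration: drop_sparse_factors and min_count are hypothetical names, and dropping unused levels before counting is an assumption about the desired behaviour:

    ```r
    # Hypothetical helper (not from the original answer): keep non-factor columns,
    # and keep a factor column only if every level that actually occurs
    # has at least `min_count` observations.
    drop_sparse_factors <- function(data, min_count = 3) {
      keep <- vapply(data, function(column) {
        if (!is.factor(column)) return(TRUE)
        counts <- table(droplevels(column))  # assumption: ignore unused levels
        length(counts) > 0 && min(counts) >= min_count
      }, logical(1))
      data[, keep, drop = FALSE]
    }

    df <- data.frame(fac1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
                     fac2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
                     fac3 = factor(c(rep("x", 4), rep("y", 5))),
                     other = 1:9)

    at_least_3 <- drop_sparse_factors(df, min_count = 3)
    at_least_4 <- drop_sparse_factors(df, min_count = 4)
    print(names(at_least_3))
    print(names(at_least_4))
    ```

    With a threshold of 3 only fac1 is dropped (its level d has two observations); raising the threshold to 4 also drops fac2, whose levels have three observations each.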
    