List all factor levels of a data.frame

一整个雨季 2020-12-28 15:57

With str(data) I get only the head of the levels (1-2 values):

fac1: Factor w/ 2  levels ... :
fac2: Factor w/ 5  levels ... :
fac3: Facto         


        
6 Answers
  • 2020-12-28 16:37

    Or using purrr:

    library(purrr)  # also provides the %>% pipe
    data %>% purrr::map(levels)
    

    Or to first factorize everything:

    data %>% dplyr::mutate_all(as.factor) %>% purrr::map(levels)
    

    And answering the question about how to get the lengths:

    data %>% purrr::map(levels) %>% purrr::map(length)
    
  • 2020-12-28 16:37

    In case you want to display factor levels only for those columns that are factors, you can use:

    lapply(df[sapply(df, is.factor)], levels)
    
  • 2020-12-28 16:44

    An alternative way to get the number of levels of each column in a data.frame:

    # Count the levels of each column by position; non-factor columns give 0
    data_levels_length <- sapply(seq_len(ncol(data)), function(x) {
      length(levels(data[[x]]))
    })
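
    The same count is available from base R's nlevels(), applied column by column. A minimal sketch, assuming a toy data frame (its column names and values are made up for illustration); non-factor columns report 0:

    ```r
    # Toy data frame; column names and values are illustrative only
    data <- data.frame(
      fac1 = factor(c("a", "b", "a")),
      fac2 = factor(c("x", "y", "z")),
      num  = c(1, 2, 3)
    )

    # nlevels(x) is length(levels(x)); sapply() returns a named integer vector
    level_counts <- sapply(data, nlevels)
    print(level_counts)
    ```

    Unlike the positional loop, the result keeps the column names, which makes it easier to read for wide data frames.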
    
  • 2020-12-28 16:45

    A simpler method is to use the sqldf package with a SELECT DISTINCT statement. This makes it easy to retrieve the names of the factor levels automatically and then assign them as levels to other columns/variables.

    A generic code snippet:

    library(sqldf)
    array_name = sqldf("select DISTINCT *colname1* as '*column_title*' from *table_name*")
    

    Sample code using iris dataset:

    df1 = iris
    factor1 <- sqldf("select distinct Species as 'flower_type' from df1")
    factor1    # print the distinct values (the names of the factor levels)
    

    Output:

      flower_type
    1      setosa
    2  versicolor
    3   virginica
    
  • 2020-12-28 16:46

    If your problem is specifically to output a list of all levels of a single factor, then I have found a simple solution using unique():

    unique(df$x)

    For instance, for the well-known iris dataset:

    unique(iris$Species)
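
    Note that unique() reports the values that actually occur, while levels() reports every declared level, including unused ones. A small sketch with a made-up factor (the values are illustrative) shows the difference:

    ```r
    # A factor that declares level "c" but never uses it
    x <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

    observed <- unique(x)   # values present in the data
    declared <- levels(x)   # all declared levels, used or not

    print(as.character(observed))
    print(declared)
    ```

    For iris$Species the two agree, because every declared level appears in the data.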

  • 2020-12-28 16:50

    Here are some options. We loop through the columns of 'data' with sapply and get the levels of each column (assuming that all the columns are of class factor):

    sapply(data, levels)
    

    Or if we need to pipe (%>%) it, this can be done as:

    library(dplyr)
    data %>%
      sapply(levels)
    

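    Another option is summarise_each() from dplyr, where the levels are specified inside funs(). Both functions are deprecated in recent dplyr releases, so this form is mainly of use with older versions.

    data %>%
      summarise_each(funs(list(levels(.))))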
    