List all factor levels of a data.frame


With str(data) I only get the head of the levels (the first 1-2 values):

fac1: Factor w/ 2  levels ... :
fac2: Factor w/ 5  levels ... :
fac3: Facto


                      
6 Answers

生来不讨喜, 2020-12-28 16:37

Or using purrr (the pipe %>% is available once purrr, dplyr, or magrittr is loaded):

library(purrr)
data %>% map(levels)


Or, to convert every column to a factor first:

data %>% dplyr::mutate_all(as.factor) %>% map(levels)


And to answer the question about how to get the number of levels per column:

data %>% map(levels) %>% map(length)

                                                                        
                                                        
            
            
              
                
眼角桃花, 2020-12-28 16:37

If you want to display the levels only for those columns that are actually factors, you can use:

lapply(df[sapply(df, is.factor)], levels)
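As a small illustration of the line above, the sketch below builds a hypothetical mixed-type data frame (`df`, with made-up columns `size`, `colour` and `count`) and lists the levels of its factor columns only. The data are assumptions for demonstration and do not come from the question.

```r
# Hypothetical example data: two factor columns and one numeric column.
df <- data.frame(
  size   = factor(c("small", "large", "small")),
  colour = factor(c("red", "green", "blue")),
  count  = c(3, 7, 5)
)

# Keep only the factor columns, then collect the levels of each one.
factor_levels <- lapply(df[sapply(df, is.factor)], levels)
print(factor_levels)
```

The result is a named list with one character vector per factor column; the numeric column `count` is skipped entirely.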

                                                                        
                                                        
            
            
              
                
灰色年华, 2020-12-28 16:44

An alternative way to get the number of levels of each column in a data frame:

data_levels_length <- sapply(seq_len(ncol(data)), function(x) {
  length(levels(data[[x]]))  # 0 for columns that are not factors
})
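The same counts can probably be obtained more compactly with base R's nlevels(). The following is a minimal sketch on a hypothetical data frame (the columns `fac1`, `fac2` and `num` are invented for illustration); it uses vapply() so that the result is a named integer vector.

```r
# Hypothetical example data; `data` stands in for the data frame in the question.
data <- data.frame(
  fac1 = factor(c("a", "b", "a", "b")),
  fac2 = factor(c("v", "w", "x", "y")),
  num  = c(1.5, 2.5, 3.5, 4.5)
)

# nlevels() is equivalent to length(levels(x)); it returns 0 for non-factors.
# vapply() guarantees an integer vector and keeps the column names.
level_counts <- vapply(data, nlevels, integer(1))
print(level_counts)
```

Keeping the column names makes it easier to see which count belongs to which column than with an index-based loop.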

                                                                        
                                                        
            
            
              
                
北恋, 2020-12-28 16:45

A simple method is to use the sqldf package with a SELECT DISTINCT statement. This makes it easy to get the names of the factor levels automatically and then reuse them as levels for other columns/variables. Note that SELECT DISTINCT returns only the values that actually occur in the column.

Generic code snippet (replace colname1, column_title and table_name with your own names):

library(sqldf)
array_name <- sqldf("select distinct colname1 as 'column_title' from table_name")


Sample code using iris dataset:

df1 = iris
factor1 <- sqldf("select distinct Species as 'flower_type' from df1")
factor1    ## to print the names of factors


Output:

  flower_type
1      setosa
2  versicolor
3   virginica

                                                                        
                                                        
            
            
              
                
悲&欢浪女, 2020-12-28 16:46

If your problem is specifically to output a list of all levels of a single factor, a simple solution is unique():


  unique(df$x) 


For instance, for the infamous iris dataset: 


  unique(iris$Species) 
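One caveat worth checking: unique() reports the values that occur in the data, while levels() reports every declared level, including unused ones. For iris the two coincide, but they can differ. A minimal sketch with a hypothetical factor `x` (not from the question):

```r
# Hypothetical factor with three declared levels, one of which never occurs.
x <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

observed <- as.character(unique(x))  # values present in the data
declared <- levels(x)                # every declared level, used or not

print(observed)
print(declared)

# droplevels() removes the unused level, after which the two agree.
print(levels(droplevels(x)))
```

If the goal is the full set of declared levels, levels() is the safer choice; unique() is appropriate when only observed values matter.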

                                                                        
                                                        
            
            
              
                
南旧, 2020-12-28 16:50

Here are some options. We loop through 'data' with sapply and get the levels of each column (assuming that all the columns are of class factor):

sapply(data, levels)


Or, if we need to pipe (%>%) it, this can be done as

library(dplyr)
data %>%
  sapply(levels)


Another option is summarise_each from dplyr, where we specify the levels within funs. Note that summarise_each() and funs() are deprecated in current dplyr releases in favour of summarise() with across().

data %>%
  summarise_each(funs(list(levels(.))))
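For readers on a recent dplyr, the summarise_each() call above could be written with across() instead. This is a sketch only, assuming dplyr 1.0.0 or later and a hypothetical data frame (the columns `fac1`, `fac2` and `num` are invented); where(is.factor) restricts the summary to factor columns.

```r
library(dplyr)

# Hypothetical example data with one non-factor column.
data <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "y", "z")),
  num  = c(1, 2, 3)
)

# One-row result whose columns are list-columns holding each factor's levels.
level_tbl <- data %>%
  summarise(across(where(is.factor), ~ list(levels(.x))))

print(level_tbl$fac1[[1]])
print(level_tbl$fac2[[1]])
```

Wrapping levels() in list() is what allows vectors of different lengths to sit side by side in a single summary row.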

                                                                        
                                                        
            
            
              
                