What is the most efficient way to cast a list as a data frame?

前端未结

关注

 7  843

Very often I want to convert a list wherein each index has identical element types to a data frame. For example, I may have a list:

> my.list
[[1]]
[[1]]


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  再見小時候        
                
              
                            
                2020-11-29 17:18
              
            
            
                                                                       
I can't tell you this is the "most efficient" in terms of memory or speed, but it's pretty efficient in terms of coding:

my.df <- do.call("rbind", lapply(my.list, data.frame))


the lapply() step with data.frame() turns each list item into a single row data frame which then acts nice with rbind()
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  攒了一身酷        
                
              
                            
                2020-11-29 17:24
              
            
            
                                                                       
Although this question has long since been answered, it's worth pointing out the data.table package has rbindlist which accomplishes this task very quickly:

library(microbenchmark)
library(data.table)
l <- replicate(1E4, list(a=runif(1), b=runif(1), c=runif(1)), simplify=FALSE)

microbenchmark( times=5,
  R=as.data.frame(Map(f(l), names(l[[1]]))),
  dt=data.frame(rbindlist(l))
)


gives me

Unit: milliseconds
 expr       min        lq    median        uq       max neval
    R 31.060119 31.403943 32.278537 32.370004 33.932700     5
   dt  2.271059  2.273157  2.600976  2.635001  2.729421     5

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  星月不相逢        
                
              
                            
                2020-11-29 17:24
              
            
            
                                                                       
Not sure where they rank as far as efficiency, but depending on the structure of your lists there are some tidyverse options. A bonus is that they work nicely with unequal length lists:

l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
        , b = list(var.1 = 4, var.2 = 5)
        , c = list(var.1 = 7, var.3 = 9)
        , d = list(var.1 = 10, var.2 = 11, var.3 = NA))

df <- dplyr::bind_rows(l)
df <- purrr::map_df(l, dplyr::bind_rows)
df <- purrr::map_df(l, ~.x)

# all create the same data frame:
# A tibble: 4 x 3
  var.1 var.2 var.3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5    NA
3     7    NA     9
4    10    11    NA


And you can also mix vectors and data frames:

library(dplyr)
bind_rows(
  list(a = 1, b = 2),
  data_frame(a = 3:4, b = 5:6),
  c(a = 7)
)

# A tibble: 4 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     5
3     4     6
4     7    NA

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉酒成梦        
                
              
                            
                2020-11-29 17:34
              
            
            
                                                                       
The dplyr package's bind_rows is efficient.

one <- mtcars[1:4, ]
two <- mtcars[11:14, ]
system.time(dplyr::bind_rows(one, two))
   user  system elapsed 
  0.001   0.000   0.001 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  南笙        
                
              
                            
                2020-11-29 17:38
              
            
            
                                                                       
I think you want:

> do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE))
  global_stdev_ppb      range   tok global_freq_ppb
1         24267673 0.03114799 hello        211592.6
2         11561448 0.08870838 world       1002043.0
> str(do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE)))
'data.frame':   2 obs. of  4 variables:
 $ global_stdev_ppb: num  24267673 11561448
 $ range           : num  0.0311 0.0887
 $ tok             : chr  "hello" "world"
 $ global_freq_ppb : num  211593 1002043

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  遇见更好的自我        
                
              
                            
                2020-11-29 17:38
              
            
            
                                                                       
Another option is:

data.frame(t(sapply(mylist, `[`)))


but this simple manipulation results in a data frame of lists:

> str(data.frame(t(sapply(mylist, `[`))))
'data.frame':   2 obs. of  3 variables:
 $ a:List of 2
  ..$ : num 1
  ..$ : num 2
 $ b:List of 2
  ..$ : num 2
  ..$ : num 3
 $ c:List of 2
  ..$ : chr "a"
  ..$ : chr "b"


An alternative to this, along the same lines but now the result same as the other solutions, is:

data.frame(lapply(data.frame(t(sapply(mylist, `[`))), unlist))


[Edit: included timings of @Martin Morgan's two solutions, which have the edge over the other solution that return a data frame of vectors.] Some representative timings on a very simple problem:

mylist <- list(list(a = 1, b = 2, c = "a"), list(a = 2, b = 3, c = "b"))

> ## @Joshua Ulrich's solution:
> system.time(replicate(1000, do.call(rbind, lapply(mylist, data.frame,
+                                     stringsAsFactors=FALSE))))
   user  system elapsed 
  1.740   0.001   1.750

> ## @JD Long's solution:
> system.time(replicate(1000, do.call(rbind, lapply(mylist, data.frame))))
   user  system elapsed 
  2.308   0.002   2.339

> ## my sapply solution No.1:
> system.time(replicate(1000, data.frame(t(sapply(mylist, `[`)))))
   user  system elapsed 
  0.296   0.000   0.301

> ## my sapply solution No.2:
> system.time(replicate(1000, data.frame(lapply(data.frame(t(sapply(mylist, `[`))), 
+                                               unlist))))
   user  system elapsed 
  1.067   0.001   1.091

> ## @Martin Morgan's Map() sapply() solution:
> f = function(x) function(i) sapply(x, `[[`, i)
> system.time(replicate(1000, as.data.frame(Map(f(mylist), names(mylist[[1]])))))
   user  system elapsed 
  0.775   0.000   0.778

> ## @Martin Morgan's Map() lapply() unlist() solution:
> f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)
> system.time(replicate(1000, as.data.frame(Map(f(mylist), names(mylist[[1]])))))
   user  system elapsed 
  0.653   0.000   0.658

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复