Using the lapply function and a list in R


d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), co

2 answers

生来不讨喜, 2021-01-26 00:07

You need to iterate over the data.frames and the counts simultaneously. In the tidyverse I would recommend purrr::map2(), but in base R you can simply do:

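queries <- mapply(function(data, count) {
  # Build the query for this data.frame/count pair. Wrap the paste0()
  # call in sqldf() to run the query instead of only printing it.
  sql <- paste0(
    "select *,count(col_one) from data where col_one = ",
    count, " group by col_one"
  )
  print(sql)
}, my.list, 1:3)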
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
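
A minimal sketch of the purrr::map2() variant mentioned above, assuming the purrr package is installed. The contents of my.list are an assumption, reconstructed from the combined table printed in the second answer, because the question's own definition is cut off. map2_chr() is used here so that the result is a character vector, as with mapply().

```r
library(purrr)

# Assumed sample data, reconstructed from the combined table in the second answer.
my.list <- list(
  data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6)),
  data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4)),
  data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
)

# map2_chr() pairs element i of my.list with element i of 1:3
# and collects the returned strings into a character vector.
queries <- map2_chr(my.list, 1:3, function(data, count) {
  paste0(
    "select *,count(col_one) from data where col_one = ",
    count, " group by col_one"
  )
})

print(queries)
```

As in the base R version, the data argument is not used until the string is passed to sqldf() inside the function.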

别跟我提以往, 2021-01-26 00:20

If I understood correctly, the OP wants to create contingency tables of col_one for each of the data.frames in my.list, i.e., they want to know how many times each of the values 1, 2, or 3 appears in col_one in each data.frame.
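
For reference, this per-data.frame count can be sketched in base R with lapply() and table(). This is only an illustration of the reading above, not the OP's code, and the contents of my.list are an assumption reconstructed from the combined table printed further down.

```r
# Assumed sample data, reconstructed from the combined table shown later.
my.list <- list(
  data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6)),
  data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4)),
  data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
)

# One frequency table of col_one per data.frame in the list.
counts <- lapply(my.list, function(d) table(d$col_one))

# Inspect the table for the second data.frame.
print(counts[[2]])
```

Each element of counts is a table object whose names are the distinct values of col_one, so values that never occur in a given data.frame are simply absent from its table.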

As explained in my answer to another question by the OP, and as suggested by G. Grothendieck, it is almost always better to combine data.frames with identical structure into one large data.table than to keep them separate in a list. Incidentally, the OP has also asked a third question ("how to loop the dataframe using sqldf?") seeking help with a list of data.frames.

The rbindlist() function combines the data.frames into one large data.table. Note that the idcol = "df" argument adds an id column named df, which identifies the originating data.frame of each row.

library(data.table)
rbindlist(my.list, idcol = "df")

   df col_one col_two
1:  1       1       4
2:  1       2       5
3:  1       3       6
4:  2       1       6
5:  2       1       5
6:  2       1       4
7:  3       7       8
8:  3       1       5
9:  3       1       4
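
As a side note on idcol, here is a small sketch that assumes the list elements are named. With a named list, rbindlist() fills the id column with those names instead of the positions 1, 2, 3. The names d1, d2, d3 follow the question's variable names, and the values of d3 are an assumption taken from the table above.

```r
library(data.table)

# Same data as above, but as a named list (d3 values assumed from the printed table).
named.list <- list(
  d1 = data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6)),
  d2 = data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4)),
  d3 = data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
)

# The df column now holds the list names rather than integer positions.
combined <- rbindlist(named.list, idcol = "df")
print(unique(combined$df))
```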

Now we can easily compute the aggregates:

rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][]

   df col_one col_two count_col_one
1:  1       1       4             1
2:  1       2       5             1
3:  1       3       6             1
4:  2       1       6             3
5:  2       1       5             3
6:  2       1       4             3
7:  3       7       8             1
8:  3       1       5             2
9:  3       1       4             2

This data.table statement counts the occurrences of each individual value in col_one for each df by using the special symbol .N and by grouping by df and col_one.

In the question, the OP has only asked to count occurrences of 1, 2, or 3 in col_one. If this really is intended, the value of 7 needs to be removed. This can be accomplished by filtering the result:

rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][
  col_one %in% 1:3]
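
If one row per group is preferred over a count repeated on every row, a possible variant is to filter first and aggregate with .N in j. This is a sketch under the same assumed my.list (reconstructed from the table above), not something the OP asked for explicitly.

```r
library(data.table)

# Assumed sample data, reconstructed from the combined table above.
my.list <- list(
  data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6)),
  data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4)),
  data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
)

# Keep only the values 1 to 3, then return one row per (df, col_one) group.
summary_dt <- rbindlist(my.list, idcol = "df")[
  col_one %in% 1:3,
  .(count_col_one = .N),
  by = .(df, col_one)
]

print(summary_dt)
```

Here the filter runs before grouping, so the value 7 never forms a group, and col_two is dropped because it is not part of the aggregate.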
