fast subsetting in R


情书的邮戳 2021-02-03 14:00

I've got a data frame dat of size 30000 x 50. I also have a separate list that points to groupings of rows from this data frame, e.g.,

rows <- list(c(


      
      
        
5 answers

野趣味 (original poster) 2021-02-03 14:26
              

            
            
                        
One of the main issues is the matching of row names: by default, [.data.frame partially matches row names, which you probably don't want, so you're better off using match. To speed things up further, you can use fmatch from the fastmatch package. This is a minor modification that gives some speedup:

# naive
> system.time(res1 <- lapply(rows,function(r) dat[r,]))
   user  system elapsed 
 69.207   5.545  74.787 

# match
> rn <- rownames(dat)
> system.time(res1 <- lapply(rows,function(r) dat[match(r,rn),]))
   user  system elapsed 
 36.810  10.003  47.082 

# fastmatch (fmatch comes from the fastmatch package)
> library(fastmatch)
> rn <- rownames(dat)
> system.time(res1 <- lapply(rows,function(r) dat[fmatch(r,rn),]))
   user  system elapsed 
 19.145   3.012  22.226 
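
The partial matching of row names mentioned above can be seen on a toy data frame. This is only a small sketch; df, partial and exact are made-up names, not objects from the question.

```r
# Toy data frame; the names here are illustrative only.
df <- data.frame(x = 1:2, row.names = c("apple", "banana"))

# `[.data.frame` partially matches character row indices,
# so an unambiguous prefix still selects a row.
partial <- df["app", , drop = FALSE]

# match() is exact: the same prefix finds nothing and yields NA.
exact <- match("app", rownames(df))
```

With match, a row name that is not present gives NA instead of silently picking up a row that merely starts with the same characters.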


You can get a further speedup by not using [ at all (it is slow for data frames) and instead splitting the data frame with split. This works if the groups in rows are non-overlapping and cover all rows, so that each row maps to exactly one entry in rows.
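
A minimal sketch of the split approach, assuming the groups are non-overlapping and together cover every row. The toy dat and rows below, and the helper names grp and f, are stand-ins made up for illustration.

```r
# Toy data; `dat` and `rows` stand in for the objects in the question.
dat <- data.frame(x = 1:6, y = letters[1:6],
                  row.names = as.character(11:16))
rows <- list(c("11", "13"), c("12", "14", "15"), c("16"))

# Map every row name to the index of the group that contains it.
grp <- rep(seq_along(rows), lengths(rows))
names(grp) <- unlist(rows)

# One group id per row, in the row order of dat.
f <- grp[rownames(dat)]

# A single split() call replaces the repeated dat[r, ] lookups.
res <- split(dat, f)
```

Note that within each piece the rows come out in the order they have in dat, which may differ from the order listed in rows.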

Depending on your actual data, you may be better off with matrices, whose subsetting operators are far faster because they are native.
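
A rough sketch of the matrix variant, under the assumption that all columns are numeric (as.matrix would coerce a mixed-type data frame to a character matrix). The toy data and the names m and rn are illustrative, not from the question.

```r
# Toy numeric data; `dat` and `rows` stand in for the objects in the question.
set.seed(1)
dat <- data.frame(v1 = rnorm(5), v2 = rnorm(5),
                  row.names = as.character(101:105))
rows <- list(c("101", "103"), c("104", "105"))

# Convert once, outside the loop.
m <- as.matrix(dat)
rn <- rownames(m)

# Exact matching on row names; drop = FALSE keeps
# single-row results as matrices rather than vectors.
res <- lapply(rows, function(r) m[match(r, rn), , drop = FALSE])
```

Each element of res is then a matrix rather than a data frame, so this only fits if the downstream code does not rely on data frame features.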
    
             
                                                        
            
            
              
                