Avoid two for loops in R

前端未结

关注

 6  800

I have a R code that can do convolution of two functions...

convolveSlow <- function(x, y) {  
nx <- length(x); ny <- length(y)  
xy <- numeric(nx


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  执念已碎        
                
              
                            
                2021-02-08 11:44
              
            
            
                                                                       
Since R is very fast at computing vector operations, the most important thing to keep in mind when programming for performance is to vectorise as many of your operations as possible.

This means thinking hard about replacing loops with vector operations.  Here is my solution for fast convolution (50 times faster with input vectors of length 1000 each):

convolveFast <- function(x, y) {
    nx <- length(x)
    ny <- length(y)
    xy <- nx + ny - 1
    xy <- rep(0, xy)
    for(i in (1:nx)){
        j <- 1:ny
        ij <- i + j - 1
        xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
    }
    xy
}


You will notice that the inner loop (for j in ...) has disappeared.  Instead, I replaced it with a vector operation.  j is now defined as a vector (j <- 1:ny).  Notice also that I refer to the entire vector y, rather than subsetting it (i.e. y instead of y[j]).

j <- 1:ny
ij <- i + j - 1
xy[i+(1:ny)-1] <- xy[ij] + x[i] * y


I wrote a small function to measure peformance:

measure.time <- function(fun1, fun2, ...){
    ptm <- proc.time()
    x1 <- fun1(...)
    time1 <- proc.time() - ptm

    ptm <- proc.time()
    x2 <- fun2(...)
    time2 <- proc.time() - ptm

    ident <- all(x1==x2)

    cat("Function 1\n")
    cat(time1)
    cat("\n\nFunction 2\n")
    cat(time2)
    if(ident) cat("\n\nFunctions return identical results")

}


For two vectors of length 1000 each, I get a 98% performance improvement:

x <- runif(1000)
y <- runif(1000)
measure.time(convolveSlow, convolveFast, x, y)

Function 1
7.07 0 7.59 NA NA

Function 2
0.14 0 0.16 NA NA

Functions return identical results

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  太阳男子        
                
              
                            
                2021-02-08 11:45
              
            
            
                                                                       
Some say the apply() and sapply() functions are faster than for() loops in R. You could convert the convolution to a function and call it from within apply().
However, there is evidence to the contrary
http://yusung.blogspot.com/2008/04/speed-issue-in-r-computing-apply-vs.html
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  悲哀的现实        
                
              
                            
                2021-02-08 11:48
              
            
            
                                                                       
How about convolve(x, rev(y), type = "open") in stats?

> x <- runif(1000)
> y <- runif(1000)
> system.time(a <- convolve(x, rev(y), type = "o"))
   user  system elapsed 
  0.032   0.000   0.032 
> system.time(b <- convolveSlow(x, y))
   user  system elapsed 
 11.417   0.060  11.443 
> identical(a,b)
[1] FALSE
> all.equal(a,b)
[1] TRUE

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  南笙        
                
              
                            
                2021-02-08 11:49
              
            
            
                                                                       
As Dirk says, compiled code can be a lot faster.  I had to do this for one of my projects and was surprised at the speedup: ~40x faster than Andrie's solution.

> a <- runif(10000)
> b <- runif(10000)
> system.time(convolveFast(a, b))
   user  system elapsed 
  7.814   0.001   7.818 
> system.time(convolveC(a, b))
   user  system elapsed 
  0.188   0.000   0.188 


I made several attempts to speed this up in R before I decided that using C code couldn't be that bad (note: it really wasn't).  All of mine were slower than Andrie's, and were variants on adding up the cross-product appropriately.  A rudimentary version can be done in just three lines.

convolveNotAsSlow <- function(x, y) {
  xyt <- x %*% t(y)
  ds <- row(xyt)+col(xyt)-1
  tapply(xyt, ds, sum)
}


This version only helps a little.

> a <- runif(1000)
> b <- runif(1000)
> system.time(convolveSlow(a, b))
   user  system elapsed 
  6.167   0.000   6.170 
> system.time(convolveNotAsSlow(a, b))
   user  system elapsed 
  5.800   0.018   5.820 


My best version was this:

convolveFaster <- function(x,y) {
  foo <- if (length(x)<length(y)) {y %*% t(x)} else { x %*% t(y) }
  foo.d <- dim(foo)
  bar <- matrix(0, sum(foo.d)-1, foo.d[2])
  bar.rc <- row(bar)-col(bar)
  bar[bar.rc>=0 & bar.rc<foo.d[1]]<-foo
  rowSums(bar)
}


This was quite a bit better, but still not nearly as fast as Andrie's

> system.time(convolveFaster(a, b))
   user  system elapsed 
  0.280   0.038   0.319 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  粉色の甜心        
                
              
                            
                2021-02-08 11:49
              
            
            
                                                                       
The convolveFast function can be optimized a little by carefully using integer math only and replacing (1:ny)-1L with seq.int(0L, ny-1L):

convolveFaster <- function(x, y) {
    nx <- length(x)
    ny <- length(y)
    xy <- nx + ny - 1L
    xy <- rep(0L, xy)
    for(i in seq_len(nx)){
        j <- seq_len(ny)
        ij <- i + j - 1L
        xy[i+seq.int(0L, ny-1L)] <- xy[ij] + x[i] * y
    }
    xy
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦如初夏        
                
              
                            
                2021-02-08 11:58
              
            
            
                                                                       

For vectors, you index with [], not [[]], so use xy[ij] etc 
Convolution doesn't vectorise easily but one common trick is to switch to compiled code. The Writing R Extensions manual uses convolution as a running example and shows several alternative; we also use it a lot in the Rcpp documentation.

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复