R fill vector efficiently

不思量自难忘° 2021-01-22 20:22

I have a fairly big vector (more than 500,000 elements). It contains a bunch of NA values interspersed with 1s, and it is always guaranteed to begin with 1.

4 answers
  •  伪装坚强ぢ
    2021-01-22 20:35

    The function fun1 can be sped up considerably by byte-compiling it with the compiler package. The code below takes the code provided by Joshua and adds a compiled version, fun3:

    library(zoo)  # for na.locf
    library(rbenchmark)
    library(compiler)
    
    v1 <- c(1,NA,NA,NA,1,NA,NA,NA,NA,NA,1,NA,NA,1,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1)
    v2 <- c(10,10,10,9,10,9,9,9,9,9,10,10,10,11,8,12,12,12,12,12,12,12,12,12,12,13)
    
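    fun1 <- function(v1,v2) {
        # loop version: set v1[i] to 1 when the previous element is filled
        # and v2 has not changed between i-1 and i
        for (i in seq_along(v1)[-1]) {  # safe for length-1 input, unlike 2:length(v1)
            if (!is.na(v1[i-1]) && is.na(v1[i]) && v2[i]==v2[i-1]) {
                v1[i] <- 1
            }
        }
        v1
    }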
    
    fun2 <- function(v1,v2) {
        # create groups in which we need to assess missing values
        d <- cumsum(as.logical(c(0,diff(v2))))
        # for each group, carry the first obs forward
        ave(v1, d, FUN=function(x) na.locf(x, na.rm=FALSE))
    }
    
    fun3 <- cmpfun(fun1)
    
    fun1(v1,v2)
    fun2(v1,v2)
    all.equal(fun1(v1,v2), fun2(v1,v2))
    all.equal(fun1(v1,v2), fun3(v1,v2))
    
    Nrep <- 1000
    
    V1 <- rep(v1, each=Nrep)
    V2 <- rep(v2, each=Nrep)
    all.equal(fun1(V1,V2), fun2(V1,V2))
    all.equal(fun1(V1,V2), fun3(V1,V2))
    
    benchmark(fun1(V1,V2), fun2(V1,V2), fun3(V1,V2))
    

    This gives the following result:

    benchmark(fun1(V1,V2), fun2(V1,V2), fun3(V1,V2))
              test replications elapsed relative user.self sys.self user.child
    1 fun1(V1, V2)          100  12.252 5.706567    12.190    0.045          0
    2 fun2(V1, V2)          100   2.147 1.000000     2.133    0.013          0
    3 fun3(V1, V2)          100   3.702 1.724266     3.644    0.023          0
    

    So the compiled fun1 (fun3) is about 3.3 times faster than the original fun1 (3.70 s versus 12.25 s for 100 replications), but it is still about 1.7 times slower than fun2.
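
The grouping idea behind fun2 (split v2 into runs of equal values, then fill within each run) can also be expressed with base R only, using cummax instead of ave and na.locf. The following is a minimal sketch, not part of the original answer and not benchmarked here. The name fill_runs is hypothetical, and the sketch assumes that v1 contains only 1 and NA, that v2 contains no NA, and that both vectors are non-empty and of equal length.

```r
# Example data from the answer above
v1 <- c(1,NA,NA,NA,1,NA,NA,NA,NA,NA,1,NA,NA,1,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1)
v2 <- c(10,10,10,9,10,9,9,9,9,9,10,10,10,11,8,12,12,12,12,12,12,12,12,12,12,13)

# fill_runs is a hypothetical helper name, not from the original thread.
# Assumes v1 holds only 1 and NA, and v2 has no NA.
fill_runs <- function(v1, v2) {
    idx <- seq_along(v1)
    # index of the most recent non-NA element of v1 at or before each position
    # (0 if none has been seen yet)
    last_filled <- cummax(ifelse(!is.na(v1), idx, 0L))
    # index at which the current run of equal v2 values starts
    run_start <- cummax(ifelse(c(TRUE, diff(v2) != 0), idx, 0L))
    # a position is filled when a non-NA value occurs inside its own run,
    # at or before that position
    ifelse(last_filled >= run_start, 1, NA_real_)
}

print(fill_runs(v1, v2))
```

Because every step is a single vectorized pass over the input, this avoids both the explicit loop of fun1 and the per-group function calls of fun2. Whether it is actually faster on a vector of 500,000 elements would need to be checked with benchmark() in the same way as above.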
