Fastest way to cross-tabulate two massive logical vectors in R

别那么骄傲 2021-02-02 16:08

For two logical vectors, x and y, of length > 1E8, what is the fastest way to calculate the 2x2 cross tabulations?

I suspect the answer is to w

5 Answers
  •  佛祖请我去吃肉
    2021-02-02 16:28

    Here's an answer using Rcpp that tabulates only the entries that are not both 0; the fourth cell follows from the vector length. I suspect there are several ways to improve it, since it is unusually slow. It's also my first attempt with Rcpp, so there may be obvious inefficiencies in how the data is moved around. I deliberately kept the example plain vanilla so that others can demonstrate how it can be improved.

    library(Rcpp)
    library(inline)
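    # x and y are read as integer vectors (TRUE = 1, FALSE = 0).
    doCrossTab <- cxxfunction(signature(x = "integer", y = "integer"), body = '
      Rcpp::IntegerVector Vx(x);
      Rcpp::IntegerVector Vy(y);
      // V[0]: x and y both 1; V[1]: only x is 1; V[2]: only y is 1.
      Rcpp::IntegerVector V(3);
      for (int i = 0; i < Vx.length(); i++) {
        // Logical AND (&&) short-circuits; pairs that are both 0 are skipped.
        if ((Vx(i) == 1) && (Vy(i) == 1)) { V[0]++; }
        else if ((Vx(i) == 1) && (Vy(i) == 0)) { V[1]++; }
        else if ((Vx(i) == 0) && (Vy(i) == 1)) { V[2]++; }
      }
      return wrap(V);
      ', plugin = "Rcpp")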
    

    Timing results for N = 3E8:

       user  system elapsed 
     10.930   1.620  12.586 
    

    This takes more than 6X as long as func_find01B in my 2nd answer.
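
Since doCrossTab returns only three of the four cells, the full 2x2 table has to be rebuilt from the vector length. The following is a minimal sketch of that step, not part of the original answer: counts_to_table is a hypothetical helper name, the inputs are assumed to contain no NA values, and the three counts are computed in base R as a stand-in for doCrossTab(x, y) so the example runs without a compiler.

```r
# Hypothetical helper (an assumption, not from the original answer):
# rebuild the 2x2 table from counts = c(both TRUE, x only, y only)
# and the vector length n.
counts_to_table <- function(counts, n) {
  # The both-FALSE cell is never counted directly, so derive it.
  # as.numeric() avoids integer overflow when summing large counts.
  ff <- n - sum(as.numeric(counts))
  # matrix() fills column by column: first y = FALSE, then y = TRUE.
  matrix(c(ff, counts[2], counts[3], counts[1]), nrow = 2,
         dimnames = list(x = c("FALSE", "TRUE"), y = c("FALSE", "TRUE")))
}

set.seed(1)
x <- runif(1e5) > 0.5
y <- runif(1e5) > 0.5

# Base-R stand-in for doCrossTab(x, y); same order of counts.
counts <- c(sum(x & y), sum(x & !y), sum(!x & y))
tab <- counts_to_table(counts, length(x))
print(tab)
```

On small inputs the result can be checked against table(x, y), which uses the same row and column layout.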
