Subset of a Rcpp Matrix that matches a logical statement

情歌与酒 2021-02-01 20:29

In R, if we have a data matrix, say a 100-by-10 matrix X, and a 100-element vector t with possible values (0, 1, 2, 3), we can easily find a submatrix y of X using a simple syn

2 Answers
  •  时光取名叫无心
    2021-02-01 21:12

    I would love to see this as sugar. Unfortunately, I am not qualified to implement it. Still, here are a few different solutions I played with.

    First, I had to make some modifications to Gong-Yi Liao's code to get it to work (colvec instead of vec for tIdx, and Xmat.rows(...) instead of X.rows(...)):

    mat Xmat(X.begin(), X.nrow(), X.ncol(), false);
    colvec tIdx(T.begin(), T.size(), false); 
    mat y = Xmat.rows(find(tIdx == 1));
    

    Second, here are three functions with benchmarks, all of which subset a matrix based on a logical statement. The functions take arma or Rcpp arguments and return arma or Rcpp values. Two are based on Gong-Yi Liao's solution and one is a simple loop-based solution.

    n(rows)=100, p(T==1)=0.3

                    expr   min     lq median     uq    max
    1  submat_arma(X, T) 5.009 5.3955 5.8250 6.2250 28.320
    2 submat_arma2(X, T) 4.859 5.2995 5.6895 6.1685 45.122
    3  submat_rcpp(X, T) 5.831 6.3690 6.7465 7.3825 20.876
    4        X[T == 1, ] 3.411 3.9380 4.1475 4.5345 27.981
    

    n(rows)=10000, p(T==1)=0.3

                    expr     min       lq   median       uq      max
    1  submat_arma(X, T) 107.070 113.4000 125.5455 141.3700 1468.539
    2 submat_arma2(X, T)  76.179  80.4295  88.2890 100.7525 1153.810
    3  submat_rcpp(X, T) 244.242 247.3120 276.6385 309.2710 1934.126
    4        X[T == 1, ] 229.884 236.1445 263.5240 289.2370 1876.980
    

    submat.cpp

    #include <RcppArmadillo.h>
    // [[Rcpp::depends(RcppArmadillo)]]
    
    using namespace Rcpp;
    using namespace arma;
    
    // arma in; arma out
    // [[Rcpp::export]]
    mat submat_arma(arma::mat X, arma::colvec T) {
        mat y = X.rows(find(T == 1));
        return y;
    }
    
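    // rcpp in; arma out
    // [[Rcpp::export]]
    mat submat_arma2(NumericMatrix X, NumericVector T) {
        // Wrap the R memory directly (copy_aux_mem = false), so no copy is made.
        mat Xmat(X.begin(), X.nrow(), X.ncol(), false);
        colvec tIdx(T.begin(), T.size(), false);
        // find() returns the indices of the rows where T == 1.
        mat y = Xmat.rows(find(tIdx == 1));
        return y;
    }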
    
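    // rcpp in; rcpp out
    // [[Rcpp::export]]
    NumericMatrix submat_rcpp(NumericMatrix X, LogicalVector condition) {
        int n = X.nrow(), k = X.ncol();
        // sum(condition) counts the TRUE entries, i.e. the rows to keep.
        NumericMatrix out(sum(condition), k);
        // i walks the input rows; j is the next free row of the output.
        for (int i = 0, j = 0; i < n; i++) {
            if (condition[i]) {
                out(j, _) = X(i, _);
                j = j + 1;
            }
        }
        return out;
    }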
    
    
    /*** R
    library("microbenchmark")
    
    # simulate data
    n=100
    p=0.3
    T=rbinom(n,1,p)
    X=as.matrix(cbind(rnorm(n),rnorm(n)))
    
    # compare output
    identical(X[T==1,],submat_arma(X,T))
    identical(X[T==1,],submat_arma2(X,T))
    identical(X[T==1,],submat_rcpp(X,T))
    
    # benchmark
    microbenchmark(X[T==1,],submat_arma(X,T),submat_arma2(X,T),submat_rcpp(X,T),times=500)
    
    # increase n
    n=10000
    p=0.3
    T=rbinom(n,1,p)
    X=as.matrix(cbind(rnorm(n),rnorm(n)))
    # benchmark
    microbenchmark(X[T==1,],submat_arma(X,T),submat_arma2(X,T),submat_rcpp(X,T),times=500)
    
    */
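
The loop-based idea can also be tried outside R. Below is a minimal standalone sketch in plain C++ (no Rcpp or Armadillo). The helper name `subset_rows` and the flat `std::vector<double>` storage are assumptions made for illustration and are not part of the code above. The matrix is stored column-major, as R stores it, so the sketch copies column by column instead of row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer): keep the rows of a
// column-major nrow x ncol matrix for which keep[i] is true.
std::vector<double> subset_rows(const std::vector<double>& x,
                                std::size_t nrow, std::size_t ncol,
                                const std::vector<bool>& keep) {
    assert(x.size() == nrow * ncol);
    assert(keep.size() == nrow);

    // Count the selected rows first so the output is allocated once.
    std::size_t nkeep = 0;
    for (std::size_t i = 0; i < nrow; ++i) {
        if (keep[i]) ++nkeep;
    }

    std::vector<double> out(nkeep * ncol);
    // Walk column by column so reads and writes stay contiguous in memory.
    for (std::size_t c = 0; c < ncol; ++c) {
        std::size_t j = 0;  // next free row in the current output column
        for (std::size_t i = 0; i < nrow; ++i) {
            if (keep[i]) {
                out[c * nkeep + j] = x[c * nrow + i];
                ++j;
            }
        }
    }
    return out;
}
```

For a 3 x 2 matrix stored as {1, 2, 3, 4, 5, 6} and keep = {true, false, true}, the result holds rows 1 and 3 in column-major order: {1, 3, 4, 6}. Whether this column-wise order would actually beat the row-wise copy in submat_rcpp has not been benchmarked here.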
    
