Question

My idea is to calculate several statistics in a moving window (2 by 2). For example, the code below calculates the mean value in a moving window. It works well when the input data has no NA values, but it gives wrong results when NAs are present in the dataset (each NA is treated as the lowest int). How can this be improved, for example by excluding NAs from these calculations?

#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::NumericMatrix get_mw_mean(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;

  arma::dmat result(num_r, num_c);

  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      arma::ivec sub_x_v = vectorise(sub_x);

      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from(sub_x_v);
      double sub_mean = arma::mean(sub_x_v2);
      result(i, j) = sub_mean;
    }
  }
  return(wrap(result));
}

/*** R
new_c1 = c(1, 86, 98,
           15, 5, 85,
           32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean(lg1)
new_c2 = c(NA, 86, 98,
           15, NA, 85,
           32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean(lg2)
*/

Cheers, Jot

Answer 1:

Two things are happening here:

The matrix input type, arma::imat, holds signed ints, and a C++ int has no dedicated NA or NaN value; those exist only for float and double. R stores an integer NA as INT_MIN, so after the conversion to arma::imat every NA shows up as the ordinary value INT_MIN.
Because of that, the NA (INT_MIN) values have to be subset out explicitly in C++ before any statistic is computed on the ints.
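To make the first point concrete, here is a minimal standard-library sketch (no Rcpp or Armadillo involved). The helper name naive_mean is hypothetical, and the sketch assumes only that an integer NA arrives as INT_MIN, as described above.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical helper: averages every element with no NA handling,
// which is effectively what the question's code does per window.
double naive_mean(const std::vector<int>& v) {
  assert(!v.empty());
  double sum = 0.0;
  for (int value : v) {
    sum += value;  // INT_MIN is added like any other int
  }
  return sum / static_cast<double>(v.size());
}
```

For the window {1, 86, 15, 5} this returns 26.75, but for {INT_MIN, 86, 15, INT_MIN} the two sentinels dominate the sum and the result is a huge negative number rather than the mean of 86 and 15.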

So, the way forward is to detect this INT_MIN value and remove it from the matrix. One way to accomplish this is to use find() to identify the elements that do not equal INT_MIN and .elem() to extract the identified elements.

For cases involving double, e.g. arma::mat, arma::vec, et cetera, consider using find_finite() instead.
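For the double case, the same idea can be sketched with the standard library alone. This is an analogue of filtering with find_finite(), not Armadillo code: the helper name finite_mean is hypothetical, and returning NaN for an all-missing input is an assumption made for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: mean of the finite entries only.
// NaN (which includes R's NA_real_) and +/-Inf are skipped.
// Returns NaN when no finite entry exists (an assumption of this sketch).
double finite_mean(const std::vector<double>& v) {
  double sum = 0.0;
  std::size_t n = 0;
  for (double value : v) {
    if (std::isfinite(value)) {
      sum += value;
      ++n;
    }
  }
  if (n == 0) {
    return std::numeric_limits<double>::quiet_NaN();
  }
  return sum / static_cast<double>(n);
}
```

Called on {NaN, 86, 15, NaN} it returns 50.5. Note that a finiteness test also drops infinite values, which may or may not be what you want for your data.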

Implemented

#include <RcppArmadillo.h>
#include <climits>  // INT_MIN

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::mat get_mw_mean_na(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;

  arma::dmat result(num_r, num_c);

  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      // Conversion + Search for NA values
      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from( 
                                        sub_x.elem( find(sub_x != INT_MIN) ) 
      );

      result(i, j) = arma::mean(sub_x_v2);
    }
  }

  return result;
}

Output

new_c1 = c(1, 86, 98,
           15, 5, 85,
           32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg1)
#        [,1]  [,2]
# [1,] 26.75 68.50
# [2,] 19.25 45.75

new_c2 = c(NA, 86, 98,
           15, NA, 85,
           32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg2)
#      [,1]     [,2]
# [1,] 50.5 89.66667
# [2,] 24.0 59.33333

Source: https://stackoverflow.com/questions/49012902/rcpp-removing-nas-in-a-moving-window-calculation

Tags

rcpp

armadillo

rcpp: removing NAs in a moving window calculation
