Question
My idea is to calculate several statistics in a moving window (2 by 2). For example, the code below calculates the mean value in a moving window. It works well when the input data has no NA values, but gives bad results (NAs are treated as the lowest int) when the dataset contains NAs. Can you advise how it could be improved, for example by excluding NAs from these calculations?
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::NumericMatrix get_mw_mean(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;
  arma::dmat result(num_r, num_c);
  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      arma::ivec sub_x_v = vectorise(sub_x);
      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from(sub_x_v);
      double sub_mean = arma::mean(sub_x_v2);
      result(i, j) = sub_mean;
    }
  }
  return(wrap(result));
}
/*** R
new_c1 = c(1, 86, 98,
15, 5, 85,
32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean(lg1)
new_c2 = c(NA, 86, 98,
15, NA, 85,
32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean(lg2)
*/
Cheers, Jot
Answer 1:
Two things are happening here:
1. The matrix input type, arma::imat, holds signed int values, but NA and NaN exist only in floating-point types such as float or double. A C++ int has no NA or NaN placeholder by design, so the NA is converted to INT_MIN (the value R itself uses to store an integer NA).
2. NA or NaN values therefore need to be subset out of the ints explicitly in C++.
So, the way forward is to be able to detect this INT_MIN value and remove it from the matrix. One way to accomplish this is to use find() to identify the elements that do not equal INT_MIN and .elem() to extract the identified elements.
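To make the filtering step concrete, here is a minimal sketch of the same idea in plain standard C++, without Armadillo. The helper name mean_skip_int_na is hypothetical, and returning NaN for a window made up entirely of NAs is an assumption of this sketch, not something the answer specifies.

```cpp
#include <climits>   // INT_MIN
#include <cmath>     // NAN
#include <cstddef>   // std::size_t
#include <vector>

// Hypothetical helper: mean of the values in `window` that are not the
// INT_MIN sentinel, which is what an integer NA looks like on the C++ side.
// Assumption: a window containing only sentinels yields NaN.
double mean_skip_int_na(const std::vector<int>& window) {
    double sum = 0.0;
    std::size_t kept = 0;
    for (int v : window) {
        if (v != INT_MIN) {   // same test as find(sub_x != INT_MIN)
            sum += v;
            ++kept;
        }
    }
    return kept == 0 ? NAN : sum / static_cast<double>(kept);
}
```

Called on the top-left window of the second test matrix, mean_skip_int_na({INT_MIN, 86, 15, INT_MIN}), this averages only 86 and 15.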
For cases involving double, e.g. arma::mat, arma::vec, et cetera, consider using find_finite().
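For the double case, the equivalent filter can be sketched in plain standard C++ with std::isfinite standing in for Armadillo's find_finite(). The helper name mean_finite is hypothetical, and returning NaN for an all-missing window is again an assumption. Note that a finiteness test drops +Inf and -Inf along with NA and NaN.

```cpp
#include <cmath>     // std::isfinite, NAN
#include <cstddef>   // std::size_t
#include <vector>

// Hypothetical helper: mean of the finite values in `window`.
// NA and NaN both fail std::isfinite, and so do +Inf and -Inf.
// Assumption: a window with no finite values yields NaN.
double mean_finite(const std::vector<double>& window) {
    double sum = 0.0;
    std::size_t kept = 0;
    for (double v : window) {
        if (std::isfinite(v)) {
            sum += v;
            ++kept;
        }
    }
    return kept == 0 ? NAN : sum / static_cast<double>(kept);
}
```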
Implemented
#include <RcppArmadillo.h>
#include <climits>  // INT_MIN
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat get_mw_mean_na(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;
  arma::dmat result(num_r, num_c);
  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      // 2 x 2 window with its top-left corner at (i, j)
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      // Conversion + Search for NA values
      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from(
        sub_x.elem( find(sub_x != INT_MIN) )
      );
      // Mean of the non-NA values only
      result(i, j) = arma::mean(sub_x_v2);
    }
  }
  return result;
}
Output
new_c1 = c(1, 86, 98,
15, 5, 85,
32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg1)
#       [,1]  [,2]
# [1,] 26.75 68.50
# [2,] 19.25 45.75
new_c2 = c(NA, 86, 98,
15, NA, 85,
32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg2)
#      [,1]     [,2]
# [1,] 50.5 89.66667
# [2,] 24.0 59.33333
Source: https://stackoverflow.com/questions/49012902/rcpp-removing-nas-in-a-moving-window-calculation