Why is standard R median function so much slower than a simple C++ alternative?

鱼传尺愫 2021-02-06 15:54

I made the following implementation of the median in C++ and used it in R via Rcpp:

// [[Rcpp::export]]
double median2         
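The rest of the function is cut off here. The answer below indicates that it sorts the whole input and then reads off the middle element(s). A minimal sketch of such a sort-based median follows, written in plain C++ without the Rcpp wrapper. The name `median_sort` is hypothetical, and this is not the original `median2`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sort-based median (NOT the truncated median2 above).
// Takes the vector by value so the caller's data is not reordered.
// Assumes x is non-empty and contains no NaN values.
double median_sort(std::vector<double> x) {
    assert(!x.empty() && "median of an empty vector is undefined");
    std::sort(x.begin(), x.end());  // O(n log n): orders every element
    const std::size_t n = x.size();
    if (n % 2 == 0) {
        // Even length: average the two middle elements.
        return (x[n / 2 - 1] + x[n / 2]) / 2.0;
    }
    // Odd length: the single middle element.
    return x[n / 2];
}
```

With an Rcpp build, the `// [[Rcpp::export]]` attribute would sit directly above such a function.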


        
3 answers
  •  遥遥无期
    2021-02-06 16:33

    [This is more of an extended comment than an answer to the question you actually asked.]

    Even your code may be open to significant improvement. In particular, you're sorting the entire input even though you only care about one or two elements.

    You can change this from O(n log n) to O(n) on average by using std::nth_element instead of std::sort. For an even number of elements, you'd typically use std::nth_element to find the element just before the middle, then std::min_element to find the element immediately after it. Because std::nth_element also partitions the input, std::min_element only has to scan the items above the middle, not the entire input array. That is, after nth_element, no element before the chosen position is greater than it and no element after it is smaller, so the next value in sorted order is simply the smallest element of the upper part.
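    The partitioning guarantee can be checked directly. The sketch below uses two hypothetical helpers: `partition_at` runs std::nth_element on a copy at index k, and `is_partitioned_around` verifies that nothing before k is greater and nothing after k is smaller. The elements on either side are otherwise left in unspecified order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if no element before index k is greater than v[k]
// and no element after index k is smaller than v[k].
bool is_partitioned_around(const std::vector<double>& v, std::size_t k) {
    for (std::size_t i = 0; i < k; ++i)
        if (v[i] > v[k]) return false;
    for (std::size_t i = k + 1; i < v.size(); ++i)
        if (v[i] < v[k]) return false;
    return true;
}

// Hypothetical helper: runs std::nth_element on a copy of x at index k and
// returns the rearranged copy. x[k] ends up holding the value a full sort
// would place there.
std::vector<double> partition_at(std::vector<double> x, std::size_t k) {
    assert(k < x.size());
    std::nth_element(x.begin(), x.begin() + static_cast<std::ptrdiff_t>(k), x.end());
    return x;
}
```

    For {7, 2, 9, 4, 1, 8, 3, 6} partitioned at index 3, position 3 holds 4 (the fourth-smallest value). The minimum of everything after it is 6, which is the next value in sorted order.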

    The complexity of std::nth_element is "linear on average", and (of course) std::min_element is linear as well, so the overall complexity is linear on average.
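    One rough way to see the difference is to count comparisons instead of measuring time. The sketch below assumes a counting comparator, a fixed-seed random input, and the hypothetical helper names shown. Exact counts depend on the standard library implementation, so it only compares the two totals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Comparator that counts its calls. The counter is held by pointer so that
// copies of the comparator made by the algorithms all share it.
struct CountingLess {
    std::size_t* count;
    bool operator()(double a, double b) const { ++*count; return a < b; }
};

// Hypothetical helper: n uniform values in [0, 1) from a fixed seed.
std::vector<double> random_input(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> v(n);
    for (double& d : v) d = dist(gen);
    return v;
}

// Comparisons used by a full sort.
std::size_t comparisons_sort(std::vector<double> v) {
    std::size_t count = 0;
    std::sort(v.begin(), v.end(), CountingLess{&count});
    return count;
}

// Comparisons used by nth_element on the lower middle plus min_element on
// the upper part (the even-length median strategy).
std::size_t comparisons_nth_plus_min(std::vector<double> v) {
    assert(v.size() >= 2);
    std::size_t count = 0;
    auto pos = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2 - 1);
    std::nth_element(v.begin(), pos, v.end(), CountingLess{&count});
    // Only the comparison count matters here, so the result is discarded.
    static_cast<void>(std::min_element(pos + 1, v.end(), CountingLess{&count}));
    return count;
}
```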

    So, for the simple case (odd number of elements), you get something like:

    auto pos = x.begin() + x.size()/2;
    
    std::nth_element(x.begin(), pos, x.end());
    return *pos;
    

    ...and for the more complex case (even number of elements):

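    auto pos = x.begin() + x.size()/2 - 1;          // lower of the two middle elements
    std::nth_element(x.begin(), pos, x.end());
    auto pos2 = std::min_element(pos+1, x.end());   // smallest element above it
    return (*pos + *pos2) / 2.0;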
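    Putting both cases together gives the self-contained sketch below. It is plain C++ without the Rcpp wrapper, and the name `median_select` is hypothetical. It assumes a non-empty input with no NaN values, since the snippets above handle neither case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median via selection instead of a full sort (linear on average).
// x is taken by value because std::nth_element reorders it.
// Assumes x is non-empty and contains no NaN values.
double median_select(std::vector<double> x) {
    assert(!x.empty() && "median of an empty vector is undefined");
    const std::size_t n = x.size();
    if (n % 2 == 1) {
        // Odd length: the middle element is the median.
        auto pos = x.begin() + static_cast<std::ptrdiff_t>(n / 2);
        std::nth_element(x.begin(), pos, x.end());
        return *pos;
    }
    // Even length: place the lower-middle element, then take the smallest
    // element of the upper part, which is the upper-middle element.
    auto pos = x.begin() + static_cast<std::ptrdiff_t>(n / 2 - 1);
    std::nth_element(x.begin(), pos, x.end());
    auto pos2 = std::min_element(pos + 1, x.end());
    return (*pos + *pos2) / 2.0;
}
```

    For example, {7, 2, 9, 4, 1, 8, 3, 6} has the two middle values 4 and 6, so the result is 5.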
    
