Rolling median algorithm in C


I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it.

13 Answers
  • 2020-11-27 10:05

    Here's a simple algorithm for quantized data (months later):

    """ median1.py: moving median 1d for quantized, e.g. 8-bit data
    
    Method: cache the median, so that wider windows are faster.
        The code is simple -- no heaps, no trees.
    
    Keywords: median filter, moving median, running median, numpy, scipy
    
    See Perreault + Hebert, Median Filtering in Constant Time, 2007,
        http://nomis80.org/ctmf.html: nice 6-page paper and C code,
        mainly for 2d images
    
    Example:
        y = medianfilter( x, window=window, nlevel=nlevel )
        uses:
        med = Median1( nlevel, window, counts=np.bincount( x[0:window] ))
        med.addsub( +, - )  -- see the picture in Perreault
        m = med.median()  -- using cached m, summ
    
    How it works:
        picture nlevel=8, window=3 -- 3 1s in an array of 8 counters:
            counts: . 1 . . 1 . 1 .
            sums:   0 1 1 1 2 2 3 3
                            ^ sums[3] < 2 <= sums[4] <=> median 4
            addsub( 0, 1 )  m, summ stay the same
            addsub( 5, 1 )  slide right
            addsub( 5, 6 )  slide left
    
    Updating `counts` in an `addsub` is trivial, updating `sums` is not.
    But we can cache the previous median `m` and the sum to m `summ`.
    The less often the median changes, the faster;
    so fewer levels or *wider* windows are faster.
    (Like any cache, run time varies a lot, depending on the input.)
    
    See also:
        scipy.signal.medfilt -- runtime roughly ~ window size
        http://stackoverflow.com/questions/1309263/rolling-median-algorithm-in-c
    
    """
    
    from __future__ import division
    import numpy as np  # bincount, pad0
    
    __date__ = "2009-10-27 oct"
    __author_email__ = "denis-bz-py at t-online dot de"
    
    
    #...............................................................................
    class Median1:
        """ moving median 1d for quantized, e.g. 8-bit data """
    
        def __init__( s, nlevel, window, counts ):
            s.nlevel = nlevel  # >= len(counts)
            s.window = window  # == sum(counts)
            s.half = (window // 2) + 1  # odd or even
            s.setcounts( counts )
    
        def median( s ):
            """ step up or down until sum cnt to m-1 < half <= sum to m """
            if s.summ - s.cnt[s.m] < s.half <= s.summ:
                return s.m
            j, sumj = s.m, s.summ
            if sumj <= s.half:
                while j < s.nlevel - 1:
                    j += 1
                    sumj += s.cnt[j]
                    # print( "j sumj:", j, sumj )
                    if sumj - s.cnt[j] < s.half <= sumj:  break
            else:
                while j > 0:
                    sumj -= s.cnt[j]
                    j -= 1
                    # print( "j sumj:", j, sumj )
                    if sumj - s.cnt[j] < s.half <= sumj:  break
            s.m, s.summ = j, sumj
            return s.m
    
        def addsub( s, add, sub ):
            s.cnt[add] += 1
            s.cnt[sub] -= 1
            assert s.cnt[sub] >= 0, (add, sub)
            if add <= s.m:
                s.summ += 1
            if sub <= s.m:
                s.summ -= 1
    
        def setcounts( s, counts ):
            assert len(counts) <= s.nlevel, (len(counts), s.nlevel)
            if len(counts) < s.nlevel:
                counts = pad0__( counts, s.nlevel )  # numpy array / list
            sumcounts = sum(counts)
            assert sumcounts == s.window, (sumcounts, s.window)
            s.cnt = counts
            s.slowmedian()
    
        def slowmedian( s ):
            j, sumj = -1, 0
            while sumj < s.half:
                j += 1
                sumj += s.cnt[j]
            s.m, s.summ = j, sumj
    
        def __str__( s ):
            return ("median %d: " % s.m) + \
                "".join([ (" ." if c == 0 else "%2d" % c) for c in s.cnt ])
    
    #...............................................................................
    def medianfilter( x, window, nlevel=256 ):
        """ moving medians, y[j] = median( x[j:j+window] )
            -> a shorter list, len(y) = len(x) - window + 1
        """
        assert len(x) >= window, (len(x), window)
        # np.clip( x, 0, nlevel-1, out=x )
            # cf http://scipy.org/Cookbook/Rebinning
        cnt = np.bincount( x[0:window] )
        med = Median1( nlevel=nlevel, window=window, counts=cnt )
        y = (len(x) - window + 1) * [0]
        y[0] = med.median()
        for j in range( len(x) - window ):
            med.addsub( x[j+window], x[j] )
            y[j+1] = med.median()
        return y  # list
        # return np.array( y )
    
    def pad0__( x, tolen ):
        """ pad x with 0 s, numpy array or list """
        n = tolen - len(x)
        if n > 0:
            try:
                x = np.r_[ x, np.zeros( n, dtype=x[0].dtype )]
            except AttributeError:  # x is a plain list, not a numpy array
                x += n * [0]
        return x
    
    #...............................................................................
    if __name__ == "__main__":
        Len = 10000
        window = 3
        nlevel = 256
        period = 100
    
        np.set_printoptions( 2, threshold=100, edgeitems=10 )
        # print( medianfilter( np.arange(3), 3 ))
    
        sinwave = (np.sin( 2 * np.pi * np.arange(Len) / period )
            + 1) * (nlevel-1) / 2
        x = np.asarray( sinwave, int )
        print( "x:", x )
        for window in ( 3, 31, 63, 127, 255 ):
            if window > Len:  continue
            print( "medianfilter: Len=%d window=%d nlevel=%d:" % (Len, window, nlevel) )
            y = medianfilter( x, window=window, nlevel=nlevel )
            print( np.array( y ))
    
    # end median1.py
    
  • 2020-11-27 10:06

    I have looked at R's src/library/stats/src/Trunmed.c a few times, as I wanted something similar in a standalone C++ class / C subroutine. Note that these are actually two implementations in one; see src/library/stats/man/runmed.Rd (the source of the help file), which says:

    \details{
      Apart from the end values, the result \code{y = runmed(x, k)} simply has
      \code{y[j] = median(x[(j-k2):(j+k2)])} (k = 2*k2+1), computed very
      efficiently.
    
      The two algorithms are internally entirely different:
      \describe{
        \item{"Turlach"}{is the Härdle-Steiger
          algorithm (see Ref.) as implemented by Berwin Turlach.
          A tree algorithm is used, ensuring performance \eqn{O(n \log
            k)}{O(n * log(k))} where \code{n <- length(x)} which is
          asymptotically optimal.}
        \item{"Stuetzle"}{is the (older) Stuetzle-Friedman implementation
          which makes use of median \emph{updating} when one observation
          enters and one leaves the smoothing window.  While this performs as
          \eqn{O(n \times k)}{O(n * k)} which is slower asymptotically, it is
          considerably faster for small \eqn{k} or \eqn{n}.}
      }
    }
    

    It would be nice to see this re-used in a more standalone fashion. Are you volunteering? I can help with some of the R bits.

    Edit 1: Besides the link to the older version of Trunmed.c above, here are current SVN copies of

    • Srunmed.c (for the Stuetzle version)
    • Trunmed.c (for the Turlach version)
    • runmed.R for the R function calling these

    Edit 2: Ryan Tibshirani has some C and Fortran code on fast median binning which may be a suitable starting point for a windowed approach.
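
    For illustration, here is a minimal C++ sketch of the sorted-window updating idea behind the Stuetzle-Friedman entry quoted above: keep the current window sorted, and on each step binary-search-delete the outgoing sample and binary-search-insert the incoming one. The name runmed_sorted and the sample data are mine, not from the R sources.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // O(k) per step (from the vector shifts), O(n*k) overall -- the same
    // asymptotic behaviour the runmed.Rd excerpt ascribes to "Stuetzle".
    std::vector<double> runmed_sorted(const std::vector<double>& x, int k)
    {
        std::vector<double> win(x.begin(), x.begin() + k);
        std::sort(win.begin(), win.end());

        std::vector<double> y;
        y.push_back(win[k / 2]);                 // true median for odd k
        for (int j = k; j < (int)x.size(); ++j) {
            win.erase(std::lower_bound(win.begin(), win.end(), x[j - k]));
            win.insert(std::lower_bound(win.begin(), win.end(), x[j]), x[j]);
            y.push_back(win[k / 2]);
        }
        return y;
    }

    int main()
    {
        std::vector<double> x = { 5, 2, 8, 2, 9, 1, 7 };
        for (double m : runmed_sorted(x, 3)) std::printf("%g ", m);
        std::printf("\n");
    }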

  • 2020-11-27 10:06

    A rolling median can be found by maintaining two partitions of the numbers.

    To maintain the partitions, use a min-heap and a max-heap.

    The max-heap contains the numbers less than or equal to the median.

    The min-heap contains the numbers greater than or equal to the median.

    Balancing constraint: if the total number of elements is even, both heaps should hold the same number of elements.

    If the total number of elements is odd, the max-heap holds one more element than the min-heap.

    Median element: if both partitions hold the same number of elements, the median is half the sum of the max element of the first partition and the min element of the second partition.

    Otherwise the median is the max element of the first partition.

    Algorithm -
    1- Take two heaps (1 min-heap and 1 max-heap).
       The max-heap holds the smaller half of the elements;
       the min-heap holds the larger half.

    2- Compare the new number from the stream with the top of the max-heap;
       if it is smaller or equal, add it to the max-heap,
       otherwise add it to the min-heap.

    3- If the min-heap has more elements than the max-heap,
       move the top element of the min-heap to the max-heap.
       If the max-heap has more than one element more than the min-heap,
       move the top element of the max-heap to the min-heap.

    4- If both heaps hold the same number of elements, the median is half the
       sum of the max element of the max-heap and the min element of the min-heap.
       Otherwise the median is the max element of the max-heap.
    
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.Scanner;

    public class Solution {
    
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            RunningMedianHeaps s = new RunningMedianHeaps();
            int n = in.nextInt();
            for(int a_i=0; a_i < n; a_i++){
                printMedian(s,in.nextInt());
            }
            in.close();       
        }
    
        public static void printMedian(RunningMedianHeaps s, int nextNum){
                s.addNumberInHeap(nextNum);
                System.out.printf("%.1f\n",s.getMedian());
        }
    }
    
    class RunningMedianHeaps{
        PriorityQueue<Integer> minHeap = new PriorityQueue<Integer>();
        PriorityQueue<Integer> maxHeap = new PriorityQueue<Integer>(Comparator.reverseOrder());
    
        public double getMedian() {
    
            int size = minHeap.size() + maxHeap.size();     
            if(size % 2 == 0)
                return (maxHeap.peek()+minHeap.peek())/2.0;
            return maxHeap.peek()*1.0;
        }
    
        private void balanceHeaps() {
            if(maxHeap.size() < minHeap.size())
            {
                maxHeap.add(minHeap.poll());
            }   
            else if(maxHeap.size() > 1+minHeap.size())
            {
                minHeap.add(maxHeap.poll());
            }
        }
    
        public void addNumberInHeap(int num) {
            if(maxHeap.size()==0 || num <= maxHeap.peek())
            {
                maxHeap.add(num);
            }
            else
            {
                minHeap.add(num);
            }
            balanceHeaps();
        }
    }
    
  • 2020-11-27 10:08

    I couldn't find a modern C++ implementation of an order-statistic data structure, so I ended up implementing both ideas from the TopCoder link suggested by MAK (Match Editorial: scroll down to FloatingMedian).

    Two multisets

    The first idea partitions the data into two data structures (heaps, multisets, etc.) with O(ln N) per insert/delete, but it does not allow the quantile to be changed dynamically without a large cost. That is, we can have a rolling median, or a rolling 75th percentile, but not both at the same time.
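
    For concreteness, here is a minimal sketch of the two-multiset variant (all names are mine): the multiset `low` holds the smaller half, `high` the larger half, and, unlike plain binary heaps, multisets also support erasing the sample that leaves the window.

    #include <cstdio>
    #include <iterator>
    #include <set>

    struct TwoMultisetMedian {
        std::multiset<int> low, high;   // low keeps the lower median on top

        void rebalance() {
            while (low.size() > high.size() + 1) {      // low may hold one extra
                high.insert(*low.rbegin());
                low.erase(std::prev(low.end()));
            }
            while (high.size() > low.size()) {
                low.insert(*high.begin());
                high.erase(high.begin());
            }
        }
        void insert(int v) {
            if (low.empty() || v <= *low.rbegin()) low.insert(v);
            else high.insert(v);
            rebalance();
        }
        void erase(int v) {             // v must currently be in the window
            auto it = low.find(v);
            if (it != low.end()) low.erase(it);
            else high.erase(high.find(v));
            rebalance();
        }
        int median() const { return *low.rbegin(); }    // lower median
    };

    int main() {
        const int x[] = { 5, 2, 8, 2, 9, 1, 7 }, n = 7, window = 3;
        TwoMultisetMedian m;
        for (int j = 0; j < n; ++j) {
            m.insert(x[j]);                             // newest sample in
            if (j >= window) m.erase(x[j - window]);    // oldest sample out
            if (j >= window - 1) std::printf("median = %d\n", m.median());
        }
    }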

    Segment tree

    The second idea uses a segment tree, which is O(ln N) per insert/delete/query but is more flexible. Best of all, the "N" is the size of your data range. So if your rolling median has a window of a million items but your data varies from 1..65536, then only 16 operations are required per movement of the rolling window of 1 million!

    The C++ code is similar to what Denis posted above ("Here's a simple algorithm for quantized data"); a sketch of the same counting idea follows.
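
    The answer's own code isn't reproduced here, so as a stand-in, this is a rough sketch (all names mine) of that counting idea using a Fenwick (binary indexed) tree over the value range: one counter per value level, +1/-1 point updates as samples enter and leave the window, and a k-th-smallest query in O(log nlevel).

    #include <cstdio>
    #include <vector>

    struct FenwickMedian {
        int nlevel, logn;
        std::vector<int> bit;           // 1-based Fenwick array of counts

        explicit FenwickMedian(int nlevel) : nlevel(nlevel), bit(nlevel + 1, 0) {
            logn = 0;
            while ((1 << (logn + 1)) <= nlevel) ++logn;
        }
        void add(int v, int d) {        // count[v] += d
            for (int i = v + 1; i <= nlevel; i += i & -i) bit[i] += d;
        }
        int kth(int k) {                // k-th smallest in the window, k >= 1
            int pos = 0;
            for (int step = 1 << logn; step > 0; step >>= 1)
                if (pos + step <= nlevel && bit[pos + step] < k) {
                    pos += step;
                    k -= bit[pos];
                }
            return pos;                 // 0-based value level
        }
    };

    int main() {
        const int x[] = { 5, 2, 8, 2, 9, 1, 7 }, n = 7, window = 3;
        FenwickMedian f(256);           // e.g. 8-bit quantized data
        for (int j = 0; j < n; ++j) {
            f.add(x[j], +1);                            // sample enters
            if (j >= window) f.add(x[j - window], -1);  // sample leaves
            if (j >= window - 1)
                std::printf("median = %d\n", f.kth((window + 1) / 2));
        }
    }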

    GNU Order Statistic Trees

    Just before giving up, I found that libstdc++ contains order-statistic trees!!!

    These have two critical operations:

    iter  = tree.find_by_order(k)     // iterator to the k-th smallest element (0-based)
    order = tree.order_of_key(value)  // number of keys strictly less than value
    

    See libstdc++ manual policy_based_data_structures_test (search for "split and join").

    I have wrapped the tree for use in a convenience header for compilers supporting C++11-style template aliases:

    #if !defined(GNU_ORDER_STATISTIC_SET_H)
    #define GNU_ORDER_STATISTIC_SET_H
    #include <ext/pb_ds/assoc_container.hpp>
    #include <ext/pb_ds/tree_policy.hpp>
    
    // A red-black tree table storing ints and their order
    // statistics. Note that since the tree uses
    // tree_order_statistics_node_update as its update policy, then it
    // includes its methods by_order and order_of_key.
    template <typename T>
    using t_order_statistic_set = __gnu_pbds::tree<
                                      T,
                                      __gnu_pbds::null_type,
                                      std::less<T>,
                                      __gnu_pbds::rb_tree_tag,
                                      // This policy updates nodes'  metadata for order statistics.
                                      __gnu_pbds::tree_order_statistics_node_update>;
    
    #endif //GNU_ORDER_STATISTIC_SET_H
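
    By way of example, here is a hypothetical rolling-median use of that alias (repeated below so the sketch stands alone). Since the tree is a set, the sketch keys on (value, index) pairs so that duplicate samples can coexist in the window.

    #include <cstdio>
    #include <utility>
    #include <ext/pb_ds/assoc_container.hpp>
    #include <ext/pb_ds/tree_policy.hpp>

    // Alias from the header above, repeated so this file compiles on its own.
    template <typename T>
    using t_order_statistic_set = __gnu_pbds::tree<
                                      T,
                                      __gnu_pbds::null_type,
                                      std::less<T>,
                                      __gnu_pbds::rb_tree_tag,
                                      __gnu_pbds::tree_order_statistics_node_update>;

    int main()
    {
        const int x[] = { 5, 2, 8, 2, 9, 1, 7 };
        const int n = sizeof x / sizeof *x, window = 3;

        t_order_statistic_set<std::pair<int, int>> s;
        for (int j = 0; j < n; ++j) {
            s.insert({ x[j], j });                        // newest sample in
            if (j >= window)
                s.erase({ x[j - window], j - window });   // oldest sample out
            if (j >= window - 1) {
                // find_by_order(k): iterator to the k-th smallest (0-based)
                int med = s.find_by_order(window / 2)->first;
                std::printf("median(x[%d..%d]) = %d\n", j - window + 1, j, med);
            }
        }
    }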
    
  • 2020-11-27 10:08

    If you have the ability to reference values as a function of points in time, you could sample values with replacement and apply bootstrapping to generate a median estimate within confidence intervals. This may let you calculate an approximated median more efficiently than constantly sorting incoming values into a data structure; a sketch follows.
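
    As a rough sketch of that idea (B, the RNG seed, and all names are illustrative choices, not from the answer): resample the current window with replacement B times, take each resample's median, and summarize; percentiles of `meds` would give a confidence interval.

    #include <algorithm>
    #include <cstdio>
    #include <random>
    #include <vector>

    double bootstrap_median(const std::vector<double>& window, int B = 200)
    {
        static std::mt19937 rng(12345);
        std::uniform_int_distribution<std::size_t> pick(0, window.size() - 1);

        std::vector<double> meds;
        std::vector<double> resample(window.size());
        for (int b = 0; b < B; ++b) {
            for (double& r : resample) r = window[pick(rng)];   // with replacement
            std::nth_element(resample.begin(),
                             resample.begin() + resample.size() / 2,
                             resample.end());
            meds.push_back(resample[resample.size() / 2]);      // resample median
        }
        double sum = 0;
        for (double m : meds) sum += m;
        return sum / B;                 // point estimate: mean of the medians
    }

    int main()
    {
        std::vector<double> w = { 5, 2, 8, 2, 9, 1, 7 };
        std::printf("bootstrapped median ~ %.2f\n", bootstrap_median(w));
    }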

  • 2020-11-27 10:09

    If you just require a smoothed average, a quick and easy way is to multiply the latest value by x and the running average by (1-x), then add them; the sum becomes the new average (see the sketch below).

    edit: This is not what the user asked for, and it is not as statistically valid, but it is good enough for a lot of uses.
    I'll leave it here (in spite of the downvotes) for anyone searching for it!
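
    In code, the smoothing described above is just one line per sample; the weight x = 0.2 and the data here are example values only.

    #include <cstdio>

    int main()
    {
        const double samples[] = { 5, 2, 8, 2, 9, 1, 7 };
        const double x = 0.2;           // weight of the newest sample
        double avg = samples[0];        // seed with the first sample
        for (double v : samples) {
            avg = x * v + (1.0 - x) * avg;   // new average
            std::printf("avg = %.3f\n", avg);
        }
    }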
