find median in a fixed-size moving window along a long sequence of data

挽巷 2021-02-04 09:41

Given a sequence of data (it may have duplicates), a fixed-sized moving window, move the window at each iteration from the start of the data sequence, such that (1) the oldes

5 answers
  •  孤街浪徒
    2021-02-04 10:06

    I gave this answer for the "rolling median in C" question

    I couldn't find a modern implementation of a C++ data structure with order statistics, so I ended up implementing both ideas from the TopCoder link (Match Editorial: scroll down to FloatingMedian).

    Two multisets

    The first idea partitions the data into two data structures (heaps, multisets, etc.) with O(ln N) cost per insert/delete. Its drawback is that the quantile cannot be changed dynamically without a large cost, i.e. we can have a rolling median or a rolling 75% quantile, but not both at the same time.
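
    As a rough illustration of the two-multiset idea, here is a minimal sketch. The names TwoMultisetMedian and rolling_median are hypothetical (they are not from the TopCoder editorial), the element type is assumed to be int, and for an even window it returns the lower of the two middle elements.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: low_ holds the smaller half of the window and high_
// the larger half, so the (lower) median is always the largest key in low_.
class TwoMultisetMedian {
public:
    void insert(int x) {
        if (low_.empty() || x <= *low_.rbegin()) low_.insert(x);
        else high_.insert(x);
        rebalance();
    }

    // Removes one occurrence of x; x must currently be in the window.
    void erase(int x) {
        auto it = low_.find(x);
        if (it != low_.end()) {
            low_.erase(it);
        } else {
            it = high_.find(x);
            assert(it != high_.end());
            high_.erase(it);
        }
        rebalance();
    }

    // Lower median: the element at index (n - 1) / 2 of the sorted window.
    int median() const {
        assert(!low_.empty());
        return *low_.rbegin();
    }

private:
    // Restores: low_.size() == high_.size() or low_.size() == high_.size() + 1.
    void rebalance() {
        if (low_.size() > high_.size() + 1) {
            auto it = std::prev(low_.end());
            high_.insert(*it);
            low_.erase(it);
        } else if (low_.size() < high_.size()) {
            auto it = high_.begin();
            low_.insert(*it);
            high_.erase(it);
        }
    }

    std::multiset<int> low_, high_;
};

// Emits one median per full window; each step costs O(ln window).
std::vector<int> rolling_median(const std::vector<int>& data, std::size_t window) {
    assert(window > 0);
    TwoMultisetMedian m;
    std::vector<int> out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        m.insert(data[i]);                           // newest element enters
        if (i >= window) m.erase(data[i - window]);  // oldest element leaves
        if (i + 1 >= window) out.push_back(m.median());
    }
    return out;
}
```

    Tracking a different quantile would mean changing the size invariant in rebalance(), which is why this layout serves only one quantile at a time.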

    Segment tree

    The second idea uses a segment tree, which is O(ln N) for insertions, deletions, and queries, but is more flexible. Best of all, the "N" is the size of your data range, not of the window. So if your rolling median has a window of a million items but your data varies from 1..65536, then only about 16 operations (log2 of 65536) are required per movement of the rolling window of 1 million. (And you only need 65536 * sizeof(counting_type) bytes, e.g. 65536*4.)
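
    A minimal sketch of the counting idea follows, assuming the data are integers in [0, range). The names CountingSegmentTree and rolling_median_by_count are hypothetical, and this array-based layout is just one way to build such a tree (it uses about twice the leaf count in memory). Each node stores how many window elements fall in its value interval, so any order statistic is found by a single descent from the root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a segment tree of counts over the values [0, range).
// count_[1] is the root; the leaf for value v lives at index leaves_ + v.
class CountingSegmentTree {
public:
    explicit CountingSegmentTree(std::size_t range) : range_(range), leaves_(1) {
        while (leaves_ < range_) leaves_ *= 2;  // round up to a power of two
        count_.assign(2 * leaves_, 0);
    }

    // delta = +1 inserts a value, delta = -1 removes one occurrence of it.
    void update(std::size_t value, int delta) {
        assert(value < range_);
        for (std::size_t node = leaves_ + value; node >= 1; node /= 2)
            count_[node] += delta;
    }

    // Returns the k-th smallest stored value (k is 0-based).
    std::size_t kth(int k) const {
        assert(k >= 0 && k < count_[1]);
        std::size_t node = 1;
        while (node < leaves_) {
            node *= 2;                 // step to the left child
            if (k >= count_[node]) {   // not enough elements on the left,
                k -= count_[node];     // so skip them
                ++node;                // and continue in the right child
            }
        }
        return node - leaves_;
    }

private:
    std::size_t range_;
    std::size_t leaves_;
    std::vector<int> count_;
};

// Lower median of every full window; each step costs O(ln range),
// independent of the window size.
std::vector<std::size_t> rolling_median_by_count(const std::vector<std::size_t>& data,
                                                 std::size_t window,
                                                 std::size_t range) {
    assert(window > 0);
    CountingSegmentTree tree(range);
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        tree.update(data[i], +1);
        if (i >= window) tree.update(data[i - window], -1);
        if (i + 1 >= window)
            out.push_back(tree.kth(static_cast<int>((window - 1) / 2)));
    }
    return out;
}
```

    Because kth() takes the rank as an argument, the same tree can answer a median query and a 75% query on the same window, which is the flexibility mentioned above.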

    GNU Order Statistic Trees

    Just before giving up, I found that libstdc++ contains order statistic trees!

    These have two critical operations:

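    iter = tree.find_by_order(order)   // iterator to the element with the given 0-based rank
    order = tree.order_of_key(value)   // number of elements strictly less than value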
    

    See libstdc++ manual policy_based_data_structures_test (search for "split and join").

    I have wrapped the tree in a convenience header for compilers supporting C++0x/C++11 alias templates ("template typedefs"):

    #if !defined(GNU_ORDER_STATISTIC_SET_H)
    #define GNU_ORDER_STATISTIC_SET_H
    #include <functional>  // std::less
    #include <ext/pb_ds/assoc_container.hpp>
    #include <ext/pb_ds/tree_policy.hpp>
    
    // A red-black tree storing keys of type T together with their order
    // statistics. Because the tree uses
    // tree_order_statistics_node_update as its update policy, it
    // exposes the methods find_by_order and order_of_key.
    template <typename T>
    using t_order_statistic_set = __gnu_pbds::tree<
                                      T,
                                      __gnu_pbds::null_type,
                                      std::less<T>,
                                      __gnu_pbds::rb_tree_tag,
                                      // This policy updates nodes' metadata for order statistics.
                                      __gnu_pbds::tree_order_statistics_node_update>;
    
    #endif //GNU_ORDER_STATISTIC_SET_H
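
    To show how the wrapper might be used for the rolling median itself, here is a minimal sketch. It requires GCC's libstdc++, the function name rolling_median_ost is hypothetical, and it returns the lower median for even windows. Since this tree behaves like a set and drops duplicate keys, the sketch assumes each value can be paired with its position in the sequence to keep equal values distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// Same alias as in the header above, repeated so this sketch compiles alone.
template <typename T>
using t_order_statistic_set = __gnu_pbds::tree<
    T, __gnu_pbds::null_type, std::less<T>, __gnu_pbds::rb_tree_tag,
    __gnu_pbds::tree_order_statistics_node_update>;

std::vector<int> rolling_median_ost(const std::vector<int>& data, std::size_t window) {
    assert(window > 0);
    // Keys are (value, position) pairs, so duplicates in the data stay distinct.
    t_order_statistic_set<std::pair<int, std::size_t>> tree;
    std::vector<int> out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        tree.insert(std::make_pair(data[i], i));  // newest element enters
        if (i >= window)                          // oldest element leaves
            tree.erase(std::make_pair(data[i - window], i - window));
        if (i + 1 >= window)                      // rank (window - 1) / 2 is the lower median
            out.push_back(tree.find_by_order((window - 1) / 2)->first);
    }
    return out;
}
```

    Any other quantile of the same window can be read with another find_by_order call on the same tree, at O(ln window) per query.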
    
