Efficient way to get middle (median) of an std::set?

故里飘歌 2021-01-07 23:52

std::set is a sorted tree. It provides begin and end methods so I can get minimum and maximum and lower_bound and u

5 answers
  •  隐瞒了意图╮
    2021-01-08 00:10

    The suggestion quoted below is fragile: it will fail if there are duplicate items.

    Depending on how often you insert/remove items versus look up the middle/median, a possibly more efficient solution than the obvious one is to keep a persistent iterator to the middle element and update it whenever you insert/delete items from the set. There are a bunch of edge cases which will need handling (odd vs even number of items, removing the middle item, empty set, etc.), but the basic idea would be that when you insert an item that's smaller than the current middle item, your middle iterator may need decrementing, whereas if you insert a larger one, you need to increment. It's the other way around for removals.
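    The persistent-iterator idea quoted above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name MedianSet and all of its members are hypothetical, the element type is assumed to be int, and "middle" is assumed to mean the lower median, i.e. the element at index (size - 1) / 2.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical wrapper (not part of the standard library): a std::set<int>
// plus an iterator kept at the lower median, index (size - 1) / 2.
class MedianSet {
public:
    MedianSet() = default;
    // Copying would leave mid_ pointing into the other object's set.
    MedianSet(const MedianSet&) = delete;
    MedianSet& operator=(const MedianSet&) = delete;

    void insert(int x) {
        const bool was_odd = data_.size() % 2 == 1;
        if (!data_.insert(x).second) return;           // already present
        if (data_.size() == 1) { mid_ = data_.begin(); return; }
        if (x < *mid_) { if (was_odd) --mid_; }        // new item landed left of mid_
        else           { if (!was_odd) ++mid_; }       // new item landed right of mid_
    }

    void erase(int x) {
        auto it = data_.find(x);
        if (it == data_.end()) return;                 // nothing to remove
        if (data_.size() == 1) { data_.clear(); mid_ = data_.end(); return; }
        const bool was_odd = data_.size() % 2 == 1;
        // Move mid_ first so it never points at the element being erased.
        if (was_odd) { if (!(x < *mid_)) --mid_; }     // x >= *mid_
        else         { if (!(*mid_ < x)) ++mid_; }     // x <= *mid_
        data_.erase(it);
    }

    // O(1); calling this on an empty set is a precondition violation.
    int median() const {
        assert(!data_.empty());
        return *mid_;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::set<int> data_;
    std::set<int>::iterator mid_ = data_.end();
};
```

    Inserting 5, 3, 4, 1, 10 into such an object leaves median() at 4. Both insert and erase stay O(log n), because the iterator moves by at most one position per operation.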

    Suggestions

    1. The first suggestion is to use a std::multiset instead of a std::set, so that it still works when items can be duplicated.
    2. My suggestion is to use two multisets, one tracking the smaller portion and one tracking the bigger portion, and to balance the sizes between them.

    Algorithm

    1. Keep the sets balanced, so that size_of_small == size_of_big or size_of_small + 1 == size_of_big

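    #include <iterator>
    #include <set>

    // Element type int is assumed here; the original left it unspecified.
    // Moves items between the two multisets until
    // small.size() == big.size() or small.size() + 1 == big.size().
    void balance(std::multiset<int> &small, std::multiset<int> &big)
    {
        while (true)
        {
            const auto ssmall = small.size();
            const auto sbig = big.size();

            if (ssmall == sbig || ssmall + 1 == sbig) break; // balanced

            if (ssmall < sbig)
            {
                // big to small: move the smallest item of big
                auto v = big.begin();
                small.emplace(*v);
                big.erase(v);
            }
            else
            {
                // small to big: move the largest item of small
                auto v = std::prev(small.end());
                big.emplace(*v);
                small.erase(v);
            }
        }
    }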
    

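    2. If the sets are balanced and not empty, the median item is always the first item in the big set (with an even number of items this is the upper of the two middle items)

    auto median = big.begin(); // only dereference this when big is not empty
    std::cout << *median << std::endl;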
    

    3. Take care when adding a new item

    auto v = big.begin();
    if (v != big.end() && new_item > *v)
        big.emplace(new_item);   // strictly above the current median: upper half
    else
        small.emplace(new_item); // otherwise lower half (also when big is empty)
    
    balance(small, big);
    

    Complexity explained

    • Finding the median value is O(1).
    • Adding a new item takes O(log n).
    • You can still search for an item in O(log n), but you need to search both sets.
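
    Putting the three steps together, a self-contained sketch might look like the code below. The class name MedianBag and its member names are hypothetical, the element type is assumed to be int, and median() follows the convention above of returning the first item of the big set.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical container built from the two-multiset idea.
// Invariants: every item in small_ is <= every item in big_, and
// small_.size() == big_.size() or small_.size() + 1 == big_.size().
class MedianBag {
public:
    // O(log n): pick a half, then restore the size invariant.
    void add(int x) {
        if (!big_.empty() && x > *big_.begin()) big_.insert(x);
        else small_.insert(x);
        balance();
    }

    // O(1): the smallest item of big_ (the upper middle for even counts).
    int median() const {
        assert(!big_.empty());
        return *big_.begin();
    }

    // O(log n): an item may live in either half, so check both.
    bool contains(int x) const {
        return small_.find(x) != small_.end() || big_.find(x) != big_.end();
    }

private:
    void balance() {
        while (!(small_.size() == big_.size() ||
                 small_.size() + 1 == big_.size())) {
            if (small_.size() < big_.size()) {
                // big to small: move the smallest item of big_
                small_.insert(*big_.begin());
                big_.erase(big_.begin());
            } else {
                // small to big: move the largest item of small_
                auto last = std::prev(small_.end());
                big_.insert(*last);
                small_.erase(last);
            }
        }
    }

    std::multiset<int> small_;
    std::multiset<int> big_;
};
```

    Adding 5, 3, 3, 10, 1 in that order gives medians 5, 5, 3, 5, 3 after each step. Duplicates are handled because both halves are multisets.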
