Using Boost Multi-index like a relational DB


Here is the situation that I am trying to simulate:

Col1                  Col2    Col3
CBT.151.5.T.FEED      S1      t1
CBT.151.5.T.FEED      s2      t2


        
1 Answer

    I've simplified your (ridiculously complicated¹) model to:

    #include <ostream>
    #include <string>

    enum TimePoints { // Let's assume t1 > t2 > t3 > t4
        t1 = 100,
        t2 = 80,
        t3 = 70,
        t4 = 20,
    };
    
    using IdType = std::string;
    using Symbol = std::string;
    using TimeT  = unsigned int;
    
    struct tickerUpdateInfo {
        IdType m_id;
        Symbol m_symbol;
        TimeT  m_last_update_time;
    
        friend std::ostream& operator<<(std::ostream& os, tickerUpdateInfo const& tui) {
            return os << "T[" << tui.m_id << ",\t" << tui.m_symbol << ",\t" << tui.m_last_update_time << "]";
        }
    } static const data[] = {
        { "CBT.151.5.T.FEED", "S1", t1 },
        { "CBT.151.5.T.FEED", "s2", t2 },
        { "CBT.151.5.T.FEED", "s3", t3 },
        { "CBT.151.5.T.FEED", "s4", t4 },
        { "CBT.151.5.T.FEED", "s5", t1 },
        { "CBT.151.8.T.FEED", "s7", t1 },
        { "CBT.151.5.Q.FEED", "s8", t3 },
    };
    

    There. We can work with that. You want an index that's primarily time-based, yet one you can refine by symbol/id later:

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/composite_key.hpp>
    #include <boost/multi_index/member.hpp>
    #include <boost/multi_index/ordered_index.hpp>
    #include <boost/multi_index/tag.hpp>

    namespace bmi = boost::multi_index;

    typedef bmi::multi_index_container<tickerUpdateInfo,
        bmi::indexed_by<
            bmi::ordered_non_unique<bmi::tag<struct most_active_index>,
                bmi::composite_key<tickerUpdateInfo,
                    BOOST_MULTI_INDEX_MEMBER(tickerUpdateInfo, TimeT,  m_last_update_time),
                    BOOST_MULTI_INDEX_MEMBER(tickerUpdateInfo, Symbol, m_symbol),
                    BOOST_MULTI_INDEX_MEMBER(tickerUpdateInfo, IdType, m_id)
            > > >
        > ticker_update_info_set;
    

    For our implementation we don't even need to use the secondary key components; we can just write:

    #include <boost/range/iterator_range.hpp>
    #include <map>

    std::map<Symbol, size_t> activity_histo(ticker_update_info_set const& tuis, TimeT since)
    {
        std::map<Symbol, size_t> histo;
        auto const& index = tuis.get<most_active_index>();
    
        auto lb = index.upper_bound(since); // for greater-than-inclusive use lower_bound
        for (auto& rec : boost::make_iterator_range(lb, index.end()))
            histo[rec.m_symbol]++;
    
        return histo;
    }
    

    See it Live On Coliru.
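
    For reference, here's a minimal driver of my own (a sketch; main and the expected output are not part of the original answer) that loads the sample data and prints the activity for updates strictly newer than t3:

    #include <iostream>
    #include <iterator>

    int main() {
        // populate the container from the sample records above
        ticker_update_info_set tuis(std::begin(data), std::end(data));

        for (auto const& entry : activity_histo(tuis, t3))
            std::cout << entry.first << ": " << entry.second << "\n";
    }

    With the sample data above this should report one update each for S1, s2, s5 and s7 (the records stamped t1 and t2; upper_bound excludes the records at t3 itself).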

    Now if volumes get large, you could be tempted to optimize a bit using the secondary index component:

    std::map<Symbol, size_t> activity_histo_ex(ticker_update_info_set const& tuis, TimeT since)
    {
        std::map<Symbol, size_t> histo;
        auto const& index = tuis.get<most_active_index>();
    
        for (auto lb = index.upper_bound(since), end = index.end(); lb != end;) // for greater-than-inclusive use lower_bound
        {
            auto ub = index.upper_bound(boost::make_tuple(lb->m_last_update_time, lb->m_symbol));
            histo[lb->m_symbol] += std::distance(lb, ub);
    
            lb = ub;
        }
    
        return histo;
    }
    

    I'm not sure this would actually be the quicker approach (your profiler would know). See it Live On Coliru too.

    Rethink the design?

    TBH this whole multi-index thing is likely to slow you down, due to suboptimal insertion times and a lack of locality-of-reference when iterating over the records.

    I'd suggest looking at

    • a single flat_multimap ordered by update time
    • or even a (fixed-size) linear ring buffer ordered by time (a sketch follows below). This would make a lot of sense, since you are most likely receiving the events in increasing time order anyway, so you can just keep appending at the end (and wrap around when the history window is full). This at once removes any need for reallocation (given that you choose an appropriate maximum capacity for the ring buffer) and gives you optimal cache-prefetch performance when traversing the records for stats.
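
    A minimal sketch of the ring-buffer idea, assuming updates arrive in non-decreasing time order (the RingHistory type and its interface are mine, purely for illustration; boost::circular_buffer stands in for a hand-rolled buffer):

    #include <boost/circular_buffer.hpp>
    #include <map>

    struct RingHistory {
        explicit RingHistory(size_t capacity) : m_buf(capacity) {}

        void append(tickerUpdateInfo const& tui) {
            m_buf.push_back(tui); // overwrites the oldest record once full
        }

        std::map<Symbol, size_t> activity_histo(TimeT since) const {
            std::map<Symbol, size_t> histo;
            // scan newest-to-oldest; stop at the first record at/before `since`
            for (auto it = m_buf.rbegin(); it != m_buf.rend(); ++it) {
                if (it->m_last_update_time <= since)
                    break;
                histo[it->m_symbol]++;
            }
            return histo;
        }

        boost::circular_buffer<tickerUpdateInfo> m_buf;
    };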

    The second approach really starts to pay off once you implement the ring buffer using Boost Lockfree's spsc_queue offering. Why? Because you can host it in shared memory:

    Shared-memory IPC synchronization (lock-free)
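
    To give an idea, here's a bare-bones sketch of hosting such a queue in shared memory (the segment/queue names and the POD record type are mine, for illustration; note that lock-free queues require trivially copyable elements, so the std::string members would have to become fixed-size character arrays):

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/lockfree/spsc_queue.hpp>

    namespace bip = boost::interprocess;

    struct TickerUpdatePod { // trivially copyable stand-in for tickerUpdateInfo
        char  id[32];
        char  symbol[16];
        TimeT last_update_time;
    };

    using Ring = boost::lockfree::spsc_queue<TickerUpdatePod, boost::lockfree::capacity<1024> >;

    int main() {
        // open (or create) a named shared-memory segment and place the queue in it
        bip::managed_shared_memory segment(bip::open_or_create, "ticker_feed", 1 << 20);
        Ring& ring = *segment.find_or_construct<Ring>("updates")();

        TickerUpdatePod rec = { "CBT.151.5.T.FEED", "S1", t1 };
        ring.push(rec);         // producer side: non-blocking, returns false when full

        TickerUpdatePod out;
        while (ring.pop(out)) { // consumer side, possibly in another process
            // fold `out` into your histogram / ring history here
        }
    }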


    ¹ the complexity would be warranted iff your code had been self-contained. Sadly, it was not (at all). I had to prune it in order to get something to work. This was, obviously, after removing all the line numbers :)
