Using Boost.MultiIndex like a relational DB

伪装坚强ぢ · 2020-12-20 06:17

Here is the situation that I am trying to simulate:

  Col1                 Col2     Col3
CBT.151.5.T.FEED       S1       t1
CBT.151.5.T.FEED       s2       t2


        
1 Answer
  • 2020-12-20 06:38

    I've simplified your (ridiculously complicated¹) model to:

    #include <iostream>
    #include <string>

    enum TimePoints { // Let's assume t1 > t2 > t3 > t4
        t1 = 100,
        t2 = 80,
        t3 = 70,
        t4 = 20,
    };
    
    using IdType = std::string;
    using Symbol = std::string;
    using TimeT  = unsigned int;
    
    struct tickerUpdateInfo {
        IdType m_id;
        Symbol m_symbol;
        TimeT  m_last_update_time;
    
        friend std::ostream& operator<<(std::ostream& os, tickerUpdateInfo const& tui) {
            return os << "T[" << tui.m_id << ",\t" << tui.m_symbol << ",\t" << tui.m_last_update_time << "]";
        }
    } static const data[] = {
        { "CBT.151.5.T.FEED", "S1", t1 },
        { "CBT.151.5.T.FEED", "s2", t2 },
        { "CBT.151.5.T.FEED", "s3", t3 },
        { "CBT.151.5.T.FEED", "s4", t4 },
        { "CBT.151.5.T.FEED", "s5", t1 },
        { "CBT.151.8.T.FEED", "s7", t1 },
        { "CBT.151.5.Q.FEED", "s8", t3 },
    };
    

    There. We can work with that. You want an index that's primarily time-based, yet can be refined by symbol/id later:

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/ordered_index.hpp>
    #include <boost/multi_index/composite_key.hpp>
    #include <boost/multi_index/member.hpp>

    namespace bmi = boost::multi_index;

    typedef bmi::multi_index_container<tickerUpdateInfo,
        bmi::indexed_by<
            bmi::ordered_non_unique<bmi::tag<struct most_active_index>,
                bmi::composite_key<tickerUpdateInfo,
                    BOOST_MULTI_INDEX_MEMBER(tickerUpdateInfo, TimeT,  m_last_update_time),
                    BOOST_MULTI_INDEX_MEMBER(tickerUpdateInfo, Symbol, m_symbol),
                    BOOST_MULTI_INDEX_MEMBER(tickerUpdateInfo, IdType, m_id)
            > > >
        > ticker_update_info_set;
    

    For our implementation, we don't even need to use the secondary key components; we can just write:

    #include <map>
    #include <boost/range/iterator_range.hpp>

    std::map<Symbol, size_t> activity_histo(ticker_update_info_set const& tuis, TimeT since)
    {
        std::map<Symbol, size_t> histo;
        auto const& index = tuis.get<most_active_index>();
    
        auto lb = index.upper_bound(since); // for greater-than-inclusive use lower_bound
        for (auto& rec : boost::make_iterator_range(lb, index.end()))
            histo[rec.m_symbol]++;
    
        return histo;
    }
    

    See it Live On Coliru.
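    For reference, a minimal driver (my addition, not shown in the original answer) that loads the sample records and queries everything newer than t3 could look like this:

    #include <iterator>

    int main() {
        ticker_update_info_set tuis(std::begin(data), std::end(data));

        // Count updates strictly newer than t3 and print per-symbol totals
        for (auto const& entry : activity_histo(tuis, t3))
            std::cout << entry.first << ": " << entry.second << "\n";
    }

    With the sample data this should report one update each for S1, s2, s5 and s7 (the records at t1 and t2).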

    Now if volumes get large, you could be tempted to optimize a bit using the secondary index component:

    std::map<Symbol, size_t> activity_histo_ex(ticker_update_info_set const& tuis, TimeT since)
    {
        std::map<Symbol, size_t> histo;
        auto const& index = tuis.get<most_active_index>();
    
        for (auto lb = index.upper_bound(since), end = index.end(); lb != end;) // for greater-than-inclusive use lower_bound
        {
            auto ub = index.upper_bound(boost::make_tuple(lb->m_last_update_time, lb->m_symbol));
            histo[lb->m_symbol] += std::distance(lb, ub);
    
            lb = ub;
        }
    
        return histo;
    }
    

    I'm not sure this would actually be the quicker approach (your profiler would know). See it Live On Coliru too.

    Rethink the design?

    TBH this whole multi index thing is likely to slow you down due to suboptimal insertion times and lack of locality-of-reference when iterating records.

    I'd suggest looking at

    • a single flat_multimap ordered by update time
    • or even a (fixed-size) linear ring buffer ordered by time; a sketch follows this list. This would make a lot of sense since you are most likely receiving the events in increasing time order anyway, so you can just keep appending at the end (and wrap around when the history window is full). This at once removes all need for reallocation (given that you choose an appropriate maximum capacity for the ring buffer) and gives you optimal cache-prefetch performance when traversing the records for stats.
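    As an illustration, here is a minimal sketch of such a ring buffer (my code, not from the original answer; the names are made up), reusing the types from above and assuming updates arrive in non-decreasing time order:

    #include <array>
    #include <cstddef>

    template <std::size_t Capacity>
    class TickerRing {
        std::array<tickerUpdateInfo, Capacity> m_buf;
        std::size_t m_head = 0; // index of the oldest record
        std::size_t m_size = 0;

      public:
        void push(tickerUpdateInfo rec) {
            std::size_t pos = (m_head + m_size) % Capacity;
            if (m_size == Capacity)
                m_head = (m_head + 1) % Capacity; // full: drop the oldest
            else
                ++m_size;
            m_buf[pos] = std::move(rec);
        }

        // Same histogram as before, but now a linear, cache-friendly scan
        std::map<Symbol, size_t> activity_histo(TimeT since) const {
            std::map<Symbol, size_t> histo;
            for (std::size_t i = 0; i < m_size; ++i) {
                auto const& rec = m_buf[(m_head + i) % Capacity];
                if (rec.m_last_update_time > since)
                    histo[rec.m_symbol]++;
            }
            return histo;
        }
    };

    Because the entries are time-ordered, the scan could also run from the newest end backwards and stop at the first record not newer than since.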

    The second approach really earns its merit once you implement the ring buffer using Boost Lockfree's spsc_queue offering. Why? Because you can host it in shared memory:

    Shared-memory IPC synchronization (lock-free)
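    For completeness, a tiny sketch of what that could look like (my illustration; note that for shared-memory hosting the element type must be trivially copyable, so the std::strings give way to fixed-size character arrays):

    #include <boost/lockfree/spsc_queue.hpp>

    // POD-style record: required before the queue can live in shared memory
    struct tickerUpdatePod {
        char  id[32];
        char  symbol[8];
        TimeT last_update_time;
    };

    // Single-producer/single-consumer ring buffer with compile-time capacity
    boost::lockfree::spsc_queue<tickerUpdatePod,
                                boost::lockfree::capacity<4096>> feed;

    // Producer side: feed.push(rec);
    // Consumer side: tickerUpdatePod rec;
    //                while (feed.pop(rec)) { /* update the histogram */ }

    Actually placing the queue in shared memory takes a few more steps (e.g. constructing it inside a Boost.Interprocess segment); see the link above for that direction.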


    ¹ the complexity would have been warranted if your code had been self-contained. Sadly, it was not (at all). I had to prune it in order to get something working. This was, obviously, after removing all the line numbers :)
