CouchDB: Return Newest Documents of Type Based on Timestamp

I have a system that accepts status updates from a variety of unique sources, and each status update creates a new document in the following structure:

{
 \"

3 Answers

半阙折子戏, 2021-01-15 15:07
                                                                       
You can get the latest timestamp for every source using the built-in _stats reduce function, then run a second query to fetch the documents. Here are the views:

"views": {
  "latest_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
    "reduce": "_stats"
  },
  "status_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
  }
}


First query latest_update with group=true, then query status_update with something like the following (properly URL-encoded):

keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true


where TS123 and TS234 are the max values returned by latest_update for each source.
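
To make the hand-off between the two queries concrete, here is a minimal sketch. It assumes the grouped _stats rows have the usual shape (a key plus a value object with sum, count, min, max and sumsqr); the helper name buildStatusUpdateQuery and the sample values are made up for illustration.

```javascript
// Hypothetical helper: turn the rows of a `latest_update?group=true` response
// into the query string for the `status_update` view.
function buildStatusUpdateQuery(latestRows) {
  // Each grouped _stats row is assumed to look like
  // { key: source_id, value: { sum, count, min, max, sumsqr } }.
  const keys = latestRows.map((row) => [row.key, row.value.max]);
  // The keys parameter is a JSON array and has to be URL-encoded.
  return "keys=" + encodeURIComponent(JSON.stringify(keys)) + "&include_docs=true";
}

// Made-up sample rows shaped like a grouped _stats result.
const sampleLatestRows = [
  { key: "truck123", value: { sum: 300, count: 2, min: 100, max: 200, sumsqr: 50000 } },
  { key: "truck234", value: { sum: 150, count: 1, min: 150, max: 150, sumsqr: 22500 } },
];
const statusUpdateQuery = buildStatusUpdateQuery(sampleLatestRows);
```

Append the resulting string to the status_update view URL. With many sources the URL can get long, so sending the keys in a POST body may be the more practical route.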
                                                                        
小鲜肉, 2021-01-15 15:21
                                                                       
CouchDB map/reduce is incremental, which basically means the results are always cached, so subsequent requests for the same view (even with different query parameters) run "for free" (or in logarithmic time).

However, that is not strictly true with reduce groups. Sometimes partial results must be re-reduced on the fly. Maybe that is what you are hitting.

Instead, how about a map view (i.e. no reduce function) that emits rows like this, with an array as the key:

// Row diagram (pseudo-code, just to show the concept).
// Key                    , Value
// [source_id, timestamp] , null // value is not very important in this example
["truck1231", 13023123123], null
["truck1231", 13023126723], null
["truck5555", 13023126123], null
["truck6666", 13023000000], null


Notice how all timestamps for a source "clump" together. (Actually, they collate.) To find the latest timestamp for "truck1231", just request the last row in that "clump". To do that, run a descending query, starting from the end, with a limit=1 argument. To specify the "end", use the {} "high key" value as the second element in the key (see the CouchDB view collation documentation for details).

?descending=true&limit=1&startkey=["truck1231",{}]
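
As a rough sketch, the query string above could be built and URL-encoded like this. The helper name buildLatestForSourceQuery is hypothetical. The endkey bound is an addition that is not part of the original query: without it, a source that has no rows at all would get back the last row of whichever source collates just before it.

```javascript
// Hypothetical helper: build the "latest row for one source" query string.
function buildLatestForSourceQuery(sourceId) {
  const params = {
    descending: true,
    limit: 1,
    // {} collates after every number and string, so this marks the end of the clump.
    startkey: [sourceId, {}],
    // Assumption (not in the original query): stop at the start of this source's clump.
    endkey: [sourceId],
  };
  // View parameters are JSON values, so each one is JSON-encoded and then URL-encoded.
  return "?" + Object.entries(params)
    .map(([name, value]) => name + "=" + encodeURIComponent(JSON.stringify(value)))
    .join("&");
}

const latestQuery = buildLatestForSourceQuery("truck1231");
```

Add include_docs=true if you want the full document rather than just the key.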


(Actually, since your timestamps are integers, you could emit their negation, e.g. -13023123123. That will simplify your query a bit but—I don't know—that seems like playing with fire to me.)

To produce these kinds of rows, use a map function like this:

function(doc) {
  // Emit rows sorted first by source id, and second by timestamp
  if (doc.type == "status_update" && doc.timestamp) {
    emit([doc.source_id, doc.timestamp], null) // Using `doc` as the value would be fine too
  }
}
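
If you want to sanity-check the map function without a running server, a small local harness along these lines may help. This is only a sketch: runMap and latestRowFor are made-up helpers, the sample documents are invented, and the sort is a simplified stand-in for CouchDB collation that only handles [string, number] keys with plain ASCII source ids.

```javascript
// The map function from above, taking `emit` as a parameter so it can run outside CouchDB.
function mapStatusUpdate(doc, emit) {
  if (doc.type == "status_update" && doc.timestamp) {
    emit([doc.source_id, doc.timestamp], null);
  }
}

// Hypothetical harness: run the map over some docs and sort the emitted rows.
function runMap(docs) {
  const rows = [];
  for (const doc of docs) {
    mapStatusUpdate(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Simplified ordering: source id first, then timestamp (not real ICU collation).
  rows.sort((a, b) => {
    if (a.key[0] !== b.key[0]) return a.key[0] < b.key[0] ? -1 : 1;
    return a.key[1] - b.key[1];
  });
  return rows;
}

// Equivalent of the descending, limit=1 query: the last row in a source's clump.
function latestRowFor(rows, sourceId) {
  const matching = rows.filter((row) => row.key[0] === sourceId);
  return matching.length ? matching[matching.length - 1] : null;
}

// Invented sample documents.
const sampleDocs = [
  { _id: "a", type: "status_update", source_id: "truck1231", timestamp: 13023126723 },
  { _id: "b", type: "status_update", source_id: "truck5555", timestamp: 13023126123 },
  { _id: "c", type: "status_update", source_id: "truck1231", timestamp: 13023123123 },
  { _id: "d", type: "other", source_id: "truck1231", timestamp: 99999999999 },
];
const sortedRows = runMap(sampleDocs);
const latestTruck1231 = latestRowFor(sortedRows, "truck1231");
```

Here latestTruck1231 is the row emitted for document "a", and document "d" is skipped because its type does not match.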

天涯浪人, 2021-01-15 15:23
                                                                       
I suspect that it's slow only because you emit the entire document, which means a lot of data needs to be stored and moved around to compute your final values. Try emitting the timestamp instead: 

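// Map: emit only the document id and timestamp, keyed by source.
function(doc) {
  if (doc.type == "status_update") {
    emit(doc.source_id, [doc._id,doc.timestamp]);
  }
}

// Reduce: keep the [id, timestamp] pair with the largest timestamp.
// The result has the same shape as the input values, so the same
// logic also works when rereduce is true.
function(keys, values, rereduce) {
  var winner = values[0];
  var i = values.length;
  while (i--) {
    var val = values[i];
    if (val[1] > winner[1]) winner = val;
  }
  return winner;
}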


This should get you an [id,timestamp] pair for every key without being too slow or having to store too much data in the views. 
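
To see why the same reduce also copes with rereduce, here is a small local check. It is only a sketch: the function is given a name (reduceLatest) so it can run outside CouchDB, null is passed for keys because the function never reads them, and the ids and timestamps are invented.

```javascript
// The reduce function from above, named so it can be called locally.
function reduceLatest(keys, values, rereduce) {
  var winner = values[0];
  var i = values.length;
  while (i--) {
    var val = values[i];
    if (val[1] > winner[1]) winner = val;
  }
  return winner;
}

// First pass: two separate batches of [id, timestamp] values for one source.
const batchA = reduceLatest(null, [["doc1", 100], ["doc2", 300]], false);
const batchB = reduceLatest(null, [["doc3", 200]], false);

// Rereduce pass: the inputs are the outputs of the earlier passes.
const combined = reduceLatest(null, [batchA, batchB], true);
```

combined ends up as ["doc2", 300], the pair with the highest timestamp across both batches.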

Once you have a list of identifiers on the client, send a second request using the bulk GET API: 

_all_docs?keys=[id1,id2,id3,...,idn]&include_docs=true 


This will grab all the documents in one request. 
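
A minimal sketch of that second request, assuming the reduce view is queried with group=true so that each row's value is an [id, timestamp] pair. The helper name buildBulkFetch and the sample rows are hypothetical. It produces both the GET form shown above and a POST variant, since _all_docs also accepts the keys in a JSON request body.

```javascript
// Hypothetical helper: turn grouped reduce rows into an _all_docs request.
function buildBulkFetch(groupedRows) {
  // Each row is assumed to look like { key: source_id, value: [doc_id, timestamp] }.
  const ids = groupedRows.map((row) => row.value[0]);
  return {
    // GET form: the keys parameter is a URL-encoded JSON array of id strings.
    getQuery: "_all_docs?keys=" + encodeURIComponent(JSON.stringify(ids)) + "&include_docs=true",
    // POST form: same request with the keys in the body, which avoids very long URLs.
    postPath: "_all_docs?include_docs=true",
    postBody: JSON.stringify({ keys: ids }),
  };
}

// Invented sample rows shaped like the grouped reduce output.
const sampleGroupedRows = [
  { key: "truck123", value: ["doc2", 300] },
  { key: "truck234", value: ["doc7", 250] },
];
const bulkFetch = buildBulkFetch(sampleGroupedRows);
```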