HBase Scan Performance


I am performing a range scan that is giving me 500k records. If I set scan.setCaching(100000) it took less than one second, but if scan.setCaching(100000)


                      
2 Answers

野趣味
2020-12-31 12:13

Scan.setCaching is a misnomer; it should really be called something like Scan.setPrefetch. setCaching specifies how many rows the regionserver sends back to the client per RPC. If you use setCaching(1), every call to next() costs a round trip to the regionserver. The downside of a larger value is extra memory on the client and, potentially, fetching rows that you never use, for example if you stop scanning after a certain number of rows or after you have found a specific value.
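
To make the round-trip cost concrete, here is a rough back-of-the-envelope model in Java. It does not call HBase at all. It only assumes that a scan returning `totalRows` rows with a caching value of `caching` needs roughly ceil(totalRows / caching) fetch RPCs. The class name, the method name, and the 1 ms per-RPC latency are made up for illustration; real latency depends on your cluster and row size.

```java
public class ScanRpcEstimate {

    /** Approximate number of fetch RPCs for a scan: ceil(totalRows / caching). */
    public static long estimateRpcs(long totalRows, int caching) {
        if (totalRows < 0 || caching <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and caching must be > 0");
        }
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long totalRows = 500_000L;       // row count mentioned in the question
        double assumedRpcMillis = 1.0;   // hypothetical round-trip latency, not a measurement

        for (int caching : new int[] {1, 100, 10_000, 100_000}) {
            long rpcs = estimateRpcs(totalRows, caching);
            System.out.printf("caching=%d -> ~%d RPCs, ~%.0f ms spent on round trips%n",
                    caching, rpcs, rpcs * assumedRpcMillis);
        }
    }
}
```

Under this model, 500k rows need about 500,000 round trips with setCaching(1) but only 5 with setCaching(100000), which is the kind of gap the question describes.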

Scan.setBlockCache (the method is named setCacheBlocks in the HBase client API) means something entirely different, as Chandra pointed out. Passing false instructs the regionserver not to pull any data read by this Scan into the HBase BlockCache, which is a separate pool of memory from the MemStore. Note that MemStores are used for writing and the BlockCache is used for reading, and these two pools of memory are completely separate. HBase currently does not use the BlockCache as a write-back cache. You can control the size of the block cache with the hfile.block.cache.size setting in hbase-site.xml. Similarly, you can control the total size of the MemStore pool via the hbase.regionserver.global.memstore.size setting.
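
As a sketch of where those two settings live, an hbase-site.xml fragment might look like the following. Both values are fractions of the regionserver heap. The numbers are illustrative only, not tuning advice; check the reference guide for your HBase version for the defaults and for the limit on how large the two can be together.

```xml
<!-- hbase-site.xml fragment; values are illustrative, not recommendations -->
<configuration>
  <property>
    <!-- fraction of regionserver heap given to the BlockCache (read path) -->
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <property>
    <!-- fraction of regionserver heap shared by all MemStores (write path) -->
    <name>hbase.regionserver.global.memstore.size</name>
    <value>0.4</value>
  </property>
</configuration>
```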

You might want to use setBlockCache(false) (setCacheBlocks(false) in the actual API) if you are doing a full table scan and you don't want to evict your current working set from the block cache. Otherwise, if you are scanning data that is used frequently, it is probably better to leave the setting at its default, which keeps block caching enabled.
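
The eviction effect can be illustrated with a toy model. The sketch below is not HBase code: it assumes a plain LRU cache built on `LinkedHashMap`, whereas HBase's real block cache implementations are more sophisticated. `ToyBlockCache`, its methods, and the block names are all hypothetical. It warms a 100-block cache with "hot" blocks, runs a 10,000-block scan, and counts how many hot blocks are still cached afterwards.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ToyBlockCache {
    private final int capacity;
    private final LinkedHashMap<String, Boolean> blocks;

    public ToyBlockCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > ToyBlockCache.this.capacity;
            }
        };
    }

    /** Reads a block; a missed block is admitted to the cache only if cacheBlocks is true. */
    public void read(String blockId, boolean cacheBlocks) {
        if (blocks.get(blockId) != null) {
            return; // cache hit; get() also refreshes the block's recency
        }
        if (cacheBlocks) {
            blocks.put(blockId, Boolean.TRUE); // may evict the least recently used block
        }
    }

    public boolean contains(String blockId) {
        return blocks.containsKey(blockId); // does not change recency
    }

    /** Warms the cache, runs a large scan, and returns how many hot blocks survive. */
    public static int hotBlocksSurviving(boolean scanCachesBlocks) {
        ToyBlockCache cache = new ToyBlockCache(100);
        for (int i = 0; i < 100; i++) {
            cache.read("hot-" + i, true);
        }
        for (int i = 0; i < 10_000; i++) {
            cache.read("scan-" + i, scanCachesBlocks);
        }
        int surviving = 0;
        for (int i = 0; i < 100; i++) {
            if (cache.contains("hot-" + i)) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        System.out.println("scan caches blocks:    " + hotBlocksSurviving(true) + " hot blocks left");
        System.out.println("scan bypasses caching: " + hotBlocksSurviving(false) + " hot blocks left");
    }
}
```

In this toy model the caching scan leaves 0 of the 100 hot blocks in place, while the scan that bypasses the cache leaves all 100. That is the reasoning behind disabling block caching for one-off full table scans.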
                                                                        
                                                        
            
            
              
                
[愿得一人]
2020-12-31 12:30

HBase has two types of cache structures: the memory store and the block cache. The memory store is implemented as the MemStore, and the cache used for reading is the BlockCache. When a data block is read from HDFS, it is cached in the BlockCache, and subsequent reads of neighboring data are served from there. So when you call scan.setCacheBlocks(false) (written as setBlockCache in this thread), the regionserver stops caching the blocks this scan reads from HDFS. scan.setCaching(100000) is a client-side optimisation related to scanners, so it keeps working unaffected.
                                                                        
                                                        
            
            
              
                