Hadoop Yarn Container Does Not Allocate Enough Space

Asked by 说谎 on 2020-12-30 09:27

I'm running a Hadoop job, and in my yarn-site.xml file I have the following configuration:

    
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>...</value>
    </property>
2 Answers
  • 2020-12-30 09:58

    You should also properly configure the memory allocations for MapReduce. From this HortonWorks tutorial:

    [...]

    For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We'll thus assign 4 GB for Map task Containers, and 8 GB for Reduce task Containers.

    In mapred-site.xml:

    mapreduce.map.memory.mb: 4096

    mapreduce.reduce.memory.mb: 8192

    Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set lower than the Map and Reduce memory defined above, so that it stays within the bounds of the Container memory allocated by YARN.

    In mapred-site.xml:

    mapreduce.map.java.opts: -Xmx3072m

    mapreduce.reduce.java.opts: -Xmx6144m

    The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
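
    For reference, here is a minimal sketch of how these four settings would appear in mapred-site.xml, using the values from the excerpt above (adjust them to your own cluster's container sizes):

        <!-- mapred-site.xml: container sizes and JVM heaps for MapReduce tasks -->
        <property>
            <name>mapreduce.map.memory.mb</name>
            <value>4096</value>
        </property>
        <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>8192</value>
        </property>
        <property>
            <name>mapreduce.map.java.opts</name>
            <value>-Xmx3072m</value>
        </property>
        <property>
            <name>mapreduce.reduce.java.opts</name>
            <value>-Xmx6144m</value>
        </property>

    The -Xmx values sit at roughly 75% of the container sizes, leaving headroom for non-heap JVM overhead inside the YARN container.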

    Finally, someone in this thread on the Hadoop mailing list had the same problem; in their case it turned out to be a memory leak in their code.

  • 2020-12-30 10:09

    If none of the above configurations helped and the issue is related to mapper memory, here are a couple of things I would suggest checking:

    • Check whether a combiner is enabled. If it is, the reduce logic has to run over all of the mapper's output records, and that happens in memory. Depending on your application, check whether enabling the combiner actually helps: the trade-off is between the bytes transferred over the network and the time/memory/CPU spent running the reduce logic on 'X' records.
      • If you feel the combiner does not add much value, simply disable it.
      • If you do need the combiner and 'X' is a huge number (say millions of records), consider changing your split logic so that fewer records land in a single mapper (for the default input formats, use a smaller block size; normally 1 block size = 1 split). See the config sketch after this list.
    • Check how many records a single mapper processes. Remember that all of these records need to be sorted in memory (the mapper's output is sorted). Consider raising mapreduce.task.io.sort.mb (default 100 MB in stock Apache Hadoop) if needed; it is set in mapred-site.xml and also appears in the sketch after this list.
    • If none of the above helps, run the mapper logic as a standalone application and profile it with a profiler (such as JProfiler) to see where the memory is being used; this can give you very good insight. A sketch of Hadoop's built-in per-task profiling, as an alternative, follows this list.
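
    As mentioned in the list above, here is a minimal sketch of the two related knobs, with illustrative values only (not recommendations). mapreduce.input.fileinputformat.split.maxsize caps the split size for FileInputFormat-based jobs, which reduces the number of records per mapper without changing the HDFS block size; mapreduce.task.io.sort.mb enlarges the map-side sort buffer:

        <!-- mapred-site.xml, or per job with -D on the command line -->
        <!-- Cap split size at 256 MB (illustrative) so each mapper gets fewer records -->
        <property>
            <name>mapreduce.input.fileinputformat.split.maxsize</name>
            <value>268435456</value>
        </property>
        <!-- Bigger in-memory sort buffer for map output (illustrative); must fit in the map task heap -->
        <property>
            <name>mapreduce.task.io.sort.mb</name>
            <value>400</value>
        </property>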
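
    As an alternative to attaching an external profiler such as JProfiler, Hadoop's built-in per-task profiling can be enabled. The sketch below assumes a JDK that still ships the hprof agent (the default profiler parameters use hprof), and these properties are usually passed per job (for example with -D) rather than set cluster-wide; the profiler output typically ends up in the task attempt's user-log directory:

        <!-- Profile the first few map and reduce attempts of the job -->
        <property>
            <name>mapreduce.task.profile</name>
            <value>true</value>
        </property>
        <property>
            <name>mapreduce.task.profile.maps</name>
            <value>0-2</value>
        </property>
        <property>
            <name>mapreduce.task.profile.reduces</name>
            <value>0-2</value>
        </property>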