Hadoop YARN Container Does Not Allocate Enough Space

说谎 2020-12-30 09:27

I'm running a Hadoop job, and in my yarn-site.xml file, I have the following configuration:

    
            yarn.scheduler.mini         


        
2 Answers
  • 2020-12-30 09:58

    You should also properly configure the memory allocations for MapReduce. From this Hortonworks tutorial:

    [...]

    For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.
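    As a minimal sketch, the 2 GB container minimum from the quoted example would look like this in yarn-site.xml. The value comes from the tutorial's example cluster, not from the asker's configuration, so treat it as an assumption about your own cluster:

    ```xml
    <!-- yarn-site.xml: smallest container YARN will hand out.
         2048 MB is the tutorial's example value, not a recommendation. -->
    <configuration>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
      </property>
    </configuration>
    ```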

    In mapred-site.xml:

    mapreduce.map.memory.mb: 4096

    mapreduce.reduce.memory.mb: 8192

    Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

    In mapred-site.xml:

    mapreduce.map.java.opts: -Xmx3072m

    mapreduce.reduce.java.opts: -Xmx6144m

    The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
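    Put together, the four values quoted above would sit in mapred-site.xml roughly as follows. This is only a sketch using the tutorial's example numbers; size them for your own cluster. Note that each heap limit is 75% of its container size, which leaves headroom for non-heap JVM memory:

    ```xml
    <!-- mapred-site.xml: container sizes and JVM heap limits from the quoted example -->
    <configuration>
      <!-- Container memory requested from YARN for each map task -->
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>4096</value>
      </property>
      <!-- Container memory requested from YARN for each reduce task -->
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>8192</value>
      </property>
      <!-- Map JVM heap: kept below the 4096 MB map container -->
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx3072m</value>
      </property>
      <!-- Reduce JVM heap: kept below the 8192 MB reduce container -->
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx6144m</value>
      </property>
    </configuration>
    ```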

    Finally, someone in this thread on the Hadoop mailing list had the same problem, and in their case it turned out they had a memory leak in their code.

  • 2020-12-30 10:09

    If none of the above configurations helped and the issue is related to mapper memory, here are a few things I would suggest checking:

    • Check whether a combiner is enabled. If it is, the reduce logic has to run on all the records output by the mapper, and this happens in memory. Depending on your application, check whether enabling the combiner actually helps. The trade-off is between the bytes transferred over the network and the time, memory, and CPU spent running the reduce logic on 'X' records.
      • If you feel the combiner is not adding much value, just disable it.
      • If you need the combiner and 'X' is a huge number (say, millions of records), consider changing your split logic so that fewer records are mapped to a single mapper. For the default input formats, use a smaller block size, since normally one block = one split.
    • Check the number of records processed by a single mapper. Remember that all these records need to be sorted in memory (the mapper output is sorted). Consider setting mapreduce.task.io.sort.mb to a higher value if needed (the stock mapred-default.xml default is 100 MB; your distribution may ship a different value).
    • If none of the above helped, try running the mapper logic as a standalone application and profile it with a profiler (such as JProfiler) to see where the memory is being used. This can give you very good insights.
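    The sort-buffer and split-size suggestions above could be tried with something like the following in mapred-site.xml. Both values are illustrative assumptions, not recommendations. The sort buffer is allocated inside the map task's JVM heap, so it has to stay well below the -Xmx given to the mapper. With the default FileInputFormat, capping the maximum split size below the HDFS block size yields smaller splits without re-writing the input files:

    ```xml
    <!-- mapred-site.xml: illustrative values only; tune for your job -->
    <configuration>
      <!-- In-memory sort buffer for map output, in MB.
           Must fit comfortably inside the map JVM heap. -->
      <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>256</value>
      </property>
      <!-- Upper bound on input split size, in bytes (here 64 MB),
           so each mapper receives fewer records. -->
      <property>
        <name>mapreduce.input.fileinputformat.split.maxsize</name>
        <value>67108864</value>
      </property>
    </configuration>
    ```

    Both properties can also be set for a single job instead of cluster-wide, which is usually the safer way to experiment.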