Reduce fails with "Task attempt failed to report status for 600 seconds. Killing!" What's the solution?

伪装坚强ぢ 2021-02-06 11:10

The reduce phase of the job fails with:

# of failed Reduce Tasks exceeded allowed limit.

The reason why each task fails is:

Task attempt_201301251556_163

2 Answers
  • 2021-02-06 11:14

    The likely reason for the timeouts is a long-running computation in your reducer that never reports progress back to the Hadoop framework. This can be resolved in several ways:

    I. Increasing the timeout in mapred-site.xml:

    <property>
      <name>mapred.task.timeout</name>
      <value>1200000</value>
    </property>
    

    The default is 600000 ms = 600 seconds.
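
    If you prefer a per-job override instead of a cluster-wide change, the same property can also be set programmatically. This is a minimal sketch assuming the old mapred API; MyJob is a hypothetical driver class:

    // org.apache.hadoop.mapred.JobConf
    JobConf conf = new JobConf(MyJob.class);          // MyJob is a placeholder driver class
    conf.setLong("mapred.task.timeout", 1200000L);    // 1200000 ms = 20 minutes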

    II. Reporting progress every N records, as in the Reducer example in the javadoc:

    public void reduce(K key, Iterator<V> values,
                       OutputCollector<K, V> output,
                       Reporter reporter) throws IOException {
       int noValues = 0;
       while (values.hasNext()) {
         V value = values.next();
         noValues++;

         // report progress every 10 records so the 600-second timeout is never hit
         if ((noValues % 10) == 0) {
           reporter.progress();
         }

         // ... actual reduce logic, e.g. output.collect(key, value);
       }
    }
    

    Optionally, you can increment a custom counter instead, which also counts as progress:

    reporter.incrCounter(NUM_RECORDS, 1);
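
    NUM_RECORDS is not a predefined Hadoop constant; a minimal sketch of how it might be declared (the enum name is an assumption, not something from the question):

    // hypothetical counter enum; any enum constant works with Reporter.incrCounter
    enum MyCounters { NUM_RECORDS }

    // inside reduce(): incrementing the counter also signals progress to the framework
    reporter.incrCounter(MyCounters.NUM_RECORDS, 1);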
    
  • 2021-02-06 11:27

    It's possible that you have exhausted Java's heap space, or that GC is running so frequently that the reducer never gets a chance to report status to the master and is therefore killed.

    Another possibility is that one of the reducers is receiving heavily skewed data, i.e. there are a lot of records for a particular rid.

    Try to increase your Java heap by setting the config mapred.child.java.opts to -Xmx2048m; a per-job sketch follows below.
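
    A minimal sketch, assuming the old mapred API and a hypothetical MyJob driver class:

    // org.apache.hadoop.mapred.JobConf
    JobConf conf = new JobConf(MyJob.class);
    conf.set("mapred.child.java.opts", "-Xmx2048m");  // 2 GB heap for each child task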

    Also, try reducing the number of parallel reducers by setting the following config to a lower value than it currently has (the default is 2); see the sketch below.

    mapred.tasktracker.reduce.tasks.maximum
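
    For example, in mapred-site.xml on each TaskTracker node (the value 1 here is just an illustration):

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>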
