The reduce fails with "Task attempt failed to report status for 600 seconds. Killing!" Solution?

伪装坚强ぢ 2021-02-06 11:10

The reduce phase of the job fails with:

of failed Reduce Tasks exceeded allowed limit.

The reason why each task fails is:

Task attempt_201301251556_163

2 Answers
  • 2021-02-06 11:14

    The timeouts are most likely caused by a long-running computation in your reducer that never reports progress back to the Hadoop framework. This can be resolved in several ways:

    I. Increasing the timeout in mapred-site.xml:

    <property>
      <name>mapred.task.timeout</name>
      <value>1200000</value>
    </property>
    

    The default is 600000 ms = 600 seconds.
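
    If you only need the longer timeout for a single job, the same property can also be set programmatically. A minimal sketch, assuming the classic mapred API (on Hadoop 2+ the property is named mapreduce.task.timeout):

    import org.apache.hadoop.mapred.JobConf;

    public class TimeoutConfig {
        // Raise the task timeout for this job only: 1,200,000 ms = 20 minutes.
        public static JobConf withLongerTimeout(JobConf conf) {
            conf.setLong("mapred.task.timeout", 1200000L);
            return conf;
        }
    }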

    II. Reporting progress every N records, as in the Reducer example from the Javadoc:

    public void reduce(K key, Iterator<V> values,
                       OutputCollector<K, V> output,
                       Reporter reporter) throws IOException {
      long noValues = 0;
      while (values.hasNext()) {
        V value = values.next();
        noValues++;

        // process value ...

        // ping the framework every 10 records so the task is not killed for inactivity
        if ((noValues % 10) == 0) {
          reporter.progress();
        }
      }
    }
    

    Optionally, you can increment a custom counter, which also reports progress (NUM_RECORDS here is an application-defined enum constant):

    reporter.incrCounter(NUM_RECORDS, 1);
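
    With the newer org.apache.hadoop.mapreduce API the same technique goes through the Context object. A minimal sketch (the class name, counter group, and the 10,000-record interval are only illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            long seen = 0;
            for (IntWritable v : values) {
                sum += v.get();
                // Tell the framework we are still alive every 10,000 records.
                if (++seen % 10000 == 0) {
                    context.progress();
                    context.getCounter("Progress", "NUM_RECORDS").increment(10000);
                }
            }
            context.write(key, new IntWritable(sum));
        }
    }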
    
  • 2021-02-06 11:27

    It's possible that you have exhausted Java's heap space, or that garbage collection is running so often that the reducer never gets a chance to report its status back to the master and is therefore killed.

    Another possibility is that one of the reducers is receiving heavily skewed data, i.e. a very large number of records for a particular rid.

    Try increasing the Java heap of the child tasks by setting the following config: mapred.child.java.opts

    to

    -Xmx2048m

    Also, try reducing the number of reduce tasks that run in parallel on each node by setting the following config to a lower value than it currently has (the default value is 2):

    mapred.tasktracker.reduce.tasks.maximum
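
    A minimal sketch of applying the heap setting to a single job, assuming the classic mapred API (the class name is only a placeholder). Note that mapred.tasktracker.reduce.tasks.maximum is a TaskTracker-level setting and has to be lowered in mapred-site.xml on the worker nodes, not per job:

    import org.apache.hadoop.mapred.JobConf;

    public class HeapTuningExample {
        // Give each child task JVM up to 2 GB of heap for this job only.
        public static JobConf withLargerHeap(JobConf conf) {
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            return conf;
        }
    }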
