The reduce phase of the job fails; every task attempt is killed with a timeout error:

Task attempt_201301251556_163
The reason for the timeouts might be a long-running computation in your reducer that never reports progress back to the Hadoop framework. This can be resolved in several ways:
I. Increase the timeout in mapred-site.xml:
<property>
<name>mapred.task.timeout</name>
<value>1200000</value>
</property>
The default is 600000 ms (600 seconds).
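Alternatively, the same property can be set per job instead of cluster-wide. A minimal sketch using the old mapred API (MyJob is a placeholder for your job class):

import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);
// 20 minutes, in milliseconds; overrides mapred-site.xml for this job only
conf.setLong("mapred.task.timeout", 1200000L);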
II. Report progress every N records, as in the Reducer example in the javadoc:
public void reduce(K key, Iterator<V> values,
                   OutputCollector<K, V> output,
                   Reporter reporter) throws IOException {
  int noValues = 0;
  while (values.hasNext()) {
    V value = values.next();
    noValues++;
    // tell the framework the task is still alive every 10 records
    if ((noValues % 10) == 0) {
      reporter.progress();
    }
    // ... process value and collect output here
  }
}
Optionally, you can increment a custom counter as in the example:
reporter.incrCounter(NUM_RECORDS, 1);
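For that call to compile, NUM_RECORDS has to be a counter enum constant visible in the reducer; a minimal sketch (the enum name is illustrative):

// Hadoop aggregates enum-based counters across all tasks and shows them in the job UI
enum ReduceCounters { NUM_RECORDS }

// inside reduce(), fully qualified unless you use a static import:
reporter.incrCounter(ReduceCounters.NUM_RECORDS, 1);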
It's also possible that you have consumed all of Java's heap space, or that garbage collection is running so frequently that the reducer never gets a chance to report status to the master, and is therefore killed.
Another possibility is that one of the reducers is receiving heavily skewed data, i.e. a very large number of records for one particular rid.
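If you suspect skew, one way to confirm it is to count the values per key inside the reducer and flag unusually large groups with a custom counter. A hedged sketch: the enum name and the 100000 threshold are illustrative, and process() stands in for your real per-record work:

enum SkewCounters { LARGE_KEY_GROUPS }

// inside reduce():
long n = 0;
while (values.hasNext()) {
  process(values.next());  // process(): placeholder for your actual logic
  n++;
}
// skewed keys will show up as a non-zero counter in the job UI
if (n > 100000) {
  reporter.incrCounter(SkewCounters.LARGE_KEY_GROUPS, 1);
}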
Try increasing your Java heap by setting mapred.child.java.opts to -Xmx2048m.
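For example, in mapred-site.xml, mirroring the property above:

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>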
Also, try reducing the number of parallel reducers by setting mapred.tasktracker.reduce.tasks.maximum to a lower value than it currently has (the default is 2).
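For example (the value 1 here is just an illustration of "lower than the default 2"):

<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>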