I would like to know how to specify MapReduce configurations such as mapred.task.timeout, mapred.min.split.size, etc., when running a streaming job or a custom JAR on Elastic MapReduce.
I believe if you want to set these on a per-job basis, you have two options:
A) For custom JARs, pass the settings into your JAR as arguments and process them yourself. I believe this can be automated as follows:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Strips the generic Hadoop options (-D, -files, -libjars, ...) into conf
    // and returns only the application-specific arguments.
    args = new GenericOptionsParser(conf, args).getRemainingArgs();
    // ...
}
Then launch the job in this manner (I haven't verified that this works, though):
> elastic-mapreduce --jar s3://mybucket/mycode.jar \
--args "-D,mapred.reduce.tasks=0" \
--arg s3://mybucket/input \
--arg s3://mybucket/output
The GenericOptionsParser should automatically transfer the -D and -jobconf parameters into Hadoop's job setup. More details: http://hadoop.apache.org/docs/r0.20.0/api/org/apache/hadoop/util/GenericOptionsParser.html
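To make the -D mechanism concrete, here is a minimal, self-contained sketch of the parsing step — not the real Hadoop class (which lives in org.apache.hadoop.util and also handles -files, -libjars, etc.), just an illustration of how -D key=value pairs are peeled out of argv into a configuration map while the remaining arguments pass through. The class name MiniOptionsParser and its conf map are made up for this example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics how a GenericOptionsParser-style parser
// separates -D key=value settings from application arguments.
public class MiniOptionsParser {
    static final Map<String, String> conf = new HashMap<>();

    static String[] parse(String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                // Separated form: -D key=value
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else if (args[i].startsWith("-D")) {
                // Fused form: -Dkey=value
                String[] kv = args[i].substring(2).split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                // Everything else is an application-specific argument.
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] rest = parse(new String[] {
                "-D", "mapred.reduce.tasks=0",
                "s3://mybucket/input", "s3://mybucket/output" });
        System.out.println(conf.get("mapred.reduce.tasks")); // prints 0
        System.out.println(rest.length);                     // prints 2
    }
}
```

In the real driver, the settings end up in the Configuration object, and getRemainingArgs() gives you just the input/output paths.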
B) For the Hadoop streaming JAR, you just pass the configuration changes to the command via --jobconf:
> elastic-mapreduce --jobflow j-ABABABABA \
--stream --jobconf mapred.task.timeout=600000 \
--mapper s3://mybucket/mymapper.sh \
--reducer s3://mybucket/myreducer.sh \
--input s3://mybucket/input \
--output s3://mybucket/output \
--jobconf mapred.reduce.tasks=0
More details: https://forums.aws.amazon.com/thread.jspa?threadID=43872 and the output of elastic-mapreduce --help
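As an aside, -jobconf is the older streaming spelling; newer Hadoop streaming versions warn that it is deprecated in favor of -D. If you were running the streaming JAR directly on the cluster rather than through the elastic-mapreduce client, the equivalent invocation would look roughly like this (the JAR path varies by Hadoop version and is an assumption here):

> hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.task.timeout=600000 \
    -D mapred.reduce.tasks=0 \
    -input s3://mybucket/input \
    -output s3://mybucket/output \
    -mapper s3://mybucket/mymapper.sh \
    -reducer s3://mybucket/myreducer.sh

Note that the -D options must come before the streaming-specific options like -input and -mapper.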