Map Reduce: ChainMapper and ChainReducer

前端未结


I need to split my MapReduce jar file into two jobs in order to get two different output files, one from the reducer of each job.

I mean that the first job has to pr


2 answers

离开以前

2021-02-02 05:02
There are many ways you can do it.
1. Cascading jobs
  
  Create the JobConf object "job1" for the first job and set all its parameters, with "input" as the input directory and "temp" as the output directory. Execute this job: JobClient.runJob(job1).
  
  Immediately below it, create the JobConf object "job2" for the second job and set all its parameters, with "temp" as the input directory and "output" as the output directory. Execute this job: JobClient.runJob(job2). Because JobClient.runJob blocks until the job completes, job2 starts only after job1 has written "temp".
2. Two JobConf objects
  
  Create two JobConf objects and set all their parameters as in (1), except that you don't call JobClient.runJob.
  
  Then wrap each JobConf in a Job object (org.apache.hadoop.mapred.jobcontrol.Job):
  
  Job job1 = new Job(jobconf1); Job job2 = new Job(jobconf2);
  
  Using a JobControl object, specify the job dependencies and then run the jobs:
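```
job2.addDependingJob(job1);          // job2 waits until job1 has succeeded
JobControl jbcntrl = new JobControl("jbcntrl");
jbcntrl.addJob(job1);
jbcntrl.addJob(job2);
// run() keeps looping until stop() is called, so drive it from its own thread
Thread runner = new Thread(jbcntrl);
runner.start();
while (!jbcntrl.allFinished()) {
    Thread.sleep(5000);
}
jbcntrl.stop();
```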
3. ChainMapper and ChainReducer
  
  If you need a structure like Map+ | Reduce | Map*, you can use the ChainMapper and ChainReducer classes that come with Hadoop 0.19 and later. Note that in this case you can have only one reducer, but any number of mappers before or after it.
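The data flow of options 1 and 2 (job2 reads what job1 wrote, and each job keeps its own reducer output) can be illustrated with a small plain-Java model. This is a hypothetical in-memory sketch, not Hadoop code: the class TwoJobPipeline, the word-count job1 and the group-by-count job2 are assumptions chosen for the example, and the returned maps merely stand in for the "temp" and "output" directories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java model of two cascaded jobs (no Hadoop classes are used).
// Each "job" returns its reducer output; job1's result stands in for the "temp" directory.
public class TwoJobPipeline {

    // Hypothetical job1 (word count): emit (word, 1) for every word and sum the 1s per word.
    // Map and reduce are folded into one loop here for brevity.
    public static SortedMap<String, Integer> runJob1(List<String> inputLines) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String line : inputLines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Hypothetical job2: read job1's output, re-key each (word, count) as (count, word)
    // and collect the words that share the same count.
    public static SortedMap<Integer, List<String>> runJob2(SortedMap<String, Integer> temp) {
        SortedMap<Integer, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Integer> e : temp.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        // job2 runs only after job1 has finished, as with two consecutive JobClient.runJob calls
        SortedMap<String, Integer> temp = runJob1(List.of("a b a", "c b a"));
        SortedMap<Integer, List<String>> output = runJob2(temp);
        System.out.println("job1 output: " + temp);   // prints job1 output: {a=3, b=2, c=1}
        System.out.println("job2 output: " + output); // prints job2 output: {1=[c], 2=[b], 3=[a]}
    }
}
```

Keeping both results separate is what yields the two distinct output files the question asks for; in the real setup, "temp" is simply left in place after job2 finishes instead of being deleted.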
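With ChainMapper and ChainReducer, all stages run inside a single job: each chained mapper's output is handed directly to the next mapper, and the Hadoop Javadoc for these classes names the reduction in disk I/O as the immediate benefit. The following is a hypothetical plain-Java model of the Map+ | Reduce | Map* shape, not the Hadoop API; RecordMapper, runChain and the word-count mappers are made up for illustration. Mappers are applied stage by stage over a list, which matches per-record piping only because these mappers keep no state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Plain-Java model (not the Hadoop API) of the Map+ | Reduce | Map* shape that
// ChainMapper and ChainReducer configure inside a single job.
public class ChainModel {

    // Hypothetical stand-in for a mapper: one (key, value) record in, zero or more records out
    public interface RecordMapper
            extends Function<Map.Entry<String, Integer>, List<Map.Entry<String, Integer>>> {}

    // Apply one mapper to every record and concatenate the results
    static List<Map.Entry<String, Integer>> applyMapper(
            RecordMapper mapper, List<Map.Entry<String, Integer>> records) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> r : records) {
            out.addAll(mapper.apply(r));
        }
        return out;
    }

    public static List<Map.Entry<String, Integer>> runChain(
            List<Map.Entry<String, Integer>> input,
            List<RecordMapper> preReduceMappers,
            BinaryOperator<Integer> reducer,
            List<RecordMapper> postReduceMappers) {
        // Map+: the output of each mapper becomes the input of the next, kept in memory
        List<Map.Entry<String, Integer>> records = input;
        for (RecordMapper m : preReduceMappers) {
            records = applyMapper(m, records);
        }
        // Reduce (the single reduce step): group by key and fold the values of each key
        SortedMap<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            reduced.merge(r.getKey(), r.getValue(), reducer);
        }
        // Map*: every reducer output record goes through the post-reduce mappers
        records = new ArrayList<>();
        for (Map.Entry<String, Integer> e : reduced.entrySet()) {
            records.add(Map.entry(e.getKey(), e.getValue()));
        }
        for (RecordMapper m : postReduceMappers) {
            records = applyMapper(m, records);
        }
        return records;
    }

    public static void main(String[] args) {
        // Input records are (line, line number)
        List<Map.Entry<String, Integer>> input =
                List.of(Map.entry("The cat", 0), Map.entry("the dog the cat", 1));
        // Split each line into (word, 1) records
        RecordMapper split = r -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String word : r.getKey().split("\\s+")) {
                out.add(Map.entry(word, 1));
            }
            return out;
        };
        // Normalize case before the reduce step
        RecordMapper lowerCase = r -> List.of(Map.entry(r.getKey().toLowerCase(), r.getValue()));
        // After the reduce step, keep only words seen at least twice
        RecordMapper keepFrequent =
                r -> r.getValue() >= 2 ? List.of(r) : List.<Map.Entry<String, Integer>>of();
        List<Map.Entry<String, Integer>> result =
                runChain(input, List.of(split, lowerCase), Integer::sum, List.of(keepFrequent));
        System.out.println(result); // prints [cat=2, the=3]
    }
}
```

In a real Hadoop job, the equivalent wiring goes through ChainMapper.addMapper for the pre-reduce mappers and ChainReducer.setReducer plus ChainReducer.addMapper for the reducer and the post-reduce mappers; check the exact parameter lists for your Hadoop version and API (mapred or mapreduce).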
隐瞒了意图╮

2021-02-02 05:15

I think the solutions above involve disk I/O, so they will slow down with large datasets. An alternative is to use Oozie or Cascading.
