I have a pipeline that I currently run on a large university computer cluster. For publication purposes I'd like to convert it into MapReduce format so that it could be run by anyone with access to a Hadoop cluster.
After much googling (etc.) I figured out how to include executable binaries/scripts/modules so that they're accessible to your mappers/reducers. The trick is to upload all your files to HDFS first:
$ bin/hadoop dfs -copyFromLocal /local/file/system/module.py module.py
Then you need to format your streaming command like the following template:
$ ./bin/hadoop jar /local/file/system/hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
-file /local/file/system/data/data.txt \
-file /local/file/system/mapper.py \
-file /local/file/system/reducer.py \
-cacheFile hdfs://localhost:9000/user/you/module.py#module.py \
-input data.txt \
-output output/ \
-mapper mapper.py \
-reducer reducer.py \
-verbose
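For context on what -mapper and -reducer expect: Hadoop Streaming feeds input lines to the mapper on stdin and reads tab-separated key/value pairs from its stdout; the reducer then receives those pairs sorted by key. Here is a minimal, hypothetical mapper.py/reducer.py pair (a word-count toy, not my actual pipeline) just to show the shape the scripts need to have:

#!/usr/bin/env python
# mapper.py -- hypothetical word-count mapper: emit "word<TAB>1" for each word on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        sys.stdout.write("%s\t1\n" % word)

#!/usr/bin/env python
# reducer.py -- hypothetical word-count reducer: sum counts for each key
# (streaming delivers the mapper output sorted by key, so equal keys arrive adjacent)
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            sys.stdout.write("%s\t%d\n" % (current_key, total))
        current_key, total = key, 0
    total += int(value)
if current_key is not None:
    sys.stdout.write("%s\t%d\n" % (current_key, total))

Make sure the scripts are executable and start with a shebang (or pass something like -mapper "python mapper.py"), and note that the job will refuse to run if the output/ directory already exists.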
If you're linking a Python module, you'll need to add the following code to your mapper/reducer scripts (the -cacheFile option symlinks module.py into the task's working directory, so '.' is enough):
import sys
sys.path.append('.')  # the -cacheFile symlink lands in the task's working directory
import module
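To make that concrete, here is a sketch of a mapper that uses the cached module; module.transform() is a made-up function standing in for whatever your module actually exports:

#!/usr/bin/env python
# mapper.py -- sketch only; module.transform() is a hypothetical function
import sys
sys.path.append('.')   # '.' is the task's working directory, where module.py was symlinked
import module

for line in sys.stdin:
    result = module.transform(line.strip())   # whatever processing your module provides
    sys.stdout.write("%s\t1\n" % result)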
If you're accessing a binary via the subprocess module, your command should look something like this:
import shlex
from subprocess import Popen, PIPE

cli = "./binary %s" % (argument)
cli_parts = shlex.split(cli)          # split the command string into argv form
mp = Popen(cli_parts, stdin=PIPE, stderr=PIPE, stdout=PIPE)
output = mp.communicate()[0]          # capture the binary's stdout
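Putting that into a streaming mapper, one simple approach is to hand the binary all of stdin in one shot and emit whatever it writes back. The sketch below assumes a hypothetical ./binary that reads stdin and writes one output line per input line:

#!/usr/bin/env python
# mapper.py -- sketch: pipe all of stdin through a bundled binary (hypothetical ./binary)
import sys
import shlex
from subprocess import Popen, PIPE

cli_parts = shlex.split("./binary")
mp = Popen(cli_parts, stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = mp.communicate(sys.stdin.read())   # communicate() avoids pipe deadlocks

for line in out.splitlines():
    sys.stdout.write("%s\t1\n" % line)

Using communicate() rather than writing and reading the pipes by hand avoids the buffer deadlock you can hit when the binary produces a lot of output.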
Hope this helps.
Got it running, finally:
use IPC::Open2;
my $pid = open2(my $out, my $in, "./binary") or die "could not run open2";
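For anyone staying in Python, the rough equivalent of that Perl open2 call -- a bidirectional pipe where you write a record to the binary and read its reply -- might look like the sketch below. This is not the author's script; it assumes a hypothetical ./binary that answers each input line with exactly one (flushed) output line, otherwise this pattern can deadlock:

#!/usr/bin/env python
# Sketch of an open2-style bidirectional pipe via subprocess; ./binary is hypothetical
# and is assumed to answer each input line with exactly one flushed output line.
import sys
from subprocess import Popen, PIPE

mp = Popen(["./binary"], stdin=PIPE, stdout=PIPE)

for line in sys.stdin:
    mp.stdin.write(line)          # send one record to the binary
    mp.stdin.flush()
    reply = mp.stdout.readline()  # read its single-line reply
    sys.stdout.write("%s\t1\n" % reply.strip())

mp.stdin.close()
mp.wait()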