Hadoop cluster - Do I need to replicate my code over all machines before running job?

前端 未结 1 1874
醉话见心
醉话见心 2020-12-21 09:14

This is what confuses me, when I use wordcount example, I keep code at master and let him do things with slaves and it runs fine

But when I am running my code, it

相关标签:
1条回答
  • 2020-12-21 10:15

    With Hadoop Streaming, the code/dependencies have to be copied with the -file flag, if the code is not there on the target machine. Make sure that the map/reduce files and their dependencies are specified in the Hadoop streaming command.

    $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
        -input myInputDirs \
        -output myOutputDir \
        -mapper myPythonScript.py \
        -reducer /bin/wc \
        -file myPythonScript.py \
        -file myDictionary.txt \
    
    0 讨论(0)
提交回复
热议问题