Hadoop streaming with Python on Windows

Submitted by 巧了我就是萌 on 2019-12-10 19:05:48

Question


I'm using Hortonworks HDP for Windows and have it successfully configured with a master and 2 slaves.

I'm using the following command:

bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reduce.py" -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27

The mapper runs fine, but the log reports that reduce.py wasn't found. From the exception, it looks like the Hadoop TaskRunner is creating the reducer's symlink to mapper.py instead.

When I checked the job configuration file, I noticed that mapred.cache.files is set to:

hdfs://MDAC-HD1:8020/mapred/staging/administrator/.staging/job_201304251054_0021/files/mapper.py#mapper.py

It looks like although reduce.py is being added to the job jar, it isn't being recorded in the configuration correctly, so it can't be found when the reducer tries to run.
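For reference, mapred.cache.files holds a comma-separated list of URIs, and the part after `#` names the symlink created in the task's working directory. The Python sketch below is a small diagnostic, not part of Hadoop: the helper names `parse_cache_files` and `missing_scripts` are hypothetical, and the only input taken from the question is the configuration value shown above.

```python
def parse_cache_files(value):
    """Split a mapred.cache.files value into (uri, symlink_name) pairs.

    Entries are comma-separated; the text after '#' is the symlink name.
    Without a '#', fall back to the last path component of the URI.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        uri, _, link = entry.partition("#")
        pairs.append((uri, link or uri.rstrip("/").rsplit("/", 1)[-1]))
    return pairs


def missing_scripts(value, scripts):
    """Return the script names that have no symlink in the cache listing."""
    links = {link for _, link in parse_cache_files(value)}
    return [name for name in scripts if name not in links]


cache_value = (
    "hdfs://MDAC-HD1:8020/mapred/staging/administrator/.staging/"
    "job_201304251054_0021/files/mapper.py#mapper.py"
)
print(missing_scripts(cache_value, ["mapper.py", "reduce.py"]))
# ['reduce.py']
```

Run against the value from the job configuration, it reports that only mapper.py is linked, which matches the "reduce.py not found" error.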

I think my command is correct. I've also tried using -file parameters instead, but then neither file is found.

Can anyone see an obvious reason for this?

Please note, this is on Windows.

EDIT: I've just run it locally and it worked, so it looks like my problem may be with how the files are copied around the cluster.
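Since the job works locally, it can help to check the scripts without any cluster at all. Hadoop streaming feeds input lines to the mapper on stdin, sorts the mapper's tab-separated output by key (the text before the first tab), and feeds the sorted lines to the reducer. The sketch below mimics that in a single Python process. The question doesn't show mapper.py or reducer.py, so the word-count `mapper` and `reducer` functions are hypothetical stand-ins, and `run_local` does no partitioning across nodes.

```python
from itertools import groupby


def mapper(lines):
    """Hypothetical word-count mapper: emit 'word<TAB>1' for each word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Hypothetical reducer: sum the counts for each key.

    Like a streaming reducer, it relies on its input being sorted by key.
    """
    keyed = (line.split("\t", 1) for line in lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(input_lines, map_fn, reduce_fn):
    """Simulate map -> sort -> reduce in one process."""
    mapped = list(map_fn(input_lines))
    # Streaming sorts map output by key before the reduce phase.
    mapped.sort(key=lambda line: line.split("\t", 1)[0])
    return list(reduce_fn(mapped))


print(run_local(["b a", "a c"], mapper, reducer))
# ['a\t2', 'b\t1', 'c\t1']
```

If the real scripts pass a check like this but fail on the cluster, the problem is more likely in how the files are shipped than in the Python code itself.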

Input is still welcome!


Answer 1:


Well, that's embarrassing... my first question, and I'm answering it myself.

I tracked down the problem by renaming the Hadoop conf file to force the default settings, which meant the job ran against the local job tracker.

The job ran properly, which gave me room to work out what the problem is: it looks like communication around the cluster isn't as complete as it needs to be.




Answer 2:


Looking at your command, you pass "file:///d:/dev/python/reducer.py" to the -files option, but you specify reduce.py for -reducer. Could that be causing the problem? Sorry, I'm not sure.
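A name mismatch like this is easy to check mechanically. The Python sketch below compares the script named in each -mapper/-reducer command against the files passed to -files, using the values from the question's command. It is a hypothetical helper, not a Hadoop feature, and it assumes the script is the last token of each command and that a `#name` suffix on a -files entry sets the link name.

```python
import shlex


def shipped_names(files_arg):
    """Names the task will see for each -files entry (comma-separated URIs)."""
    names = set()
    for uri in files_arg.split(","):
        path, _, link = uri.strip().partition("#")
        # A '#name' suffix renames the link; otherwise use the basename.
        names.add(link or path.rstrip("/").rsplit("/", 1)[-1])
    return names


def unshipped_scripts(files_arg, *commands):
    """Scripts referenced by -mapper/-reducer commands but not in -files."""
    shipped = shipped_names(files_arg)
    missing = []
    for command in commands:
        # Assumption: e.g. "python reduce.py" -> the script is the last token.
        script = shlex.split(command)[-1]
        if script not in shipped:
            missing.append(script)
    return missing


files = "file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py"
print(unshipped_scripts(files, "python mapper.py", "python reduce.py"))
# ['reduce.py']
```

If it flags a name, making the two agree (for example `-reducer "python reducer.py"`) is the obvious first thing to try.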



Source: https://stackoverflow.com/questions/16217265/hadoop-streaming-with-python-on-windows
