Hadoop Streaming Command Failure with Python Error

予麋鹿 2021-01-14 09:49

I'm a newcomer to Ubuntu, Hadoop, and DFS, but I've managed to install a single-node Hadoop instance on my local Ubuntu machine following the directions posted on Michael-Noll.com.

3 Answers
上瘾入骨i  2021-01-14 10:37

    Similar to the errors I was getting --


    First, in: -file mapper.py -file reducer.py -mapper mapper.py -reducer reducer.py

    you can use fully qualified local-filesystem paths with '-file', and then just the relative file name with '-mapper', e.g.: -file /aFully/qualified/localSystemPathTo/yourMapper.py -mapper yourMapper.py
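    Put together, a full streaming invocation might look like the sketch below; the streaming jar location and the HDFS input/output paths are assumptions here and vary by Hadoop version and install:

        hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar \
            -input /hdfs/path/to/input \
            -output /hdfs/path/to/output \
            -file /aFully/qualified/localSystemPathTo/yourMapper.py \
            -file /aFully/qualified/localSystemPathTo/yourReducer.py \
            -mapper yourMapper.py \
            -reducer yourReducer.py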


    then: remember to include "#!/usr/bin/python" at the top of both 'mapper.py' and 'reducer.py'
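
    For illustration, here is a minimal sketch of such a mapper.py (a word-count job is assumed; the shebang on the first line and the tab-separated key/value output on stdout are the parts Hadoop Streaming cares about). The script also needs execute permission (chmod +x) if it is invoked directly as '-mapper yourMapper.py':

        #!/usr/bin/python
        # minimal word-count mapper (assumed example job):
        # streaming feeds input lines on stdin and expects
        # tab-separated key/value pairs on stdout
        import sys

        for line in sys.stdin:
            for word in line.strip().split():
                print '%s\t%s' % (word, 1)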


    finally,

    in my mapper.py and reducer.py, I put all my imports inside a 'setup_call()' function (rather than at the file's 'global' level), and then wrapped the call with:

        if __name__ == '__main__':
            try:
                setup_call()
            except Exception:
                # dump the traceback to stderr so it shows up in the
                # task's stderr log in the Hadoop web UI
                import sys, traceback, StringIO  # Python 2; use io.StringIO on Python 3

                fake_writeable = StringIO.StringIO()
                traceback.print_exc(file=fake_writeable)

                msg = ""
                msg += "------------------------------------------------------\n"
                msg += "----theTraceback: -----------\n"
                msg += fake_writeable.getvalue() + "\n"
                msg += "------------------------------------------------------\n"

                sys.stderr.write(msg)

    at that point, I was able to use the hadoop web job log (those http:// links in your error message) and navigate my way to the 'stderr' messages from the actual core logic.


    I'm sure there are other, more concise ways to do all this, but it was semantically clear and sufficient for my immediate needs.

    good luck..
