Running a job using hadoop streaming and mrjob: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


Hey, I'm fairly new to the world of Big Data. I came across this tutorial at http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/

It d

4 answers
  • 2020-12-06 02:52

    I faced the same problem when running the job; my mapper and reducer scripts were not executable.

    Adding #!/usr/bin/python at the top of my files fixed the issue.
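
    For example, a minimal sketch of checking for the interpreter line and making the scripts executable (assuming your scripts are named mapper.py and reducer.py; substitute your own file names):

    # the first line of each script should be the interpreter ("shebang") line
    head -n 1 mapper.py    # expected: #!/usr/bin/python
    head -n 1 reducer.py

    # make both scripts executable so Hadoop Streaming can launch them directly
    chmod +x mapper.py reducer.py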

  • 2020-12-06 02:55

    Another possible reason is an error in the shell script you use to run mapper.py and reducer.py. Here are my suggestions:

    First, try to run your mapper.py and reducer.py in your local environment.

    Next, try to track your MapReduce job through the URL printed in the stdout log, e.g. "16:01:56 INFO mapreduce.Job: The url to track the job: http://xxxxxx:8088/proxy/application_xxx/", which has detailed error information. Hope this helps!
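
    For example, a rough sketch of pulling those logs from the command line (assuming YARN log aggregation is enabled; application_xxx stands for the ID shown in the tracking URL):

    # fetch the aggregated container logs, which include the mapper's stderr
    # and the Python traceback behind "subprocess failed with code 1"
    yarn logs -applicationId application_xxx | less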

  • 2020-12-06 03:04

    Error code 1 is a generic error for Hadoop Streaming. You can get this error code for two main reasons:

    • Your Mapper and Reducer scripts are not executable (include the #!/usr/bin/python at the beginning of the script).

    • Your Python program is simply written wrong - you could have a syntax error or logical bug.

    Unfortunately, error code 1 does not give you any details to see exactly what is wrong with your Python program.

    I was stuck with error code 1 for a while myself, and the way I figured it out was to simply run my Mapper script as a standalone python program: python mapper.py

    After doing this, I got a regular Python error telling me I was passing a function the wrong type of argument. I fixed that error, and everything worked after that. So if possible, run your Mapper or Reducer script as a standalone Python program to see whether it gives you any insight into the cause of your error.
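
    A minimal sketch of that standalone run (sample_lines.txt is a hypothetical file holding a few lines of your real input):

    # run the mapper by itself; a broken script fails here with an ordinary
    # Python traceback instead of Hadoop's opaque exit code 1
    python mapper.py < sample_lines.txt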

  • 2020-12-06 03:11

    I got the same error, subprocess failed with code 1, when running:

    [cloudera@quickstart ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input /user/cloudera/input -output /user/cloudera/output_join -mapper /home/cloudera/join1_mapper.py -reducer /home/cloudera/join1_reducer.py
    
    1. This is primarily because Hadoop cannot access your input files, or because your input contains something extra or is missing something it needs. So be very careful with the input directory and the files in it: place only the input files the assignment actually requires in the input directory and remove the rest.

    2. Also make sure your mapper and reducer files are executable: chmod +x mapper.py and chmod +x reducer.py.

    3. Run the mapper and reducer Python files locally using cat. Mapper only: cat join2_gen*.txt | ./mapper.py | sort. Mapper and reducer: cat join2_gen*.txt | ./mapper.py | sort | ./reducer.py. The reason for running them with cat is that if your input files have any errors you can catch and fix them before you run on the Hadoop cluster; sometimes map/reduce jobs do not surface the Python errors. (A consolidated version of these commands is sketched below.)
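
    Putting steps 2 and 3 together, the local smoke test looks roughly like this (file names follow this answer's example; substitute your own):

    # make both scripts executable
    chmod +x mapper.py reducer.py

    # mapper only
    cat join2_gen*.txt | ./mapper.py | sort

    # mapper and reducer together
    cat join2_gen*.txt | ./mapper.py | sort | ./reducer.py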
