Streaming Command Failed! in RHADOOP

滥情空心 2021-01-18 06:24

I have installed RHADOOP in the Hortonworks VM. When I run MapReduce code to verify the installation, it throws an error saying

I am running as the user rstudio (not root, but has

2 Answers
  • 2021-01-18 06:38

    Your current implementation runs from RStudio. Can you try writing the mapper and reducer as .R scripts and running them directly with Hadoop streaming?

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar -input file-in-hadoop -output hdfs_output_dir -file mapper_file -file reducer_file -mapper mapper.R -reducer reducer.R
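Before submitting, it may be worth checking the scripts themselves: Hadoop streaming launches whatever is passed to -mapper and -reducer as an executable on each node, so mapper.R and reducer.R need a #! line (for example #!/usr/bin/env Rscript) and execute permission. A minimal sketch in POSIX sh; script_ready is a hypothetical helper name, not part of Hadoop:

```shell
# Hypothetical helper: succeed only if the file exists, is executable,
# and its first line starts with "#!".
script_ready() {
    [ -f "$1" ] && [ -x "$1" ] || return 1
    first_line=
    # read fails on an empty file; that case falls through to "return 1" below
    IFS= read -r first_line < "$1" || true
    case $first_line in
        '#!'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not run here):
#   script_ready mapper.R && script_ready reducer.R || echo "fix the scripts first"
```

If a check fails, add the #! line or run chmod +x on the script, then retry the hadoop jar command.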

    By the way, your PipeMapRed.waitOutputThreads() exception can be caused by an input or output path that is not specified properly. Please check your paths.
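To check the paths from the shell before submitting, something like the following may help. It is a sketch that assumes the hadoop client is on PATH (or that HADOOP_CMD points to it, the same variable rmr2 uses); hdfs_paths_ok is a hypothetical helper. It relies on hadoop fs -test -e, which exits with 0 when the path exists. Keep in mind that a job is also rejected when the output directory already exists.

```shell
# Use HADOOP_CMD if it is set, otherwise fall back to `hadoop` on PATH.
: "${HADOOP_CMD:=hadoop}"

# Hypothetical helper: the input path must exist in HDFS,
# and the output path must not exist yet.
hdfs_paths_ok() {
    input=$1
    output=$2
    if ! "$HADOOP_CMD" fs -test -e "$input"; then
        echo "input path missing: $input" >&2
        return 1
    fi
    if "$HADOOP_CMD" fs -test -e "$output"; then
        echo "output path already exists: $output" >&2
        return 1
    fi
}

# Example (not run here):
#   hdfs_paths_ok file-in-hadoop hdfs_output_dir && echo "paths look fine"
```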

    This should work.

  • 2021-01-18 06:57

    Your code worked fine for me after I changed HADOOP_CMD and HADOOP_STREAMING to match my system configuration (I'm running Hadoop 2.4.0 on Ubuntu 14.04).
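If you are not sure where the streaming jar lives on your system, a search under the Hadoop install directory can find it. A sketch, assuming a POSIX shell; find_streaming_jar is a hypothetical helper, and the directory you pass in (for example /usr/local/hadoop) depends on your installation:

```shell
# Hypothetical helper: list every hadoop-streaming*.jar below the given directory.
find_streaming_jar() {
    find "$1" -type f -name 'hadoop-streaming*.jar' 2>/dev/null | sort
}

# Example (not run here):
#   find_streaming_jar /usr/local/hadoop
```

Use the printed path as HADOOP_STREAMING. If more than one file is printed (some distributions also ship source jars with similar names), pick the plain hadoop-streaming jar.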

    My suggestions:

    • Ensure that a functional instance of Hadoop is running, i.e., the jps command in your terminal should list the Hadoop daemons.

    • Ensure that the rJava library gets loaded when you load library(rhdfs).
    • Ensure that you are referring to the correct streaming jar file.
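The jps check can be scripted. A sketch, assuming a single-node Hadoop 2.x setup where NameNode, DataNode, ResourceManager and NodeManager are expected (that list is an assumption; adjust it for your cluster). missing_daemons is a hypothetical helper that reads jps output on stdin and prints whatever is not running:

```shell
# Hypothetical helper: print each expected daemon that does not appear
# as a whole word in the jps output supplied on stdin.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # -w keeps "SecondaryNameNode" from counting as "NameNode"
        printf '%s\n' "$jps_output" | grep -qw "$daemon" || printf '%s\n' "$daemon"
    done
}

# Example (not run here):
#   jps | missing_daemons
```

No output means every daemon in the list was found.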

    Below is the R code and the output:

    # Point rhdfs/rmr2 at the local Hadoop binary and streaming jar
    Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
    Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
    
    library(rhdfs)
    # Loading required package: rJava
    # HADOOP_CMD=/usr/local/hadoop/bin/hadoop
    # Be sure to run hdfs.init()
    
    hdfs.init()
    library(rmr2)
    # Write 1:10 to HDFS, then run a map-only job that pairs each value with its double
    ints = to.dfs(1:10)
    calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
    

    Output:

    15/04/07 05:18:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/04/07 05:18:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    packageJobJar: [/usr/local/hadoop/data/hadoop-unjar1328285833881826794/] [] /tmp/streamjob6167004817219806828.jar tmpDir=null
    15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
    15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
    15/04/07 05:18:48 INFO mapred.FileInputFormat: Total input paths to process : 1
    15/04/07 05:18:49 INFO mapreduce.JobSubmitter: number of splits:2
    15/04/07 05:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428363713092_0002
    15/04/07 05:18:49 INFO impl.YarnClientImpl: Submitted application application_1428363713092_0002
    15/04/07 05:18:50 INFO mapreduce.Job: The url to track the job: http://manohar-dt:8088/proxy/application_1428363713092_0002/
    15/04/07 05:18:50 INFO mapreduce.Job: Running job: job_1428363713092_0002
    15/04/07 05:19:00 INFO mapreduce.Job: Job job_1428363713092_0002 running in uber mode : false
    15/04/07 05:19:00 INFO mapreduce.Job:  map 0% reduce 0%
    15/04/07 05:19:15 INFO mapreduce.Job:  map 50% reduce 0%
    15/04/07 05:19:16 INFO mapreduce.Job:  map 100% reduce 0%
    15/04/07 05:19:17 INFO mapreduce.Job: Job job_1428363713092_0002 completed successfully
    15/04/07 05:19:17 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=194356
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=979
            HDFS: Number of bytes written=919
            HDFS: Number of read operations=14
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=4
        Job Counters 
            Launched map tasks=2
            Data-local map tasks=2
            Total time spent by all maps in occupied slots (ms)=25803
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=25803
            Total vcore-seconds taken by all map tasks=25803
            Total megabyte-seconds taken by all map tasks=26422272
        Map-Reduce Framework
            Map input records=3
            Map output records=3
            Input split bytes=186
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=293
            CPU time spent (ms)=3640
            Physical memory (bytes) snapshot=322818048
            Virtual memory (bytes) snapshot=2107604992
            Total committed heap usage (bytes)=223346688
        File Input Format Counters
            Bytes Read=793
        File Output Format Counters 
            Bytes Written=919
    15/04/07 05:19:17 INFO streaming.StreamJob: Output directory: /tmp/file11d247219866
    

    Hope this helps.
