I have installed RHadoop on the Hortonworks VM. When I run a MapReduce job to verify the installation, it throws an error saying
I am running as the user rstudio (not root, but it has
Your current implementation uses RStudio. Can you try writing the code in a .R file and running it through the Hadoop streaming jar directly:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar -input file-in-hadoop -output hdfs_output_dir -file mapper_file -file reducer_file -mapper mapper.R -reducer reducer.R
By the way, your PipeMapRed.waitOutputThreads() exception is typically raised when the input/output path is not specified properly. Please double-check your paths.
This should work.
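For the streaming command above you need executable mapper.R and reducer.R scripts that read from stdin and write tab-separated key/value pairs to stdout. The file names and word-count logic below are just illustrative placeholders, not something from your setup; a minimal sketch:

```r
#!/usr/bin/env Rscript
# mapper.R -- hypothetical streaming mapper: emits "word<TAB>1" per word
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  words <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  for (w in words) {
    if (nzchar(w)) cat(w, "\t1\n", sep = "")
  }
}
close(con)
```

```r
#!/usr/bin/env Rscript
# reducer.R -- hypothetical streaming reducer: sums counts per word.
# Hadoop streaming sorts map output by key, so equal keys arrive together.
con <- file("stdin", open = "r")
current <- NULL
total <- 0
while (length(line <- readLines(con, n = 1)) > 0) {
  parts <- strsplit(line, "\t")[[1]]
  if (!identical(parts[1], current)) {
    if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
    current <- parts[1]
    total <- 0
  }
  total <- total + as.numeric(parts[2])
}
if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
close(con)
```

Remember to make both scripts executable (chmod +x mapper.R reducer.R) so the streaming job can launch them.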
Your code worked fine for me after changing HADOOP_CMD and HADOOP_STREAMING to match my system configuration (I'm running Hadoop 2.4.0 on Ubuntu 14.04).
My suggestion: run jps in your terminal and confirm that all the Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager) are listed. Below is the R code and the output:
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
library(rhdfs)
# Loading required package: rJava
# HADOOP_CMD=/usr/local/hadoop/bin/hadoop
# Be sure to run hdfs.init()
hdfs.init()
library(rmr2)
ints = to.dfs(1:10)  # write the vector 1:10 into HDFS
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))  # pair each value with its double
Output:
15/04/07 05:18:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/07 05:18:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [/usr/local/hadoop/data/hadoop-unjar1328285833881826794/] [] /tmp/streamjob6167004817219806828.jar tmpDir=null
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 05:18:48 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: number of splits:2
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428363713092_0002
15/04/07 05:18:49 INFO impl.YarnClientImpl: Submitted application application_1428363713092_0002
15/04/07 05:18:50 INFO mapreduce.Job: The url to track the job: http://manohar-dt:8088/proxy/application_1428363713092_0002/
15/04/07 05:18:50 INFO mapreduce.Job: Running job: job_1428363713092_0002
15/04/07 05:19:00 INFO mapreduce.Job: Job job_1428363713092_0002 running in uber mode : false
15/04/07 05:19:00 INFO mapreduce.Job: map 0% reduce 0%
15/04/07 05:19:15 INFO mapreduce.Job: map 50% reduce 0%
15/04/07 05:19:16 INFO mapreduce.Job: map 100% reduce 0%
15/04/07 05:19:17 INFO mapreduce.Job: Job job_1428363713092_0002 completed successfully
15/04/07 05:19:17 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=194356
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=979
HDFS: Number of bytes written=919
HDFS: Number of read operations=14
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=25803
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=25803
Total vcore-seconds taken by all map tasks=25803
Total megabyte-seconds taken by all map tasks=26422272
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=186
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=293
CPU time spent (ms)=3640
Physical memory (bytes) snapshot=322818048
Virtual memory (bytes) snapshot=2107604992
Total committed heap usage (bytes)=223346688
File Input Format Counters
Bytes Read=793
File Output Format Counters
Bytes Written=919
15/04/07 05:19:17 INFO streaming.StreamJob: Output directory: /tmp/file11d247219866
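To confirm the job actually produced the expected values, you can read the result back into your R session with rmr2's from.dfs(). A minimal sketch continuing from the code above (run in the same session, so calc is still in scope):

```r
# Read the job output back from HDFS; from.dfs() returns a key/value object
result <- from.dfs(calc)

# The map function emitted cbind(v, 2*v), so the values should be a
# two-column matrix pairing each input integer with its double
print(result$val)
```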
Hope this helps.