I am running the Hadoop wordcount example in a single-node environment on Ubuntu 12.04 in VMware. I am running the example like this:
hadoop@master:~/hadoop$ hadoop
If you've created your own .jar and are trying to run it, pay attention:
In order to run your job, you must have written something like this:
hadoop jar <jar-path> <package-path> <input-in-hdfs-path> <output-in-hdfs-path>
But if you take a closer look at your driver code, you'll see that you have set args[0] as your input and args[1] as your output. I'll show it:
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
But Hadoop is taking args[0] as <package-path> instead of <input-in-hdfs-path>, and args[1] as <input-in-hdfs-path> instead of <output-in-hdfs-path>.
So, in order to make it work, you should use:
FileInputFormat.addInputPath(conf, new Path(args[1]));
FileOutputFormat.setOutputPath(conf, new Path(args[2]));
With args[1] and args[2], it'll pick up the right paths! :)
Hope it helped. Cheers.
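To see the index shift this answer describes without needing a cluster, here is a minimal plain-Java sketch (the class name org.myorg.WordCount and the /input and /output paths are hypothetical, and the array stands in for the args[] that main() would receive):

```java
public class ArgsShiftDemo {
    public static void main(String[] args) {
        // Mimics the argument array for a command like:
        //   hadoop jar wordcount.jar org.myorg.WordCount /input /output
        // when the class name ends up being passed through as args[0].
        String[] argv = {"org.myorg.WordCount", "/input", "/output"};

        // argv[0] holds the package/class path, not the input path,
        // so input and output sit at indices 1 and 2:
        System.out.println("args[0] = " + argv[0]);
        System.out.println("input   = " + argv[1]);
        System.out.println("output  = " + argv[2]);
    }
}
```

That is why reading the input from args[0] ends up pointing FileInputFormat at the class name rather than the HDFS input directory.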
Check whether there is a 'tmp' folder or not:
hadoop fs -ls /
If you see the output folder or 'tmp', delete both (assuming no active jobs are running):
hadoop fs -rmr /tmp
Like Dave (and the exceptions) said, your output directory already exists. You either need to output to a different directory or remove the existing one first, using:
hadoop fs -rmr /home/hadoop/gutenberg-output
Delete the output file that already exists, or output to a different file.
(I'm a little curious what other interpretations of the error message you considered.)