I want to run a Hadoop job remotely from a Windows machine. The cluster is running on Ubuntu.
Basically, I want to do two things:
Welcome to a world of pain. I've just implemented this exact use case, but using Hadoop 2.2 (the current stable release) patched and compiled from source.
What I did, in a nutshell, was:
1. Build the patched Hadoop 2.2 distribution from source (don't forget to run `sudo ldconfig` after installing the native dependencies, see this post). The built distribution ends up in `hadoop-2.2.0-src/hadoop-dist/target`.
2. Install the distribution on the server node(s) and configure it. I can't help you with that since you need to tweak it to your cluster topology.
3. Install the distribution on the Windows client machine. Put the JDK in a path without spaces, e.g. `c:\java\jdk1.7`, and set the `JAVA_HOME`, `HADOOP_HOME` and `PATH` environment variables as described in these instructions.
4. Run `unix2dos` (from Cygwin or standalone) to convert all `.cmd` files in the `bin` and `etc\hadoop` directories, otherwise you'll get weird errors about labels when running them.
5. Configure the client-side `*-site.xml` files to point at the cluster: `fs.default.name`, `mapreduce.jobtracker.address`, `yarn.resourcemanager.hostname` and the like.

If you've managed all of that, you can start your Linux Hadoop cluster and connect to it from your Windows command prompt. Joy!
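For reference, the build on the Linux side looked roughly like this. This is only a sketch under the assumptions in Hadoop's `BUILDING.txt` (Maven, protobuf 2.5 compiled from source, a native toolchain installed); your exact patch set may differ:

```shell
# After building and installing protobuf 2.5 from source, refresh the
# shared-library cache so the Hadoop native build can find it.
sudo ldconfig

# From the top of the Hadoop source tree: build the full distribution,
# including native libraries, skipping tests.
mvn package -Pdist,native -DskipTests -Dtar

# The distribution tarball and exploded directory end up under:
#   hadoop-2.2.0-src/hadoop-dist/target
```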
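For the client-side configuration, the properties named above go into the `*-site.xml` files under `etc\hadoop` on the Windows machine. A sketch, assuming a hypothetical master host named `master` and the commonly used NameNode port 9000 — your hostnames and ports will differ:

```xml
<!-- core-site.xml: where HDFS lives -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- yarn-site.xml: where the ResourceManager lives -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>

<!-- mapred-site.xml: submit MapReduce jobs through YARN
     (mapreduce.framework.name is my addition; on Hadoop 2.x it is
     what routes job submission to YARN) -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With these in place, running `hadoop fs -ls /` from the Windows command prompt should list the root of the remote HDFS, which is a quick way to verify the connection before submitting a job.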
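The `unix2dos` conversion above can be scripted so you don't miss a file. A minimal sketch, assuming your Hadoop install root has the usual `bin` and `etc\hadoop` layout and that you run it from a Cygwin shell; the `sed` fallback is my addition for machines without `unix2dos`:

```shell
# Convert every .cmd file under a Hadoop installation to CRLF line endings.
# $1 is the Hadoop install root (e.g. /cygdrive/c/hadoop-2.2.0).
convert_cmd_files() {
  for f in "$1"/bin/*.cmd "$1"/etc/hadoop/*.cmd; do
    [ -f "$f" ] || continue
    if command -v unix2dos >/dev/null 2>&1; then
      unix2dos "$f"
    else
      # Fallback: make every line end in CR+LF (GNU sed).
      sed -i 's/\r\?$/\r/' "$f"
    fi
  done
}
```

For example, `convert_cmd_files /cygdrive/c/hadoop-2.2.0` (a hypothetical install path) converts the scripts in place.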