Not able to run Hadoop job remotely

Asked by 长发绾君心 on 2021-01-19 04:25 · 1 answer · 807 views

I want to run a hadoop job remotely from a windows machine. The cluster is running on Ubuntu.

Basically, I want to do two things:

  1. Execute the hadoop job
1 Answer
  •  野趣味
     2021-01-19 04:53

    Welcome to a world of pain. I've just implemented this exact use case, but using Hadoop 2.2 (the current stable release) patched and compiled from source.

    What I did, in a nutshell, was:

    1. Download the Hadoop 2.2 sources tarball to a Linux machine and decompress it to a temp dir.
    2. Apply these patches, which solve the problem of connecting from a Windows client to a Linux server.
    3. Build it from source, using these instructions. This also ensures that you get 64-bit native libs if you have a 64-bit Linux server. Make sure you fix the build files as the post instructs, or the build will fail. Note that after installing protobuf 2.5, you have to run sudo ldconfig; see this post.
    4. Deploy the resulting dist tarball from hadoop-2.2.0-src/hadoop-dist/target on the server node(s) and configure it. I can't help you with that since you need to tweak it to your cluster topology.
    5. Install Java on your client Windows machine. Make sure that the path to it has no spaces in it, e.g. c:\java\jdk1.7.
    6. Deploy the same Hadoop dist tar you built on your Windows client. It contains the crucial fix for the Windows/Linux connection problem.
    7. Compile winutils and Windows native libraries as described in this Stackoverflow answer. It's simpler than building entire Hadoop on Windows.
    8. Set up the JAVA_HOME, HADOOP_HOME and PATH environment variables as described in these instructions.
    9. Use a text editor or unix2dos (from Cygwin or standalone) to convert all .cmd files in the bin and etc\hadoop directories to Windows line endings, otherwise you'll get weird errors about labels when running them.
    10. Configure the connection properties for your cluster in your config XML files, namely fs.default.name, mapreduce.jobtracker.address, yarn.resourcemanager.hostname and the like.
    11. Add the rest of the configuration required by the patches from item 2. This is required on the client side only; without it, the patch won't work.
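For item 10, the client-side configuration might look like the fragment below. The hostname master and port 9000 are placeholders for your own cluster topology, and property names vary between Hadoop versions (for instance, fs.defaultFS supersedes fs.default.name in 2.x):

```xml
<!-- core-site.xml (client side); "master" and the port are placeholders -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- yarn-site.xml (client side) -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
```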

    If you've managed all of that, you can start your Linux Hadoop cluster and connect to it from your Windows command prompt. Joy!
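If you don't have Cygwin or unix2dos handy for step 9, the line-ending conversion can be scripted. This is a sketch using only the Python standard library; the directory you pass to convert_cmd_files would be your Hadoop installation root (e.g. the value of HADOOP_HOME):

```python
import os

def to_crlf(path):
    """Rewrite a file in place with Windows (CRLF) line endings."""
    with open(path, "rb") as f:
        data = f.read()
    # Normalize any existing CRLF to LF first so we don't double-convert.
    data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    with open(path, "wb") as f:
        f.write(data)

def convert_cmd_files(root):
    """Convert every .cmd file under root to CRLF line endings."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".cmd"):
                to_crlf(os.path.join(dirpath, name))
```

Running convert_cmd_files over the bin and etc\hadoop directories is equivalent to applying unix2dos to each .cmd file by hand.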
