I’ve downloaded and started up Cloudera's Hadoop Demo VM for CDH4 (running Hadoop 2.0.0). I’m trying to write a Java program that will run from my Windows 7 machine (The same
I ran into a similar issue and have two pieces of information that may help you.
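The first thing I realized is that I was using an SSH tunnel to access the name node, and when the client code tried to access a data node, it could not find it because the tunnel interfered with the communication. An HDFS client only gets metadata from the name node; it reads and writes blocks by connecting directly to the data node addresses the name node hands back, so tunneling the name node port alone is not enough. I then ran the client on the same machine as the Hadoop name node, and that solved the problem. In short, the non-standard network setup prevented Hadoop from reaching the data node.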
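Because the client must reach both the name node and the data nodes directly, a quick way to tell a network problem from a code problem is to probe both ports from the client machine. The class below is a hypothetical helper, not part of Hadoop: it only checks that a plain TCP connection can be opened. The host name `host_name` is the placeholder from this answer, port 9000 comes from the `fs.defaultFS` value shown here, and port 50010 is an assumption (the usual Hadoop 2.x default for `dfs.datanode.address`); check your own `hdfs-site.xml`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks raw TCP reachability only; it does not speak the HDFS protocol. */
public class HdfsReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved
            // (UnknownHostException is a subclass of IOException).
            return false;
        }
    }

    public static void main(String[] args) {
        // "host_name" is a placeholder; pass the real host as the first argument.
        String host = args.length > 0 ? args[0] : "host_name";
        // 9000: name node RPC port from fs.defaultFS in this answer.
        System.out.println("name node :9000  reachable: " + canConnect(host, 9000, 2000));
        // 50010: assumed data node data-transfer port (Hadoop 2.x default).
        System.out.println("data node :50010 reachable: " + canConnect(host, 50010, 2000));
    }
}
```

If the name node port is reachable but the data node port is not, the symptom matches the one described above: the client can list metadata but fails when it tries to read or write file contents.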
I used the SSH tunnel because I couldn't access the name node remotely, and I assumed this was due to port restrictions set by the admin, so I used the tunnel to bypass them. But it turned out to be a Hadoop misconfiguration.
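In core-site.xml, I changed
fs.defaultFS
from
hdfs://localhost:9000
to
hdfs://host_name:9000
With localhost, the name node listens only on the loopback interface, so it cannot be reached from other machines. After this change, I no longer needed the SSH tunnel and could access HDFS remotely.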
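To catch this misconfiguration before debugging the network, a client-side sanity check can read `fs.defaultFS` from `core-site.xml` and flag a loopback host. The sketch below is hypothetical and uses only the JDK so it stays self-contained; a real Hadoop client would normally read the setting through `org.apache.hadoop.conf.Configuration` instead. It assumes the standard `<property><name>…</name><value>…</value></property>` layout and treats only `localhost` and `127.x` addresses as loopback (IPv6 literals are not handled).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical checker for the fs.defaultFS setting in core-site.xml content. */
public class DefaultFsCheck {

    /** Returns the value of fs.defaultFS, or null if the property is absent. */
    public static String defaultFs(String coreSiteXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(coreSiteXml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            if ("fs.defaultFS".equals(childText(p, "name"))) {
                return childText(p, "value");
            }
        }
        return null;
    }

    /** Text of the first child element with the given tag, trimmed, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** True if the host part of a URI like hdfs://host:port is a loopback name or address. */
    public static boolean isLoopback(String fsUri) {
        // Parsed by hand: java.net.URI.getHost() returns null for names with '_' such as host_name.
        String authority = fsUri.replaceFirst("^[a-zA-Z]+://", "");
        int slash = authority.indexOf('/');
        if (slash >= 0) authority = authority.substring(0, slash);
        int colon = authority.lastIndexOf(':');
        String host = colon >= 0 ? authority.substring(0, colon) : authority;
        return host.equalsIgnoreCase("localhost") || host.startsWith("127.");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>hdfs://localhost:9000</value></property></configuration>";
        String fs = defaultFs(xml);
        System.out.println(fs + " loopback-only: " + isLoopback(fs));
    }
}
```

Running `main` on the original setting prints `hdfs://localhost:9000 loopback-only: true`; with `hdfs://host_name:9000` the check returns false.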