I’ve downloaded and started up Cloudera’s Hadoop Demo VM for CDH4 (running Hadoop 2.0.0). I’m trying to write a Java program that will run from my Windows 7 machine (The same
From the error message, the replication factor seems to be fine, i.e. 1. It seems the datanode is either not functioning properly or has permission issues. Check the permissions and check the status of the datanode from the user you are trying to run Hadoop as.
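If you prefer to do that check from the Java side rather than the command line, here is a minimal sketch; the namenode host and port are placeholders for your own values, and it simply asks the namenode for the datanode report, roughly what hdfs dfsadmin -report prints:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address; replace with your own host and port.
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:8020"), conf);
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // One entry per datanode the namenode knows about, including its state.
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getHostName());
                System.out.println(dn.getDatanodeReport());
            }
        }
        fs.close();
    }
}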
Since I found many questions like this one while searching for the exact same issue, I thought I would share what finally worked for me. I found this forum post on Hortonworks: https://community.hortonworks.com/questions/16837/cannot-copy-from-local-machine-to-vm-datanode-via.html
The answer was really about understanding what calling new Configuration() means and setting the correct parameters as I needed them. In my case it was exactly the one mentioned in that post. So my working code looks like this:
try {
    Configuration config = new Configuration();
    config.set("dfs.client.use.datanode.hostname", "true");
    Path pdFile = new Path("stgicp-" + pd);
    FileSystem dFS = FileSystem.get(new URI("hdfs://" + HadoopProperties.HIVE_HOST + ":" + HadoopProperties.HDFS_DEFAULT_PORT),
            config, HadoopProperties.HIVE_DEFAULT_USER);
    if (dFS.exists(pdFile)) {
        dFS.delete(pdFile, false);
    }
    FSDataOutputStream outStream = dFS.create(pdFile);
    for (String sjWLR : processWLR.get(pd)) {
        outStream.writeBytes(sjWLR);
    }
    outStream.flush();
    outStream.close();
    dFS.delete(pdFile, false);
    dFS.close();
} catch (IOException | URISyntaxException | InterruptedException e) {
    log.error("WLR file processing error: " + e.getMessage());
}
It appears to be some issue with the file system: either the parameters in core-site.xml do not match the file system it is trying to reach,
OR
there is some mismatch in the path (I see a Windows reference in the error).
You can use the Cygwin cygpath tool to translate the path so that it points to where the datanode and temp file locations are placed; that should do the trick. Location: $/bin/cygpath.exe
P.S. Replication does NOT seem to be the primary issue here, in my opinion.
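One quick way to rule out a core-site.xml mismatch on the client side is to print the configuration values the client actually resolved before it touches HDFS. This is only a debugging sketch; the property names are the standard Hadoop ones and everything else is illustrative:

import org.apache.hadoop.conf.Configuration;

public class ConfigDump {
    public static void main(String[] args) {
        // new Configuration() loads core-site.xml (and friends) from the classpath.
        Configuration conf = new Configuration();
        // If this still prints the default file:/// or localhost, the client never saw your core-site.xml.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
    }
}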
I had a similar problem; in my case I just emptied the following folder: ${hadoop.tmp.dir}/nm-local-dir/usercache/{{hdfs_user}}/appcache/
Add the following property in hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
and also add this file in your program (a fuller sketch follows these steps):
conf.addResource("hdfs-site.xml");
Then stop Hadoop:
stop-all.sh
and start it again:
start-all.sh
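For completeness, a minimal sketch of how the addResource call fits into the client code; the explicit file path in the comment is only an example, and the printout just confirms that the property above was actually loaded:

import org.apache.hadoop.conf.Configuration;

public class LoadHdfsSite {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Loads hdfs-site.xml from the classpath; alternatively pass an explicit path, e.g.
        // conf.addResource(new org.apache.hadoop.fs.Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource("hdfs-site.xml");
        // Should print 1 if the property above was picked up.
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}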
I ran into a similar issue and have two pieces of information that may help you.
The first thing I realized is that I was using an ssh tunnel to access the name node, and when the client code tried to access the data node it could not find it, because the tunnel somehow messed up the communication. I then ran the client on the same box as the Hadoop name node, and that solved the problem. In short, a non-standard network configuration confused Hadoop about how to find the data node.
The reason I used the ssh tunnel is that I couldn't access the name node remotely, and I thought it was due to a port restriction by the admin, so I used the ssh tunnel to bypass the restriction. But it turned out to be a misconfiguration of Hadoop.
In core-site.xml, after I changed
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
to
<name>fs.defaultFS</name>
<value>hdfs://host_name:9000</value>
I no longer needed the ssh tunnel and I could access HDFS remotely.
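To sanity-check that remote access works without the tunnel, a quick listing of the HDFS root directory is enough. A minimal sketch, where host_name and port 9000 stand in for whatever your fs.defaultFS points at:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Must match the fs.defaultFS value configured above.
        FileSystem fs = FileSystem.get(new URI("hdfs://host_name:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}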