I've downloaded and started up Cloudera's Hadoop Demo VM for CDH4 (running Hadoop 2.0.0). I'm trying to write a Java program that will run from my Windows 7 machine (The same
I had the same problem.
In my case, the key to the problem was the following error message:
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
It means that your HDFS client couldn't connect to your datanode on port 50010. Since you could connect to the HDFS namenode, you could get the datanode's status, but your HDFS client still failed to connect to the datanode.
(In HDFS, a namenode manages the file directories and the datanodes. When the HDFS client connects to a namenode, it looks up the target file path and the address of the datanode that holds the data; then the HDFS client communicates with that datanode directly. You can check those datanode addresses with netstat, because the HDFS client tries to communicate with the datanodes using the addresses handed back by the namenode; the sketch below prints them from Java.)
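If you want to see exactly which datanode addresses the namenode hands back, here is a small sketch using FileSystem.getFileBlockLocations (the path /tmp/some-file is just a placeholder for a file that already exists in your HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path("/tmp/some-file")); // placeholder path
        // Ask the namenode where the blocks of this file live
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // These are the datanode addresses the client will try to connect to
            for (String address : block.getNames()) {
                System.out.println(address);
            }
        }
        fs.close();
    }
}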
I solved that problem by setting this client configuration property:
"dfs.client.use.datanode.hostname", "true"
I'm sorry for my poor English.
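In code, that means calling conf.set with that key and value on the client's Configuration before getting the FileSystem (it can also go into the client-side hdfs-site.xml); a minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// Connect to datanodes by hostname instead of the (VM-internal) IP addresses the namenode reports
conf.set("dfs.client.use.datanode.hostname", "true");
FileSystem fs = FileSystem.get(conf);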
Here is how I create files in HDFS:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Get the FileSystem for the job (this snippet runs inside a mapper/reducer, hence "context")
FileSystem hdfs = FileSystem.get(context.getConfiguration());
Path outFile = new Path("/path to store the output file");
String line1 = ""; // accumulates the existing file's contents; start with "" to avoid a NullPointerException on concat
if (!hdfs.exists(outFile)) {
    // The file does not exist yet: create it and write the data
    OutputStream out = hdfs.create(outFile);
    BufferedWriter br = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
    br.write("whatever data" + "\n");
    br.close();
    hdfs.close();
} else {
    // The file already exists: read its contents, delete it, and rewrite it with the new data appended
    String line2 = null;
    BufferedReader br1 = new BufferedReader(new InputStreamReader(hdfs.open(outFile)));
    while ((line2 = br1.readLine()) != null) {
        line1 = line1 + line2 + "\n";
    }
    br1.close();
    hdfs.delete(outFile, true);
    OutputStream out = hdfs.create(outFile);
    BufferedWriter br2 = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
    br2.write(line1 + "new data" + "\n");
    br2.close();
    hdfs.close(); // note: this closes the cached FileSystem instance for the whole JVM
}
Go to the Linux VM and check its hostname and IP address (use the ifconfig command). Then, in the Linux VM, edit the /etc/hosts file and add a line in the form
IPADDRESS (SPACE) HOSTNAME
Example: 192.168.110.27 clouderavm
Then change all your Hadoop configuration files, such as
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
replacing localhost, localhost.localdomain, or 0.0.0.0 with your hostname.
Then restart Cloudera Manager.
On the Windows machine, edit C:\Windows\System32\Drivers\etc\hosts
and add one line at the end with
your VM's IP address and hostname (the same entry you added to the /etc/hosts file on the VM):
VMIPADDRESS VMHOSTNAME
Example:
192.168.110.27 clouderavm
Now check again; it should work. For the detailed configuration, check the following YouTube video:
https://www.youtube.com/watch?v=fSGpYHjGIRY
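Once both hosts files map the hostname, a Java client on the Windows machine can reach the VM by name. Here is a minimal connectivity check, assuming the hostname clouderavm from the example above and the default CDH NameNode port 8020 (adjust both to your setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the VM by hostname (assumed values; use your own hostname and port)
        conf.set("fs.defaultFS", "hdfs://clouderavm:8020");
        // Reach datanodes by hostname as well, so the Windows hosts file entry is used
        conf.set("dfs.client.use.datanode.hostname", "true");
        FileSystem fs = FileSystem.get(conf);
        // Listing the root directory proves the namenode connection works
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}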
In the Hadoop configuration, the default replication factor is set to 3. Check it and change it according to your requirements.
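For example, on a single-datanode demo VM you might lower it to 1 on the client side. A sketch (dfs.replication is the standard property name; the value 1 and the path /tmp/some-file are just examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("dfs.replication", "1"); // files created by this client will be written with replication 1
FileSystem fs = FileSystem.get(conf);
// Or lower the replication of a file that already exists:
fs.setReplication(new Path("/tmp/some-file"), (short) 1);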
You can try deleting the data directory (dfs/data) manually and reformatting the namenode (hdfs namenode -format). You can then start Hadoop again.