Question
I am trying to deploy Hadoop-RDMA on an 8-node IB (OFED-1.5.3-4.0.42) cluster and ran into the following problem (a.k.a. "File ... could only be replicated to 0 nodes, instead of 1"):
    frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
    Warning: $HADOOP_HOME is deprecated.

    14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
        at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.From.Code(Unknown Source)
        at org.apache.hadoop.hdfs.From.F(Unknown Source)
        at org.apache.hadoop.hdfs.From.F(Unknown Source)
        at org.apache.hadoop.hdfs.The.run(Unknown Source)
    Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
        at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
        at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
        at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
        at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
        at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
        ... 12 more
    14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
    14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
    14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed
It seems that no data is transferred to the DataNodes when I start copying from the local filesystem to HDFS. I tested the availability of the DataNodes:
    frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
    Warning: $HADOOP_HOME is deprecated.

    Configured Capacity: 0 (0 KB)
    Present Capacity: 0 (0 KB)
    DFS Remaining: 0 (0 KB)
    DFS Used: 0 (0 KB)
    DFS Used%: �%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0

    -------------------------------------------------
    Datanodes available: 0 (4 total, 4 dead)

    Name: 10.10.1.13:50010
    Decommission Status : Normal
    Configured Capacity: 0 (0 KB)
    DFS Used: 0 (0 KB)
    Non DFS Used: 0 (0 KB)
    DFS Remaining: 0(0 KB)
    DFS Used%: 100%
    DFS Remaining%: 0%
    Last contact: Wed Feb 05 19:02:54 MSK 2014

    Name: 10.10.1.14:50010
    Decommission Status : Normal
    Configured Capacity: 0 (0 KB)
    DFS Used: 0 (0 KB)
    Non DFS Used: 0 (0 KB)
    DFS Remaining: 0(0 KB)
    DFS Used%: 100%
    DFS Remaining%: 0%
    Last contact: Wed Feb 05 19:02:54 MSK 2014

    Name: 10.10.1.16:50010
    Decommission Status : Normal
    Configured Capacity: 0 (0 KB)
    DFS Used: 0 (0 KB)
    Non DFS Used: 0 (0 KB)
    DFS Remaining: 0(0 KB)
    DFS Used%: 100%
    DFS Remaining%: 0%
    Last contact: Wed Feb 05 19:02:54 MSK 2014

    Name: 10.10.1.11:50010
    Decommission Status : Normal
    Configured Capacity: 0 (0 KB)
    DFS Used: 0 (0 KB)
    Non DFS Used: 0 (0 KB)
    DFS Remaining: 0(0 KB)
    DFS Used%: 100%
    DFS Remaining%: 0%
    Last contact: Wed Feb 05 19:02:55 MSK 2014
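Since the report shows all four DataNodes as dead with 0 configured capacity, a first check is whether the DataNode JVMs are running at all and what their logs say. A minimal sketch, assuming a default Hadoop 1.x layout (the host, log path, and file name pattern below are assumptions; adjust to your install):

    # On one of the DataNode hosts, e.g. 10.10.1.13:
    # Is a DataNode JVM running at all?
    jps | grep -i DataNode

    # If it runs but is reported dead, the DataNode log usually explains why
    # it cannot register with the NameNode (default Hadoop 1.x log location):
    tail -n 100 $HADOOP_HOME/logs/hadoop-$USER-datanode-$(hostname).log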
I also tried mkdir in the HDFS filesystem, which succeeded (mkdir only touches NameNode metadata, so it can work even when all DataNodes are down). Restarting the Hadoop daemons did not produce any positive effect.
Could you please help me with this issue? Thank you.
Best, Alex
Answer 1:
I have found my problem. The issue was the configuration of hadoop.tmp.dir, which had been set to an NFS partition. By default it points under /tmp, which is on the local filesystem. After removing hadoop.tmp.dir from core-site.xml, the problem was solved.
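For illustration, the offending entry in core-site.xml looked roughly like this (the NFS path below is a hypothetical example, not the original value):

    <!-- core-site.xml: problematic override (path is a hypothetical example).
         On a mount shared by all nodes, the DataNodes can collide on the same
         storage directories, leaving HDFS with 0 configured capacity. -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/nfs/shared/hadoop-tmp</value>
    </property>

Deleting this property (or pointing it at a node-local directory) lets Hadoop fall back to its default of /tmp/hadoop-${user.name}, after which restarting HDFS brings the DataNodes back.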
Answer 2:
In my case, this issue was resolved by opening the firewall on port 50010.
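For example, with iptables this might look like the following (a sketch, not the exact commands from my setup; use your distribution's firewall tooling, e.g. firewalld or ufw, as appropriate):

    # Allow inbound HDFS DataNode data-transfer traffic (default port 50010).
    iptables -A INPUT -p tcp --dport 50010 -j ACCEPT

    # Optionally verify from the client that the port is now reachable
    # (10.10.1.13 is one of the DataNodes from the report above):
    telnet 10.10.1.13 50010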
Source: https://stackoverflow.com/questions/21581448/hadoop-file-could-only-be-replicated-to-0-nodes-instead-of-1