Hadoop: File … could only be replicated to 0 nodes, instead of 1

Posted by 匆匆过客 on 2019-12-25 06:37:30

Question


I am trying to deploy Hadoop-RDMA on an 8-node InfiniBand (OFED-1.5.3-4.0.42) cluster and ran into the following problem (a.k.a. "File ... could only be replicated to 0 nodes, instead of 1"):

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
Warning: $HADOOP_HOME is deprecated.

14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.From.Code(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.The.run(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
    at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
    at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
    ... 12 more

14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed

It seems that no data is transferred to the DataNodes when I copy from the local filesystem to HDFS. I checked the availability of the DataNodes:

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.

Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (4 total, 4 dead)

Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:55 MSK 2014

I also tried mkdir in the HDFS filesystem, which succeeded. Restarting the Hadoop daemons did not produce any positive effect.
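For reference, the mkdir that succeeded was of this form (the exact path here is illustrative):

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -mkdir /user/frolo/test

Since mkdir only touches NameNode metadata, while copyFromLocal must actually write block data to DataNodes, this is consistent with the DataNodes being unreachable.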

Could you please help me with this issue? Thank you.

Best, Alex


Answer 1:


I found my problem. The issue was the configuration of hadoop.tmp.dir, which had been set to an NFS partition. By default it points to /tmp, which is on the local filesystem. After removing hadoop.tmp.dir from core-site.xml the problem was solved.
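For illustration, the offending property in core-site.xml would have looked roughly like this (the NFS path below is a made-up example); deleting the whole property restores the local /tmp default. Note that hadoop.tmp.dir serves as the base for several other paths (for example the DataNode's default data directories), which is why pointing it at NFS can break block storage:

<!-- core-site.xml: hypothetical entry that caused the failure -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- /mnt/nfs/hadoop-tmp is an illustrative NFS mount; remove this property to fall back to /tmp -->
  <value>/mnt/nfs/hadoop-tmp</value>
</property>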




Answer 2:


In my case, this issue was resolved by opening the firewall on port 50010.
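Port 50010 is the DataNode data-transfer port in this generation of Hadoop. As a sketch, on a host managed with iptables the rule could look like this (your distribution's firewall tooling may differ):

iptables -A INPUT -p tcp --dport 50010 -j ACCEPT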



Source: https://stackoverflow.com/questions/21581448/hadoop-file-could-only-be-replicated-to-0-nodes-instead-of-1
