Hadoop: …be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

走了就别回头了 2020-12-03 02:41

I'm getting the following error when attempting to write to HDFS as part of my multi-threaded application:

could only be replicated to 0 nodes instead of min         


        
10 Answers
  • 2020-12-03 03:15

    I had a similar issue recently. Because my DataNodes had only SSDs for storage, I set dfs.datanode.data.dir to [SSD]file:///path/to/data/dir. The logs contained unavailableStorages=[DISK], so I removed the [SSD] tag, which solved the problem.

    Apparently Hadoop uses [DISK] as the default storage type and does not fall back (or rather "fall up") to SSD when no [DISK]-tagged storage location is available. I could not find any documentation on this behaviour, though.
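
A minimal sketch of how one might check a dfs.datanode.data.dir value for this situation before starting the DataNode. The function names are hypothetical, and the parsing assumes the simple comma-separated `[TYPE]location` form shown above, with untagged entries counted as DISK:

```python
import re

# Matches an optional leading storage tag such as "[SSD]" or "[DISK]".
_TAG = re.compile(r"^\[(\w+)\](.*)$")


def storage_types(data_dir_value):
    """Return (storage_type, location) pairs for a dfs.datanode.data.dir value."""
    pairs = []
    for entry in data_dir_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _TAG.match(entry)
        if match:
            pairs.append((match.group(1).upper(), match.group(2).strip()))
        else:
            # Assumption from the answer above: an untagged entry counts as DISK.
            pairs.append(("DISK", entry))
    return pairs


def has_disk_storage(data_dir_value):
    """True if at least one configured location is (implicitly or explicitly) DISK."""
    return any(kind == "DISK" for kind, _ in storage_types(data_dir_value))
```

With the value from this answer, `has_disk_storage("[SSD]file:///path/to/data/dir")` is False, which matches the unavailableStorages=[DISK] message in the logs.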

  • 2020-12-03 03:18

    Another reason could be that your DataNode machine hasn't exposed its data transfer port (50010 by default). In my case, I was trying to write a file from Machine1 to HDFS running in a Docker container, C1, hosted on Machine2. For the host machine to forward requests to the services running in the container, port forwarding has to be set up. The issue was resolved once I forwarded port 50010 from the host machine to the container.
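
A quick way to test this from the client machine is to try opening a TCP connection to the DataNode port. This is only a sketch: `port_is_reachable` is a hypothetical helper, and the host name in the usage note is a placeholder. It confirms that something accepts connections on the port, not that it is a healthy DataNode.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_reachable("machine2.example", 50010)` run from Machine1 should return True once the port is forwarded. Hadoop 3 changed the default DataNode transfer port to 9866, so adjust the number to your version.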

  • 2020-12-03 03:20

    This error comes from the HDFS block replication system: it could not place any copy of a block of the file being written. Common reasons:

    1. Only a NameNode instance is running, and it is not in safe mode.
    2. No DataNode instances are up and running, or some are dead. (Check the servers.)
    3. NameNode and DataNode instances are both running but cannot communicate with each other, which means there is a connectivity issue between them.
    4. Running DataNode instances cannot talk to the server because of networking or Hadoop-related issues. (Check the logs that include DataNode info.)
    5. No disk space is available in the data directories configured for the DataNode instances, or the DataNodes have run out of space. (Check dfs.data.dir, and delete old files if any.)
    6. The space reserved for DataNode instances in dfs.datanode.du.reserved is larger than the free space, so the DataNodes conclude that there is not enough free space.
    7. There are not enough threads for the DataNode instances. (Check the DataNode logs and the dfs.datanode.handler.count value.)
    8. Make sure dfs.data.transfer.protection is not equal to "authentication" and dfs.encrypt.data.transfer is equal to true.
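
Points 5 and 6 come down to simple arithmetic: the space left for blocks is roughly the free space on the volume minus the reserved amount. The sketch below illustrates that idea only. The helper names are hypothetical, and the real DataNode accounting is more involved than this subtraction.

```python
import shutil


def usable_bytes(free_bytes, reserved_bytes):
    """Space left for HDFS blocks once the reserved amount is set aside."""
    return max(0, free_bytes - reserved_bytes)


def check_data_dir(path, reserved_bytes):
    """Estimate usable space for one data directory on the local machine.

    reserved_bytes stands in for dfs.datanode.du.reserved, which is
    expressed in bytes per volume.
    """
    free = shutil.disk_usage(path).free
    return usable_bytes(free, reserved_bytes)
```

If `check_data_dir` returns 0 for every configured data directory, the DataNode has nowhere to put a new block, which is consistent with the "replicated to 0 nodes" message.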

    Also please:

    • Verify the status of the NameNode and DataNode services and check the related logs.
    • Verify that core-site.xml has the correct fs.defaultFS value and that hdfs-site.xml has valid values.
    • Verify that hdfs-site.xml has dfs.namenode.http-address.. specified for all NameNode instances in case of a PHD HA configuration.
    • Verify that the permissions on the directories are correct.
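
To check a value such as fs.defaultFS without opening the file by hand, the standard Hadoop `<property><name>…</name><value>…</value></property>` layout can be read with a few lines of Python. This is a sketch: `read_property` is a hypothetical helper, and the NameNode address in the sample is a placeholder.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a named property from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    # The property is not set in this file.
    return None


# Placeholder configuration, for illustration only.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example:8020</value>
  </property>
</configuration>"""

print(read_property(SAMPLE_CORE_SITE, "fs.defaultFS"))
```

In practice you would pass the contents of your own core-site.xml or hdfs-site.xml. A result of None means the property is not set in that file, so Hadoop falls back to its built-in default.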

    Ref: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo

    Ref: https://support.pivotal.io/hc/en-us/articles/201846688-HDFS-reports-Configured-Capacity-0-0-B-for-datanode

    Also, please check: Writing to HDFS from Java, getting "could only be replicated to 0 nodes instead of minReplication"

  • 2020-12-03 03:21

    I had the same error. Restarting the HDFS services (that is, the NameNode and DataNode services) solved the issue.

  • 2020-12-03 03:25

    In my case the problem was Hadoop's temporary files.

    The logs showed the following error:

    2019-02-27 13:52:01,079 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-i843484/dfs/data/in_use.lock acquired by nodename 28111@slel00681841a
    2019-02-27 13:52:01,087 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-i843484/dfs/data: namenode clusterID = CID-38b0104b-d3d2-4088-9a54-44b71b452006; datanode clusterID = CID-8e121bbb-5a08-4085-9817-b2040cd399e1
    

    I solved it by removing the Hadoop temporary files (note that this also deletes any block data stored under that directory):

    sudo rm -r /tmp/hadoop-*
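
Before deleting anything, it can help to confirm that the DataNode log really reports this mismatch. A small sketch that pulls both IDs out of a log line in the format shown above. The function name is hypothetical, and the pattern assumes the exact wording of that WARN message.

```python
import re

# Pattern based on the "Incompatible clusterIDs" WARN line quoted above.
_CLUSTER_IDS = re.compile(
    r"namenode clusterID = (?P<namenode>CID-[0-9a-f-]+); "
    r"datanode clusterID = (?P<datanode>CID-[0-9a-f-]+)"
)


def cluster_id_mismatch(log_line):
    """Return (namenode_id, datanode_id) if the line reports differing IDs, else None."""
    match = _CLUSTER_IDS.search(log_line)
    if match is None:
        return None
    namenode_id, datanode_id = match.group("namenode"), match.group("datanode")
    return (namenode_id, datanode_id) if namenode_id != datanode_id else None
```

Running it over each line of the DataNode log and collecting the non-None results shows whether the DataNode refused to register because of a stale clusterID.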
    
  • 2020-12-03 03:27

    I had the same error too. I then changed the block size, which resolved the problem.
