Insufficient number of DataNodes reporting when creating dataproc cluster

Question


I am getting an "Insufficient number of DataNodes reporting" error when creating a Dataproc cluster with gs:// as the default FS. Below is the command I am using to create the cluster.

gcloud dataproc clusters create cluster-538f --image-version 1.2 \
    --bucket dataproc_bucket_test --subnet default --zone asia-south1-b \
    --master-machine-type n1-standard-1 --master-boot-disk-size 500 \
    --num-workers 2 --worker-machine-type n1-standard-1 --worker-boot-disk-size 500 \
    --scopes 'https://www.googleapis.com/auth/cloud-platform' --project delcure-firebase \
    --properties 'core:fs.default.name=gs://dataproc_bucket_test/'

I checked and confirmed that the default folders can be created in the bucket I am using.


Answer 1:


As Igor suggests, Dataproc does not support GCS as the default FS. I also suggest unsetting this property. Note that the fs.default.name property can be passed to individual jobs and will work just fine.
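For example, here is a minimal sketch of passing the property to a single Hadoop job via gcloud (the jar name is a placeholder; the cluster and bucket names are taken from the question):

# Only this job resolves unqualified paths against the bucket;
# the cluster's default FS stays HDFS.
gcloud dataproc jobs submit hadoop --cluster cluster-538f \
    --jar my-job.jar \
    --properties 'fs.default.name=gs://dataproc_bucket_test/'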




Answer 2:


The error arises when the file system is accessed (HdfsClientModule). So I think it is probable that Google Cloud Storage is missing a specific feature that Hadoop requires, and the creation fails after some folders have been created.

As somebody else mentioned previously, it is better to give up the idea of using GCS as the default FS and let HDFS keep working in Dataproc. Nonetheless, you can still take advantage of Cloud Storage for data persistence, reliability, and performance, because remember that data in HDFS is removed when a cluster is shut down.

1.- From a Dataproc node you can move data in and out with the hadoop command, for example:

hadoop fs -ls gs://CONFIGBUCKET/dir/file 

hadoop distcp hdfs://OtherNameNode/dir/ gs://CONFIGBUCKET/dir/file 

2.- To access data from Spark or any Hadoop application, just use the gs:// prefix for your bucket paths; a sketch follows below.
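As an illustrative sketch (wordcount.py and the gs:// paths are placeholders), Dataproc images ship with the Cloud Storage connector preinstalled, so the gs:// scheme works without changing the default FS:

# Input and output live directly in GCS; the job reads and writes
# through the preinstalled Cloud Storage connector.
gcloud dataproc jobs submit pyspark wordcount.py --cluster cluster-538f \
    -- gs://CONFIGBUCKET/input/ gs://CONFIGBUCKET/output/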

Furthermore, if the Cloud Storage connector is installed on an on-premises Hadoop cluster, it can help move HDFS data into Cloud Storage, from where a Dataproc cluster can then access it.
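As a rough sketch of that path, a distcp push from the on-premises cluster might look like the following, assuming the connector jar is already on the Hadoop classpath; the project ID, keyfile, and paths are placeholders, and the auth property names vary by connector version, so check the connector's documentation:

# Run from the on-premises cluster; -D flags configure the gs:// scheme
# and service-account credentials for this invocation only.
hadoop distcp \
    -D fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
    -D fs.gs.project.id=YOUR_PROJECT_ID \
    -D google.cloud.auth.service.account.json.keyfile=/path/to/keyfile.json \
    hdfs:///user/data/ gs://CONFIGBUCKET/data/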



Source: https://stackoverflow.com/questions/52248139/insufficient-number-of-datanodes-reporting-when-creating-dataproc-cluster
