Cannot create a Dataproc cluster when setting the fs.defaultFS property?

落爺英雄遲暮 提交于 2019-12-25 04:22:33

问题


This was already the object of discussion in previous post, however, I'm not convinced with the answers as the Google docs specify that it is possible to create a cluster setting the fs.defaultFS property. Moreover, even if possible to set this property programmatically, sometimes, it's more convenient to set it from command line.

So I wanted to know why the following option when passed to my cluster creation command does not work: --properties core:fs.defaultFS=gs://my-bucket? Please note I haven't included all parameters as I ran the command without the previous flag and it succeeded to create the cluster. However, when passing this, I get: "failed: Cannot start master: Insufficientnumber of DataNodes reporting."

If anyone managed to create a dataproc cluster by setting the fs.defaultFS that'd be great? Thanks.


回答1:


It's true there are still known issues due to certain dependencies on actual HDFS; the docs were not intended to imply that setting fs.defaultFS to a GCS path at cluster-creation time would work, but to simply provide a convenient example of a property that appears in core-site.xml; in theory it would work to set fs.defaultFS to a different preexisting HDFS cluster, for example. I've filed a ticket to change the example in the documentation to avoid confusion.

Two options:

  1. Just override fs.defaultFS at job-submission time using per-job properties
  2. Workaround some of the known issues by setting fs.defaultFS explicitly using an initialization action instead of cluster properties.

Option 1 is better understood to work because cluster-level HDFS dependencies won't change. Option 2 works because most of the incompatibilities occur during initial startup only, and initialization actions run after the relevant daemons start up already. To override the setting in an init action, you'd use bdconfig:

bdconfig set_property \
    --name 'fs.defaultFS' \
    --value 'gs://my-bucket' \
    --configuration_file /etc/hadoop/conf/core-site.xml \
    --clobber


来源:https://stackoverflow.com/questions/54775339/cannot-create-a-dataproc-cluster-when-setting-the-fs-defaultfs-property

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!