When submitting a job with PySpark, how do I access static files uploaded with the --files argument?

佛祖请我去吃肉 2021-02-08 00:53

For example, I have a folder:

/
  - test.py
  - test.yml

and the job is submitted to the Spark cluster with:

gcloud beta dataproc jobs
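
For reference, a minimal sketch of accessing such a file with the standard pyspark.SparkFiles API (PyYAML is assumed to be available on the cluster; both details are assumptions, not from the original post):

    # test.py: read test.yml, which was distributed alongside the job
    import yaml
    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("static-files-demo").getOrCreate()

    # Files passed via --files are staged into the working directory of
    # the driver and executors; SparkFiles.get resolves the local path.
    with open(SparkFiles.get("test.yml")) as f:
        config = yaml.safe_load(f)

    print(config)
    spark.stop()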

3 Answers
  •  不知归路
    2021-02-08 01:31

    Currently, as Dataproc is no longer in beta, to directly access a file in Cloud Storage from the PySpark code, submitting the job with the --files parameter does the work; SparkFiles is not required. For example:

    gcloud dataproc jobs submit pyspark \
      --cluster *cluster name* --region *region name* \
      --files gs:/// gs:///filename.py
    

    When reading input from GCS via the Spark API, it works with the GCS connector.
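
    A minimal sketch of that direct-read approach, using a hypothetical bucket name your-bucket (Dataproc images ship the GCS connector, so gs:// URIs work wherever a filesystem path is accepted):

      # filename.py: read the static file straight from Cloud Storage
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("gcs-direct-read").getOrCreate()

      # No SparkFiles needed: the GCS connector resolves gs:// URIs.
      lines = spark.read.text("gs://your-bucket/test.yml")
      lines.show(truncate=False)

      spark.stop()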
