Reading from the Google Storage gs:// filesystem from a local Spark instance

Asked by 情深已故 on 2021-01-07 05:58

The problem is quite simple: you have a local Spark instance (either a cluster or just running in local mode) and you want to read from gs://.
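
For example, with a plain local session the read below fails, typically with an error like "No FileSystem for scheme: gs", because nothing registers a handler for gs:// URIs. A minimal sketch of the goal (the bucket path is hypothetical):

    import org.apache.spark.sql.SparkSession

    // A plain local SparkSession with no GCS connector installed or configured.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("gcs-read")
      .getOrCreate()

    // Without the gcs-connector on the classpath this fails,
    // typically with "No FileSystem for scheme: gs".
    val df = spark.read.text("gs://my-bucket/some/path/data.txt")
    df.show()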

3 Answers
  •  -上瘾入骨i · 2021-01-07 06:39

    In my case, on Spark 2.4.3, I needed to do the following to enable GCS access from a local Spark instance. I used a JSON keyfile rather than the client.id/secret approach proposed above.

    1. Place the shaded gcs-connector jar from http://repo2.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop2-1.9.17/ in $SPARK_HOME/jars/; without the shaded jar I hit various failures caused by transitive dependencies.

    2. (Optional) Add to build.sbt:

      "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-1.9.17"
          exclude("javax.jms", "jms")
          exclude("com.sun.jdmk", "jmxtools")
          exclude("com.sun.jmx", "jmxri")
      
    3. In $SPARK_HOME/conf/spark-defaults.conf, add:

      spark.hadoop.google.cloud.auth.service.account.enable       true
      spark.hadoop.google.cloud.auth.service.account.json.keyfile /path/to/my/keyfile
      
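    Since spark.hadoop.-prefixed properties are copied into the underlying Hadoop configuration, the same two settings can also be applied programmatically on the SparkSession builder instead of editing spark-defaults.conf. A minimal sketch (the keyfile path is the same placeholder as above):

      import org.apache.spark.sql.SparkSession

      // Equivalent to the spark-defaults.conf entries above,
      // set directly on the session builder.
      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("gcs-read")
        .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
        .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/path/to/my/keyfile")
        .getOrCreate()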

    And with that, everything works.
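
    As a quick sanity check, a read like the following should now succeed (a sketch reusing the spark session built above; the bucket path is hypothetical):

      // Read a text file from GCS and count its lines; the path is a placeholder.
      val lines = spark.read.textFile("gs://my-bucket/some/path/data.txt")
      println(lines.count())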
