The problem is simple: you have a local Spark instance (either a cluster or just running in local mode) and you want to read from gs://.
In my case, on Spark 2.4.3, I needed to do the following to enable GCS access from local Spark. I used a JSON keyfile rather than the client.id/secret proposed above.
In $SPARK_HOME/jars/, place the shaded gcs-connector jar from http://repo2.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop2-1.9.17/ — with the non-shaded jar I ran into various failures caused by transitive dependencies.
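A quick way to confirm Spark actually picks the jar up is to resolve the connector's FileSystem class from spark-shell; com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem is the class that connector version registers for the gs:// scheme:

// Throws ClassNotFoundException if the shaded jar is not on Spark's classpath.
Class.forName("com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")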
(Optional) In my build.sbt, I added:

libraryDependencies += ("com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-1.9.17")
  .exclude("javax.jms", "jms")
  .exclude("com.sun.jdmk", "jmxtools")
  .exclude("com.sun.jmx", "jmxri")
In $SPARK_HOME/conf/spark-defaults.conf, add:
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.google.cloud.auth.service.account.json.keyfile /path/to/my/keyfile
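If you would rather not edit spark-defaults.conf, the same two settings can be passed when building the session, since Spark forwards any spark.hadoop.* key into the Hadoop configuration. A minimal sketch (the app name and keyfile path are placeholders):

import org.apache.spark.sql.SparkSession

// Equivalent to the spark-defaults.conf entries above.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("gcs-local")
  .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
  .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/path/to/my/keyfile")
  .getOrCreate()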
And with that, everything worked.
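As a smoke test, a simple read from gs:// should now succeed (the bucket and object below are made-up placeholders; use one your service account can read):

// Read a text file straight from GCS and show a few rows.
val df = spark.read.text("gs://my-bucket/path/to/file.txt")
df.show(5, truncate = false)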