The problem is quite simple: you have a local Spark instance (either a cluster or just local mode) and you want to read from gs://.
Here is the solution I came up with by combining different resources:
1. Download the Google Cloud Storage connector (gs-connector) and store it in the `$SPARK/jars/` folder (see Alternative 1 at the bottom).
2. Download the `core-site.xml` file from here, or copy it from below. This is a configuration file used by Hadoop (which Spark uses).
3. Store the `core-site.xml` file in a folder. Personally, I create the `$SPARK/conf/hadoop/conf/` folder and store it there.
4. In the file, indicate the Hadoop conf folder by adding the following line: `export HADOOP_CONF_DIR=`
5. Create an OAuth2 key from the respective Google page (Google Console -> API Manager -> Credentials).
6. Copy the credentials to the `core-site.xml` file.
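Before starting Spark, it can help to confirm that the files from the steps above are where they are expected. The following is only a minimal sketch: `check_gcs_setup` is a hypothetical helper, the `gcs-connector*.jar` pattern is an assumption about the jar's file name, and the paths you pass in are your own.

```python
import glob
import os


def check_gcs_setup(spark_home, hadoop_conf_dir):
    """Return a list of problems; an empty list means the files are in place."""
    problems = []

    # The connector jar should sit in <spark_home>/jars/.
    # The file name pattern is an assumption; adjust it to the jar you downloaded.
    jar_pattern = os.path.join(spark_home, "jars", "gcs-connector*.jar")
    if not glob.glob(jar_pattern):
        problems.append("no connector jar matching " + jar_pattern)

    # core-site.xml should be in the folder that HADOOP_CONF_DIR points to.
    core_site = os.path.join(hadoop_conf_dir, "core-site.xml")
    if not os.path.isfile(core_site):
        problems.append("missing " + core_site)

    return problems


# Example call (placeholder paths):
# print(check_gcs_setup("/path/to/spark", "/path/to/spark/conf/hadoop/conf"))
```

This only checks that the files exist. It does not validate the credentials or the contents of `core-site.xml`.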
Alternative 1: Instead of copying the jar to the `$SPARK/jars` folder, you can store it in any folder and add that folder to the Spark classpath. One way is to edit `SPARK_CLASSPATH`, but `SPARK_CLASSPATH` is now deprecated. Therefore, one can look here on how to add a jar to the Spark classpath.
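One non-deprecated way to get a jar onto the classpath is the `--jars` option of `spark-submit`. This is not necessarily the approach the linked resource describes. The sketch below just assembles such a command line: `spark_submit_command` is a hypothetical helper, and the jar and application paths are placeholders.

```python
import shlex


def spark_submit_command(app, connector_jar):
    """Build a spark-submit argument list that ships the connector jar via --jars."""
    return ["spark-submit", "--jars", connector_jar, app]


# Placeholder paths; replace them with the real jar and application locations.
cmd = spark_submit_command("my_job.py", "/opt/jars/gcs-connector.jar")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Building the command as a list keeps it safe to hand to `subprocess.run(cmd)` without worrying about shell quoting.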
The `core-site.xml` file contains the following properties:
- Register the GCS Hadoop filesystem.
- Force the OAuth2 flow.
- Client ID of the Google-managed project associated with the Cloud SDK.
- Client secret of the Google-managed project associated with the Cloud SDK.
- A project ID. This value is required by the GCS connector but is not used in the tools provided here. The value provided is actually an invalid project ID (it starts with `_`).
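The properties described above could be written out roughly as follows. This is a sketch under assumptions: the property names (`fs.gs.impl`, `fs.gs.auth.service.account.enable`, `fs.gs.auth.client.id`, `fs.gs.auth.client.secret`, `fs.gs.project.id`) do not appear in the text above and should be checked against the documentation of the connector version you downloaded. All values except the filesystem class are placeholders, and `build_core_site` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# (name, value, description) triples. The names are assumptions, not taken
# from the text above; the credential values are placeholders to fill in.
GCS_PROPERTIES = [
    ("fs.gs.impl",
     "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
     "Register GCS Hadoop filesystem"),
    ("fs.gs.auth.service.account.enable", "false", "Force OAuth2 flow"),
    ("fs.gs.auth.client.id", "YOUR_CLIENT_ID", "OAuth2 client id"),
    ("fs.gs.auth.client.secret", "YOUR_CLIENT_SECRET", "OAuth2 client secret"),
    ("fs.gs.project.id", "_PLACEHOLDER_PROJECT_ID",
     "Required by the connector but not used here"),
]


def build_core_site(properties):
    """Return a Hadoop-style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value, description in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")


# Print the document; redirect it into core-site.xml in your Hadoop conf folder.
print(build_core_site(GCS_PROPERTIES))
```

Generating the file from a list keeps the credentials in one place, but writing the XML by hand works just as well.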