Question
I have a Spark cluster running on YARN, and I want to host my job's jar in a 100% S3-compatible object store. From what I found on Google, submitting the job seems as simple as:
spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file
However, the object store requires a user name and password for access. So how can I configure those credentials so that Spark can download the jar from S3? Many thanks!
Answer 1:
You can use the Default Credential Provider Chain from the AWS docs:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
./bin/spark-submit \
--master local[2] \
--class org.apache.spark.examples.SparkPi \
s3a://your_bucket/.../spark-examples_2.11-2.4.6-SNAPSHOT.jar
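Since the store in the question is only S3-compatible rather than AWS itself, you can also pass the credentials plus the store's endpoint per submit as Hadoop properties. This is a minimal sketch, assuming a hypothetical endpoint https://s3.example.internal and a hypothetical main class com.example.MyJob, and assuming the s3a jars mentioned below are already on Spark's classpath; whether these properties are picked up early enough for fetching the application jar can depend on your Spark/Hadoop versions:
# sketch only: endpoint, keys and main class are placeholders
./bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.hadoop.fs.s3a.endpoint=https://s3.example.internal \
--conf spark.hadoop.fs.s3a.access.key=your_access_key \
--conf spark.hadoop.fs.s3a.secret.key=your_secret_key \
--class com.example.MyJob \
s3a://my_bucket/jar_file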
I needed to download the following jars from Maven and put them into Spark's jars directory in order to use the s3a:// scheme with spark-submit (note: you can use the --packages directive to reference these dependencies from inside your jar, but not for spark-submit itself):
# build the Spark assembly project
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
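If you would rather not export the keys in every shell, the same S3A properties can also be set once in Spark's configuration file. A sketch, again with a hypothetical endpoint, and noting that some properties depend on the hadoop-aws version you installed:
# conf/spark-defaults.conf (sketch; replace endpoint and keys with your store's values)
spark.hadoop.fs.s3a.endpoint https://s3.example.internal
spark.hadoop.fs.s3a.access.key your_access_key
spark.hadoop.fs.s3a.secret.key your_secret_key
# path-style addressing is often required by non-AWS stores, but it needs a
# hadoop-aws version that supports fs.s3a.path.style.access
spark.hadoop.fs.s3a.path.style.access true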
Source: https://stackoverflow.com/questions/60900601/how-to-submit-a-spark-job-of-which-the-jar-is-hosted-in-s3-object-store