Amazon s3a returns 400 Bad Request with Spark

一整个雨季 2020-12-02 01:33

For checkpointing purposes I am trying to set up an Amazon S3 bucket as the checkpoint location:

val checkpointDir = "s3a://bucket-name/checkpoint.txt"
val sc = new SparkContext()
sc.setCheckpointDir(checkpointDir)
3 Answers
  • 2020-12-02 02:15

    If you'd like to use a region that supports only Signature V4 in Spark, you can pass the flag -Dcom.amazonaws.services.s3.enableV4 to the driver and executor options at runtime. For example:

    spark-submit --conf spark.driver.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' \
        --conf spark.executor.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' \
        ... (other spark options)
    

    With these settings, Spark is able to write to Frankfurt (and other V4-only regions) even with a fairly old AWS SDK version (com.amazonaws:aws-java-sdk:1.7.4 in my case).
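    If you prefer to keep this out of the launch command, here is a minimal sketch of the same idea in application code; the app name "s3a-v4-demo" is a placeholder, and this assumes client mode, where the driver JVM is already running when this code executes:

    import org.apache.spark.{SparkConf, SparkContext}

    // The driver JVM is already up in client mode, so setting
    // spark.driver.extraJavaOptions here would be too late; set the system
    // property directly instead.
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")

    // Executors are launched later, so the flag can still reach them via conf.
    val conf = new SparkConf()
      .setAppName("s3a-v4-demo")
      .set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4")
    val sc = new SparkContext(conf)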

  • 2020-12-02 02:18

    This message corresponds to something like a bad endpoint or an unsupported signature version.

    As seen here, Frankfurt is the only region that does not support Signature Version 2, and it's the one I picked.

    Even after all my research I can't say exactly what a signature version is; it's not obvious from the documentation. But V2 seems to work with s3a.

    The endpoint shown in the S3 console is not the real endpoint; it's just the web endpoint.

    You have to set one of the regional endpoints explicitly, like this:

    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3-eu-west-1.amazonaws.com")

    It works by default with the US endpoint, though.
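    For a V4-only region such as Frankfurt (eu-central-1), a short sketch combining the endpoint setting with the signature switch from the other answers; the host name is the standard AWS regional endpoint, not taken from this post:

    // Assumes an existing SparkContext `sc`.
    // Point s3a at the regional endpoint, not the console's web endpoint.
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
    // Frankfurt accepts only Signature V4; this must be set before the first
    // S3 client is created in this JVM (executors need the flag too, e.g. via
    // spark.executor.extraJavaOptions).
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")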

  • 2020-12-02 02:37

    I was facing the same issue when running Spark locally; in my case the reason was that SigV4 was not being enabled. This code helped me:

    import com.amazonaws.SDKGlobalConfiguration
    // Enable AWS Signature Version 4 before any S3 client is created in this JVM
    System.setProperty(SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY, "true")
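
    Putting the pieces together for a local run, here is a minimal end-to-end sketch; the app name, bucket, and region are placeholders, not from the original posts:

    import com.amazonaws.SDKGlobalConfiguration
    import org.apache.spark.{SparkConf, SparkContext}

    // Enable Signature V4 before any S3 client is created in this JVM.
    System.setProperty(SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY, "true")

    val sc = new SparkContext(
      new SparkConf().setAppName("checkpoint-demo").setMaster("local[*]"))
    // Regional endpoint for a V4-only region (here: Frankfurt, eu-central-1).
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
    sc.setCheckpointDir("s3a://bucket-name/checkpoint")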
    