Question
All my calls to spark.sql("") fail with the error shown in stack trace (1) below.
Update - 2: I have zeroed in on the problem: it is AccessDenied for sts:AssumeRole. Any leads appreciated.
User: arn:aws:sts::00000000000:assumed-role/EMR_EC2_XXXXX_XXXXXX_POLICY/i-3232131232131232 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::00000000000:role/EMR_XXXXXX_XXXXXX_POLICY
When the same location is accessed with
spark.read.parquet("s3a://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")
I am able to read the records.
But the same stack trace (1) resurfaces when the location is accessed with the s3: scheme instead of s3a:
spark.read.parquet("s3://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")
So how can I configure Spark on EMR to use s3a:, or get s3: working without the access denied error, which I presume occurs because it is not picking up the appropriate credential chain?
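For context, one workaround I am considering (untested): a sketch assuming the s3: scheme can simply be remapped onto the Apache S3A connector. fs.s3.impl and fs.s3n.impl are standard Hadoop keys, and overriding them would bypass EMRFS entirely, so I am not sure this is advisable on EMR.

// Remap the s3:// and s3n:// schemes to the Apache Hadoop S3A connector,
// so paths resolved via spark.sql also go through S3A instead of EMRFS.
spark.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")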
(1)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Access denied (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxx-xxxx-xxxx-xxxx-xxxxxxxx)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1658)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1322)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1072)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1369)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1338)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1327)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:488)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:460)
Update - 1: Tried setting the access key and secret key; it doesn't work:
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "")
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "")
Answer 1:
This stack trace comes from the Amazon EMR S3 client (note the com.amazon.ws.emr.hadoop.fs.shaded prefix), not the Apache ASF one, so it has different settings and different error messages.
That error message about an "assumed role" hints that you are running in an EC2 VM (yes?), and that the "assumed role" is actually the IAM role the EC2 VM is deployed with. In which case (a) no other credentials are being picked up and (b) that VM's role doesn't have permission to assume the target role. Fixes: work out the setting to get the credentials in, increase the EC2 IAM role's rights, or create VMs with a different role.
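A sketch of the first option for the ASF S3A connector: pin the credential provider chain explicitly so you know exactly which credentials get used. The class names below are from hadoop-aws and the AWS SDK v1; adjust them to whatever your Hadoop version ships with.

// Try environment-variable credentials first, then fall back to the
// EC2 instance profile. S3A accepts a comma-separated list of providers.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.EnvironmentVariableCredentialsProvider,com.amazonaws.auth.InstanceProfileCredentialsProvider")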
Source: https://stackoverflow.com/questions/53883833/configure-emr-to-use-s3a-instead-of-s3-for-spark-sql-calls