Question
Step 1: AssumeRole
public static AWSCredentialsProvider getCredentials() {
    // Assume the configured role via STS when a role ARN is set;
    // otherwise fall back to the local AWS profile.
    if (roleARN.length() > 0) {
        STSAssumeRoleSessionCredentialsProvider credentialsProvider =
            new STSAssumeRoleSessionCredentialsProvider
                .Builder(roleARN, Constants.SESSION_NAME)
                .withStsClient(AWSSecurityTokenServiceClientBuilder.defaultClient())
                .build();
        return credentialsProvider;
    }
    return new ProfileCredentialsProvider();
}
Step 2: Set credentials on the pipeline
AWSCredentialsProvider credentials = getCredentials();
pipeline.getOptions().as(AwsOptions.class).setAwsRegion(Regions.US_WEST_2.getName());
// Copy the resolved keys into a static provider; note that BasicAWSCredentials
// holds only an access key and secret key, so the STS session token is dropped.
pipeline.getOptions().as(AwsOptions.class).setAwsCredentialsProvider(
    new AWSStaticCredentialsProvider(new BasicAWSCredentials(
        credentials.getCredentials().getAWSAccessKeyId(),
        credentials.getCredentials().getAWSSecretKey())));
Step 3: Run the pipeline to write to S3
PCollection<GenericRecord> parquetRecord = formattedEvent
    .apply("ParquetRecord", ParDo.of(new ParquetWriter()))
    .setCoder(AvroCoder.of(getOutput_schema()));
parquetRecord.apply(FileIO.<GenericRecord, GenericRecord>writeDynamic()
    .by(elm -> elm)
    .via(ParquetIO.sink(getOutput_schema()))
    .to(outputPath).withNumShards(1)
    .withNaming(type -> FileNaming.getNaming("part", ".snappy.parquet",
        "" + DateTime.now().getMillisOfSecond()))
    .withDestinationCoder(AvroCoder.of(getOutput_schema())));
I am using 'org.apache.beam:beam-sdks-java-io-parquet:jar:2.22.0' and
'org.apache.beam:beam-sdks-java-io-amazon-web-services:jar:2.22.0'.
Issue: AssumeRole does not currently seem to be working.
Errors:
org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.
or
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'awsCredentialsProvider' with value 'com.amazonaws.auth.InstanceProfileCredentialsProvider@71262020'
Answer 1:
Where do you run this pipeline from? If it runs inside an AWS account, it is better to grant the required access directly to the role that runs the pipeline; FileIO inside the pipeline will then just use the default AWS client.
In other words, shift the assume-role operation out of the pipeline and simply grant S3 permissions to the role running the pipeline; a sketch follows.
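A minimal sketch of what that could look like, assuming the pipeline runs on AWS infrastructure (EC2, ECS, EMR, and so on) whose instance or task role already carries the needed S3 permissions; the region value is carried over from the question:
// Sketch only: rely on the ambient role instead of copying keys around.
AwsOptions awsOptions = pipeline.getOptions().as(AwsOptions.class);
awsOptions.setAwsRegion(Regions.US_WEST_2.getName());
// DefaultAWSCredentialsProviderChain resolves the instance/task role (or
// environment variables / a local profile when run off AWS), and it is one of
// the provider types Beam's AwsOptions can serialize for the workers.
awsOptions.setAwsCredentialsProvider(DefaultAWSCredentialsProviderChain.getInstance());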
Answer 2:
The recent Beam release (2.24.0) adds the feature to assume a role.
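A minimal sketch of how that might be wired up on Beam 2.24.0+, assuming roleARN is defined as in the question; the session name "beam-session" is an illustrative placeholder:
// Sketch only: on 2.24.0+ the assume-role provider itself can be handed to
// AwsOptions, and Beam serializes its configuration (role ARN, session name)
// so that each worker refreshes its own temporary session credentials.
AwsOptions awsOptions = pipeline.getOptions().as(AwsOptions.class);
awsOptions.setAwsRegion(Regions.US_WEST_2.getName());
awsOptions.setAwsCredentialsProvider(
    new STSAssumeRoleSessionCredentialsProvider.Builder(roleARN, "beam-session")
        .build());
This also sidesteps the BasicAWSCredentials copy in Step 2, which drops the STS session token that temporary credentials require and is a likely cause of the "Access Key Id ... does not exist" error above.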
Source: https://stackoverflow.com/questions/62439623/i-am-trying-to-write-to-s3-using-assumerole-via-fileio-with-parquetio