I am trying to write to S3 using assumeRole via FileIO with ParquetIO

Submitted by 一曲冷凌霜 on 2021-01-27 20:40:10

Question


Step 1: AssumeRole

public static AWSCredentialsProvider getCredentials() {
    if (roleARN.length() > 0) {
        STSAssumeRoleSessionCredentialsProvider credentialsProvider =
                new STSAssumeRoleSessionCredentialsProvider.Builder(roleARN, Constants.SESSION_NAME)
                        .withStsClient(AWSSecurityTokenServiceClientBuilder.defaultClient())
                        .build();
        return credentialsProvider;
    }
    return new ProfileCredentialsProvider();
}

Step 2: Set credentials on the pipeline

credentials = getCredentials();
pipeline.getOptions().as(AwsOptions.class).setAwsRegion(Regions.US_WEST_2.getName());
pipeline.getOptions().as(AwsOptions.class).setAwsCredentialsProvider(
        new AWSStaticCredentialsProvider(new BasicAWSCredentials(
                credentials.getCredentials().getAWSAccessKeyId(),
                credentials.getCredentials().getAWSSecretKey())));
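
Note that the credentials an STS assume-role provider returns are temporary session credentials, which carry a session token in addition to the key id and secret. Wrapping only the key id and secret in BasicAWSCredentials drops that token, and S3 then rejects the key. A minimal sketch of carrying the token through, assuming the AWS SDK for Java v1 is on the classpath (the credential values below are placeholders, not real keys):

```java
import com.amazonaws.auth.AWSSessionCredentials;
import com.amazonaws.auth.BasicSessionCredentials;

public class SessionCredentialsSketch {
    public static void main(String[] args) {
        // Placeholder values; in the pipeline these would come from
        // credentials.getCredentials(), which for an STS provider is an
        // AWSSessionCredentials that includes a session token.
        AWSSessionCredentials temp = new BasicSessionCredentials(
                "ASIAEXAMPLEKEY", "exampleSecret", "exampleSessionToken");
        // Unlike BasicAWSCredentials, BasicSessionCredentials keeps the
        // session token, which S3 requires for assumed-role credentials.
        System.out.println(temp.getSessionToken());
    }
}
```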

Step 3: Run the pipeline to write to S3

PCollection<GenericRecord> parquetRecord = formattedEvent
        .apply("ParquetRecord", ParDo.of(new ParquetWriter()))
        .setCoder(AvroCoder.of(getOutput_schema()));

parquetRecord.apply(FileIO.<GenericRecord, GenericRecord>writeDynamic()
        .by(elm -> elm)
        .via(ParquetIO.sink(getOutput_schema()))
        .to(outputPath).withNumShards(1)
        .withNaming(type -> FileNaming.getNaming("part", ".snappy.parquet", "" + DateTime.now().getMillisOfSecond()))
        .withDestinationCoder(AvroCoder.of(getOutput_schema())));

I am using 'org.apache.beam:beam-sdks-java-io-parquet:jar:2.22.0' and 'org.apache.beam:beam-sdks-java-io-amazon-web-services:jar:2.22.0'

Issue: assumeRole does not seem to be working.

Errors:

org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.

Or

Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'awsCredentialsProvider' with value 'com.amazonaws.auth.InstanceProfileCredentialsProvider@71262020'

Answer 1:


Where do you run this pipeline from? If it runs inside an AWS account, it is better to grant assume-role access to the role that runs the pipeline; FileIO will then just use the default AWS client.

In other words, move the assume-role operation out of the pipeline and simply grant S3 permissions to the role running the pipeline.
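
The advice above amounts to attaching the S3 permissions directly to the pipeline's own role rather than assuming a second role at runtime. A hypothetical IAM policy of that shape (the bucket name and action list are placeholders, not taken from the question):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-output-bucket",
        "arn:aws:s3:::example-output-bucket/*"
      ]
    }
  ]
}
```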




Answer 2:


A recent release of Beam (2.24.0) added support for assuming a role.
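
As far as I can tell, 2.24.0 taught Beam's AwsModule to (de)serialize STSAssumeRoleSessionCredentialsProvider, so the provider can be passed through pipeline options instead of being built by hand. A sketch of the pipeline arguments (the role ARN and session name are placeholders; verify the exact JSON field names against your Beam version's AwsOptions documentation):

```
--awsRegion=us-west-2
--awsCredentialsProvider={"@type": "STSAssumeRoleSessionCredentialsProvider", "roleArn": "arn:aws:iam::123456789012:role/example-role", "roleSessionName": "example-session"}
```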



Source: https://stackoverflow.com/questions/62439623/i-am-trying-to-write-to-s3-using-assumerole-via-fileio-with-parquetio
