Question
I created an AWS EMR cluster through the regular EMR cluster wizard in the AWS Management Console, where I was able to select a security configuration. When you export the equivalent CLI command, it appears as --security-configuration 'mySecurityConfigurationValue'.
I now need to create a similar EMR cluster through AWS Data Pipeline, but I don't see any option for specifying this security-configuration field.
The only similar fields I see are EmrManagedSlaveSecurityGroup, EmrManagedMasterSecurityGroup, AdditionalSlaveSecurityGroups, AdditionalMasterSecurityGroups, and SubnetId. I already have all of those filled out in my Pipeline configuration but I just need to also specify the security-configuration. Any thoughts?
Answer 1:
Unfortunately, Data Pipeline does not support the Security Configurations feature, nor other features that were introduced in the EMR 5.x versions, such as using a custom AMI.
One solution for this is to:
1. Replace the EmrCluster in your pipeline with an EC2 resource
2. Use a ShellCommandActivity on the EC2 resource to run the aws emr create-cluster CLI command
3. Use a bootstrap step to install TaskRunner on the cluster
4. Replace all the runsOn properties in your pipeline with workerGroup so the tasks run on the EMR cluster you created in step 2
5. Add a final ShellCommandActivity at the end of the pipeline to terminate the cluster using the CLI
Now, since you are spinning up your cluster using the CLI, you have access to features such as security configurations, custom AMIs, and instance fleets, and you can still orchestrate the tasks using Data Pipeline.
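The steps above could be sketched roughly as follows. This is a minimal, untested outline in Python that only assembles a pipeline definition as JSON; it does not call AWS. Every concrete value in it is an assumption for illustration: the object IDs, the S3 bucket, the worker group name, the release label and instance settings, and the install-taskrunner.sh bootstrap script (a hypothetical script you would write yourself to download and start TaskRunner with the given worker group). Passing the cluster ID between activities through an S3 object is also just one possible choice.

```python
import json

# Hypothetical placeholders; replace with your own values.
WORKER_GROUP = "emr-worker-group"
CLUSTER_ID_URI = "s3://my-bucket/pipeline/cluster-id.txt"
TASKRUNNER_BOOTSTRAP = "s3://my-bucket/bootstrap/install-taskrunner.sh"

# Step 2: create the cluster from the CLI so that --security-configuration
# can be passed, then store the returned cluster ID in S3 for later use.
LAUNCH_COMMAND = " ".join([
    "aws emr create-cluster",
    "--name pipeline-cluster",
    "--release-label emr-5.13.0",
    "--applications Name=Spark",
    "--instance-type m4.large --instance-count 3",
    "--use-default-roles",
    "--security-configuration mySecurityConfigurationValue",
    # Step 3: bootstrap action (hypothetical script) that installs TaskRunner.
    f"--bootstrap-actions 'Path={TASKRUNNER_BOOTSTRAP},Args=[{WORKER_GROUP}]'",
    "--query ClusterId --output text",
    f"| aws s3 cp - {CLUSTER_ID_URI}",
])

# Step 5: read the stored cluster ID back and terminate the cluster.
TERMINATE_COMMAND = (
    f'aws emr terminate-clusters --cluster-ids "$(aws s3 cp {CLUSTER_ID_URI} -)"'
)


def build_pipeline_definition():
    """Return a pipeline definition dict following the five steps above."""
    objects = [
        {
            "id": "Default",
            "name": "Default",
            "scheduleType": "ondemand",
            "role": "DataPipelineDefaultRole",
            "resourceRole": "DataPipelineDefaultResourceRole",
        },
        # Step 1: an EC2 resource takes the place of the EmrCluster object.
        {
            "id": "LauncherInstance",
            "name": "LauncherInstance",
            "type": "Ec2Resource",
            "terminateAfter": "2 Hours",
        },
        {
            "id": "LaunchCluster",
            "name": "LaunchCluster",
            "type": "ShellCommandActivity",
            "command": LAUNCH_COMMAND,
            "runsOn": {"ref": "LauncherInstance"},
        },
        # Step 4: workerGroup instead of runsOn, so the TaskRunner installed
        # on the EMR cluster picks up this task.
        {
            "id": "ProcessData",
            "name": "ProcessData",
            "type": "ShellCommandActivity",
            "command": "echo 'replace with the real task'",
            "workerGroup": WORKER_GROUP,
            "dependsOn": {"ref": "LaunchCluster"},
        },
        {
            "id": "TerminateCluster",
            "name": "TerminateCluster",
            "type": "ShellCommandActivity",
            "command": TERMINATE_COMMAND,
            "runsOn": {"ref": "LauncherInstance"},
            "dependsOn": {"ref": "ProcessData"},
        },
    ]
    return {"objects": objects}


if __name__ == "__main__":
    print(json.dumps(build_pipeline_definition(), indent=2))
```

The printed JSON could be saved to a file and uploaded with aws datapipeline put-pipeline-definition. Note that in this sketch TerminateCluster only runs if ProcessData succeeds, so a real pipeline would probably need an additional cleanup path for failures.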
Source: https://stackoverflow.com/questions/50353136/security-configuration-field-for-aws-data-pipeline-emrcluster