Copy files from S3 to EMR local using Lambda

Submitted by 对着背影说爱祢 on 2020-01-05 04:57:06

Question


I need to move the files from S3 to EMR's local dir /home/hadoop programmatically using Lambda.

S3DistCp only copies the files to HDFS. I then have to log in to the EMR cluster and run the hdfs copyToLocal command on the command line to get the files into /home/hadoop.

Is there a programmatic way, using boto3 in Lambda, to copy from S3 to EMR's local dir?


Answer 1:


I wrote a test Lambda function to submit a job step to EMR that copies files from S3 to EMR's local dir. This worked.

import boto3

emrclient = boto3.client('emr', region_name='us-west-2')

def lambda_handler(event, context):
    # List every cluster that is starting or able to accept steps.
    emrs = emrclient.list_clusters(ClusterStates=['STARTING', 'RUNNING', 'WAITING'])
    clusters = emrs["Clusters"]
    print(clusters)
    for cluster in clusters:
        cluster_id = cluster["Id"]
        # command-runner.jar runs the AWS CLI command on the cluster's master node.
        response = emrclient.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'AWS S3 Copy',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 'command-runner.jar',
                        'Args': ["aws", "s3", "cp", "s3://XXX/", "/home/hadoop/copy/", "--recursive"],
                    },
                }
            ],
        )
        print(response)

If there are better ways to do the copy, please do let me know.
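
One possible refinement, offered as a sketch rather than a definitive answer: the handler above submits the step to every active cluster in the region. If only one cluster should receive the files, its ID could be passed in the Lambda event instead. The helper names build_s3_copy_step and submit_copy_step, and the event keys cluster_id, s3_uri and local_dir, are hypothetical and not part of the original answer.

```python
def build_s3_copy_step(s3_uri, local_dir):
    """Return an EMR step definition that runs `aws s3 cp --recursive`."""
    return {
        "Name": "AWS S3 Copy",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["aws", "s3", "cp", s3_uri, local_dir, "--recursive"],
        },
    }


def submit_copy_step(emr_client, cluster_id, s3_uri, local_dir):
    """Submit the copy step to a single cluster and return the new step's ID."""
    response = emr_client.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_s3_copy_step(s3_uri, local_dir)],
    )
    # add_job_flow_steps returns the IDs of the steps it created.
    return response["StepIds"][0]


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without boto3.
    import boto3

    emr_client = boto3.client("emr", region_name="us-west-2")
    step_id = submit_copy_step(
        emr_client,
        event["cluster_id"],  # hypothetical event key
        event["s3_uri"],      # hypothetical event key
        event["local_dir"],   # hypothetical event key
    )
    return {"step_id": step_id}
```

The returned step ID could then be passed to the EMR client's describe_step call to check whether the copy finished. Note that the step runs on the master node, so the files land on that node's local disk only.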




Answer 2:


That would need a way for the AWS Lambda function to remotely trigger the CopyToLocal command on the cluster.

The Lambda function could call add-steps to request the cluster to run a script that does this action.
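
A minimal sketch of that idea, assuming the files are first staged in HDFS with S3DistCp as described in the question. The function name build_hdfs_to_local_steps and the example paths are hypothetical. The sketch also assumes that command-runner.jar can invoke s3-dist-cp and hdfs directly on the master node, and that the cluster runs steps one at a time (the default step concurrency), so the second step starts only after the first has finished.

```python
def build_hdfs_to_local_steps(s3_uri, hdfs_dir, local_dir):
    """Return two EMR steps: S3 -> HDFS, then HDFS -> master node local disk."""

    def step(name, args, on_failure):
        return {
            "Name": name,
            "ActionOnFailure": on_failure,
            "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
        }

    return [
        # If staging fails, cancel the pending copyToLocal step as well.
        step(
            "S3DistCp to HDFS",
            ["s3-dist-cp", "--src", s3_uri, "--dest", hdfs_dir],
            "CANCEL_AND_WAIT",
        ),
        step(
            "HDFS copyToLocal",
            ["hdfs", "dfs", "-copyToLocal", hdfs_dir, local_dir],
            "CONTINUE",
        ),
    ]


# Example values only; replace with real locations.
steps = build_hdfs_to_local_steps(
    "s3://XXX/", "hdfs:///staging/", "/home/hadoop/copy/"
)
```

The resulting list could be passed as the Steps argument of add_job_flow_steps from the Lambda function, in the same way as the single step in Answer 1.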



Source: https://stackoverflow.com/questions/56623774/copy-files-from-s3-to-emr-local-using-lambda
