Running S3-put-triggered Lambda function on existing S3 objects?

青春惊慌失措 2021-02-15 03:59

I have a Lambda function in Node.js that processes new images added to my bucket. I want to run the function for all existing objects. How can I do this? I figured the easiest w

5 Answers
  •  感情败类
    2021-02-15 04:56

    This thread helped push me in the right direction, as I needed to invoke a Lambda function once per file for 50k existing files across two buckets. I decided to write the script in Python and cap the number of Lambda invocations running simultaneously at 500 (the default concurrency limit in many AWS regions is 1000).

    The script creates a pool of 500 worker threads that pull from a queue of bucket keys. Each worker waits for its Lambda invocation to finish before picking up the next key. Since running this script against my 50k files will take a couple of hours, I'm just running it from my local machine. Hope this helps someone!

    #!/usr/bin/env python
    
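    # Imports
    import json
    import base64
    from queue import Queue
    from threading import Thread
    from argh import dispatch_command
    
    import boto3
    from botocore.config import Config
    
    NUM_THREADS = 500
    
    # Size the HTTP connection pool to the thread count (the default of 10 would
    # throttle 500 workers). Lambda can run for up to 15 minutes, so the read
    # timeout is raised, and retries are disabled so a slow call is not sent twice.
    client = boto3.client('lambda', config=Config(
        max_pool_connections=NUM_THREADS,
        read_timeout=900,
        retries={'max_attempts': 0},
    ))
    
    def invoke_lambdas():
        try:
            # credentials come from the standard boto3 chain
            # (environment variables, ~/.aws/credentials, or an IAM role)
            s3 = boto3.client('s3')
            bucket_names = ['bucket-one', 'bucket-two']
    
            queue = Queue()
    
            # create a worker pool
            for i in range(NUM_THREADS):
                worker = Thread(target=invoke, args=(queue,), daemon=True)
                worker.start()
    
            # list_objects_v2 returns at most 1000 keys per call, so paginate
            paginator = s3.get_paginator('list_objects_v2')
            for bucket_name in bucket_names:
                for page in paginator.paginate(Bucket=bucket_name):
                    for obj in page.get('Contents', []):
                        queue.put((bucket_name, obj['Key']))
    
            queue.join()
    
        except Exception as e:
            print(e)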
    
    def invoke(queue):
        while True:
            bucket_key = queue.get()
    
            try:
                print('Invoking lambda with bucket %s key %s. Remaining to process: %d'
                    % (bucket_key[0], bucket_key[1], queue.qsize()))
                trigger_event = {
                    'Records': [{
                        's3': {
                            'bucket': {
                                'name': bucket_key[0]
                            },
                            'object': {
                                'key': bucket_key[1]
                            }
                        }
                    }]
                }
    
                # replace lambda_function_name with the actual name
                # InvocationType='RequestResponse' means it will wait until the lambda fn is complete
                response = client.invoke(
                    FunctionName='lambda_function_name',
                    InvocationType='RequestResponse',
                    LogType='None',
                    ClientContext=base64.b64encode(json.dumps({}).encode()).decode(),
                    Payload=json.dumps(trigger_event).encode()
                )
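                # a synchronous invoke returns 200 even when the function itself
                # fails; that case is flagged by the FunctionError field
                if response['StatusCode'] != 200 or 'FunctionError' in response:
                    print(response)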
    
            except Exception as e:
                print(e)
                print('Exception during invoke_lambda')
    
            queue.task_done()
    
    if __name__ == '__main__':
        dispatch_command(invoke_lambdas)
    
