Complete scan of DynamoDB with boto3

有刺的猬 2020-11-29 04:08

My table is around 220 MB with 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process and looped throug

8 answers
  • 2020-11-29 05:04

    The two approaches suggested above both have problems: you either write lengthy, repetitive code that handles paging explicitly in a loop, or you use Boto paginators with low-level clients and forgo the advantages of higher-level Boto objects.

    A small functional wrapper in Python gives you a higher-level abstraction instead: you keep using the higher-level Boto methods, and the complexity of AWS paging stays hidden:

    import itertools
    import typing
    
    def iterate_result_pages(function_returning_response: typing.Callable, *args, **kwargs) -> typing.Generator:
        """A wrapper for functions using AWS paging, that returns a generator which yields a sequence of items for
        every response
    
        Args:
            function_returning_response: A function (or callable), that returns an AWS response with 'Items' and optionally 'LastEvaluatedKey'
            This could be a bound method of an object.
    
        Returns:
            A generator which yields the 'Items' field of the result for every response
        """
        response = function_returning_response(*args, **kwargs)
        yield response["Items"]
        while "LastEvaluatedKey" in response:
            kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
            response = function_returning_response(*args, **kwargs)
            yield response["Items"]
    
    
    def iterate_paged_results(function_returning_response: typing.Callable, *args, **kwargs) -> typing.Iterator:
        """A wrapper for functions using AWS paging, that returns an iterator of all the items in the responses.
        Items are yielded to the caller as soon as they are received.
    
        Args:
            function_returning_response: A function (or callable), that returns an AWS response with 'Items' and optionally 'LastEvaluatedKey'
            This could be a bound method of an object.
    
        Returns:
            An iterator which yields one response item at a time
        """
        return itertools.chain.from_iterable(iterate_result_pages(function_returning_response, *args, **kwargs))
    
    # Example, assuming 'table' is a boto3 DynamoDB Table resource:
    all_items = list(iterate_paged_results(table.scan, ProjectionExpression='my_field'))
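
    With roughly 250k records, one big list may not be what you want. Because the wrapper above yields items as they arrive, they can be regrouped into fixed-size batches on the fly. A minimal sketch, assuming a hypothetical helper named in_batches (not part of boto3) and a batch size you choose yourself:

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Lazily group any iterable into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next batch_size items from the source iterator
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with the wrapper above ('table' is a boto3 Table resource,
# 'handle' is a placeholder for your own processing):
#     for batch in in_batches(iterate_paged_results(table.scan), 1000):
#         handle(batch)
```

    Only one batch is held in memory at a time, and the next page is requested only when the current one has been consumed.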
    
  • 2020-11-29 05:08

    boto3 offers paginators that handle all the pagination details for you. The boto3 documentation has a page for the DynamoDB scan paginator. Basically, you would use it like so:

    import boto3
    
    client = boto3.client('dynamodb')
    paginator = client.get_paginator('scan')
    
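    # 'my_table' is a placeholder; paginate() accepts the same arguments as client.scan()
    for page in paginator.paginate(TableName='my_table'):
        for item in page['Items']:
            # attributes arrive in DynamoDB's typed format, e.g. {'S': 'abc'}
            print(item)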
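
    For comparison, the loop a paginator saves you from writing looks roughly like the sketch below. scan_pages is a hypothetical helper, not a boto3 API; it assumes only that client.scan() returns a dict carrying 'LastEvaluatedKey' whenever more pages remain, which is how DynamoDB Scan responses behave.

```python
from typing import Any, Dict, Iterator


def scan_pages(client: Any, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield each raw Scan response, following LastEvaluatedKey until it is absent."""
    kwargs = dict(scan_kwargs)  # copy so the loop never mutates the caller's arguments
    while True:
        response = client.scan(**kwargs)
        yield response
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return  # no key means this was the final page
        # Resume the next request where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key


# Hypothetical usage ('my_table' is a placeholder table name):
#     import boto3
#     for page in scan_pages(boto3.client('dynamodb'), TableName='my_table'):
#         for item in page['Items']:
#             ...
```

    Passing the client in as a parameter keeps the loop easy to exercise against a stub object that mimics scan().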
    