Complete scan of DynamoDB with boto3

有刺的猬 2020-11-29 04:08

My table is around 220 MB with 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process and looped through...

8 Answers
  • 2020-11-29 04:46

    Code for stripping the DynamoDB type descriptors from the results, as @kungphu mentioned.

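    import boto3
    
    from boto3.dynamodb.types import TypeDeserializer
    from boto3.dynamodb.transform import TransformationInjector
    
    client = boto3.client('dynamodb')
    # Use the 'scan' paginator: a full-table read is a Scan, and Query needs a key condition
    paginator = client.get_paginator('scan')
    # _service_model is a private attribute and may change between releases
    service_model = client._service_model.operation_model('Scan')
    trans = TransformationInjector(deserializer=TypeDeserializer())
    # TableName is required; 'tablename' is a placeholder
    for page in paginator.paginate(TableName='tablename'):
        # Rewrites the page in place, e.g. {'S': 'abc'} becomes 'abc'
        trans.inject_attribute_value_output(page, service_model)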
    
  • 2020-11-29 04:46

    I had some problems with Vincent's answer related to the transformation being applied to the LastEvaluatedKey and messing up the pagination. Solved as follows:

    import boto3
    
    from boto3.dynamodb.types import TypeDeserializer
    from boto3.dynamodb.transform import TransformationInjector
    
    client = boto3.client('dynamodb')
    paginator = client.get_paginator('scan')
    operation_model = client._service_model.operation_model('Scan')
    trans = TransformationInjector(deserializer = TypeDeserializer())
    operation_parameters = {
      'TableName': 'tablename',  
    }
    items = []
    
    for page in paginator.paginate(**operation_parameters):
        has_last_key = 'LastEvaluatedKey' in page
        if has_last_key:
            last_key = page['LastEvaluatedKey'].copy()
        trans.inject_attribute_value_output(page, operation_model)
        if has_last_key:
            page['LastEvaluatedKey'] = last_key
        items.extend(page['Items'])
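
    A possible alternative that avoids both the private _service_model attribute and the LastEvaluatedKey problem is to deserialize each item individually and leave the rest of the page untouched. The sketch below is only an illustration: plain_items is a hypothetical helper name, and it assumes the deserialize callable is TypeDeserializer().deserialize from boto3.dynamodb.types.

```python
def plain_items(pages, deserialize):
    """Yield items from low-level scan pages with type descriptors removed.

    pages       -- iterable of Scan response dicts (for example a paginator)
    deserialize -- callable turning {'S': 'abc'} into 'abc'; assumed here to be
                   boto3.dynamodb.types.TypeDeserializer().deserialize
    """
    for page in pages:
        # Only the items are converted; LastEvaluatedKey is never modified,
        # so the paginator can keep using it as-is.
        for item in page.get("Items", []):
            yield {name: deserialize(value) for name, value in item.items()}


# Usage sketch (needs boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   from boto3.dynamodb.types import TypeDeserializer
#   client = boto3.client('dynamodb')
#   pages = client.get_paginator('scan').paginate(TableName='tablename')
#   items = list(plain_items(pages, TypeDeserializer().deserialize))
```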
    
  • 2020-11-29 04:47

    Turns out that Boto3 captures the "LastEvaluatedKey" as part of the returned response. This can be used as the start point for a scan:

    data = table.scan(
        ExclusiveStartKey=data['LastEvaluatedKey']
    )
    

    I plan on building a loop around this that keeps scanning until the response no longer contains a LastEvaluatedKey.
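
    One way such a loop might look is a generator, which hands items out page by page instead of holding the whole table in one list. This is a minimal sketch, not a definitive implementation: iter_scan is a hypothetical name, and table is assumed to be any object with a scan(**kwargs) method that returns a dict, such as a boto3 Table resource.

```python
def iter_scan(table, **scan_kwargs):
    """Yield every item from a table, fetching one scan page at a time."""
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = table.scan(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No LastEvaluatedKey in the response: the last page has been read.
            break
        # Resume the next request where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage sketch (process is a hypothetical callback):
#   for item in iter_scan(table):
#       process(item)
```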

  • 2020-11-29 04:52

    I think the Amazon DynamoDB documentation regarding table scanning answers your question.

    In short, you'll need to check for LastEvaluatedKey in the response. Here is an example using your code:

    import boto3
    dynamodb = boto3.resource('dynamodb',
                              aws_session_token=aws_session_token,
                              aws_access_key_id=aws_access_key_id,
                              aws_secret_access_key=aws_secret_access_key,
                              region_name=region
    )
    
    table = dynamodb.Table('widgetsTableName')
    
    response = table.scan()
    data = response['Items']
    
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])
    
  • 2020-11-29 05:00

    DynamoDB limits each Scan request to 1 MB of data, so a larger table has to be read across several requests.

    Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.scan
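
    As a rough worked example of what the limit means for the table in the question (about 220 MB), the sketch below computes a lower bound on the number of requests. min_scan_requests is a hypothetical helper, and the real count may be higher because pages are not guaranteed to be full.

```python
import math


def min_scan_requests(table_size_mb, page_limit_mb=1):
    """Lower bound on Scan requests needed to read a table of the given size."""
    return math.ceil(table_size_mb / page_limit_mb)


print(min_scan_requests(220))  # 220
```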

    Here is an example loop to get all the data from a DynamoDB table using LastEvaluatedKey:

    import boto3
    client = boto3.client('dynamodb')
    
    def dump_table(table_name):
        results = []
        last_evaluated_key = None
        while True:
            if last_evaluated_key:
                response = client.scan(
                    TableName=table_name,
                    ExclusiveStartKey=last_evaluated_key
                )
            else: 
                response = client.scan(TableName=table_name)
            last_evaluated_key = response.get('LastEvaluatedKey')
            
            results.extend(response['Items'])
            
            if not last_evaluated_key:
                break
        return results
    
    # Usage
    data = dump_table('your-table-name')
    
    # do something with data
    
    
  • 2020-11-29 05:03

    Riffing off of Jordon Phillips's answer, here's how you'd pass a FilterExpression in with the pagination:

    import boto3
    
    client = boto3.client('dynamodb')
    paginator = client.get_paginator('scan')
    operation_parameters = {
      'TableName': 'foo',
      'FilterExpression': 'bar > :x AND bar < :y',
      'ExpressionAttributeValues': {
        ':x': {'S': '2017-01-31T01:35'},
        ':y': {'S': '2017-01-31T02:08'},
      }
    }
    
    page_iterator = paginator.paginate(**operation_parameters)
    for page in page_iterator:
        items = page['Items']  # do something with the items on this page
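
    If the attribute name and bounds vary, the parameters above could be built by a small helper. This is only a sketch: range_scan_params is a hypothetical name, the bounds are assumed to be strings as in the example above, and the attribute name goes through ExpressionAttributeNames so that a reserved word would not break the expression.

```python
def range_scan_params(table_name, attribute, low, high):
    """Build Scan parameters for: low < attribute < high (string values)."""
    return {
        "TableName": table_name,
        # '#attr' is a placeholder resolved through ExpressionAttributeNames
        "FilterExpression": "#attr > :low AND #attr < :high",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {
            ":low": {"S": low},
            ":high": {"S": high},
        },
    }


# Usage sketch, with paginator created as in the answer above:
#   params = range_scan_params('foo', 'bar', '2017-01-31T01:35', '2017-01-31T02:08')
#   for page in paginator.paginate(**params):
#       items = page['Items']
```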
    