How to make MSCK REPAIR TABLE execute automatically in AWS Athena

Posted by 梦想与她 on 2019-11-28 23:16:14

There are a number of ways to schedule this task. How do you schedule your workflows? Do you use a system like Airflow, Luigi, Azkaban, cron, or AWS Data Pipeline?

From any of these, you should be able to fire off the following CLI command.

$ aws athena start-query-execution --query-string "MSCK REPAIR TABLE some_database.some_table" --result-configuration "OutputLocation=s3://SOMEPLACE"
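Note that start-query-execution only submits the query and returns a query execution ID; it does not wait for the repair to finish. If your scheduler needs to know whether the repair succeeded, a rough Python sketch along these lines could poll for the final state. The helper name run_repair and the poll_seconds and max_polls parameters are made up for illustration; the client is assumed to be a boto3 Athena client created by the caller.

```python
import time

# States after which Athena will not change the query status any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_repair(client, table, output_location, poll_seconds=2.0, max_polls=150):
    """Submit MSCK REPAIR TABLE and poll until Athena reports a terminal state.

    `client` is assumed to be boto3.client('athena'); the function name and
    the polling parameters are hypothetical, not part of any AWS API.
    """
    response = client.start_query_execution(
        QueryString=f"MSCK REPAIR TABLE {table}",
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    for _ in range(max_polls):
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")


# Possible usage (requires boto3 and AWS credentials):
#   import boto3
#   state = run_repair(boto3.client("athena"),
#                      "some_database.some_table", "s3://SOMEPLACE")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub in tests.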

Another option would be AWS Lambda. You could have a function that calls MSCK REPAIR TABLE some_database.some_table in response to a new upload to S3.

An example Lambda function could be written as follows:

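import boto3

def lambda_handler(event, context):
    bucket_name = 'some_bucket'

    client = boto3.client('athena')

    # Where Athena writes the query results, encrypted with SSE-S3
    config = {
        'OutputLocation': 's3://' + bucket_name + '/',
        'EncryptionConfiguration': {'EncryptionOption': 'SSE_S3'}
    }

    # Query Execution Parameters
    sql = 'MSCK REPAIR TABLE some_database.some_table'
    # Named query_context so it does not shadow the handler's context argument
    query_context = {'Database': 'some_database'}

    # The call is asynchronous: it returns once the query has been submitted
    response = client.start_query_execution(QueryString=sql,
                                            QueryExecutionContext=query_context,
                                            ResultConfiguration=config)
    return response['QueryExecutionId']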

You would then configure a trigger to execute your Lambda function when new data are added under the DATA/ prefix in your bucket.
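If the trigger itself is not restricted to a prefix, the handler can filter the incoming event. The following is a minimal sketch, assuming the standard S3 event notification layout (Records, then s3.object.key, with URL-encoded keys); the helper name new_keys_under_prefix is hypothetical.

```python
from urllib.parse import unquote_plus


def new_keys_under_prefix(event, prefix="DATA/"):
    """Return the decoded object keys in an S3 event that start with `prefix`.

    Hypothetical helper: S3 delivers object keys URL-encoded in event
    notifications, so they are decoded before the prefix comparison.
    """
    keys = []
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            keys.append(key)
    return keys
```

A handler could call this first and skip the Athena query entirely when the returned list is empty.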

Ultimately, explicitly rebuilding the partitions from a job scheduler after your Spark job runs has the advantage of being self-documenting. On the other hand, AWS Lambda is convenient for jobs like this one.

Tony Marti

You should be running ADD PARTITION instead:

aws athena start-query-execution --query-string "ALTER TABLE ADD PARTITION..."

This adds the newly created partition from your S3 location. Athena leverages Hive for partitioning data. To create a table with partitions, you must define them in the CREATE TABLE statement. Use PARTITIONED BY to define the keys by which to partition data.
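As an illustration of how such a statement might be built for a newly uploaded object, here is a small sketch. It assumes Hive-style key=value directories in the S3 key and string-typed partition columns (values are quoted as strings and not escaped); the function name add_partition_sql and the example key are invented for this example.

```python
def add_partition_sql(table, bucket, key):
    """Build an ALTER TABLE ... ADD PARTITION statement from an S3 object key.

    Assumes a Hive-style layout such as DATA/year=2019/month=11/file.parquet
    and string-typed partition columns. Hypothetical helper, no value escaping.
    """
    # Drop the last segment, which is the file name.
    segments = key.split("/")[:-1]
    # Keep only the directory segments that look like name=value.
    pairs = [segment.split("=", 1) for segment in segments if "=" in segment]
    if not pairs:
        raise ValueError(f"No Hive-style partition segments found in key: {key}")

    spec = ", ".join(f"{name}='{value}'" for name, value in pairs)
    prefix = "/".join(segments)
    location = f"s3://{bucket}/{prefix}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


sql = add_partition_sql(
    "some_database.some_table",
    "some_bucket",
    "DATA/year=2019/month=11/part-0.parquet",
)
print(sql)
```

The IF NOT EXISTS clause keeps the statement safe to run again when several files land in the same partition.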
