Can I delete data (rows in tables) from Athena?

[愿得一人] 2021-01-11 12:14

Is it possible to delete data stored in S3 through an Athena query? I have some rows I have to delete from a couple of tables (they point to separate buckets in S3).

4 answers
  • 2021-01-11 12:33

    Use AWS Glue for that: load your data, drop the rows you need to delete, and save the data back.

    You can now also delete files from S3 and merge data: https://aws.amazon.com/about-aws/whats-new/2020/01/aws-glue-adds-new-transforms-apache-spark-applications-datasets-amazon-s3/

  • 2021-01-11 12:45

    You can use Athena to find all the files that contain the rows you want to delete, and then delete those files separately. Athena exposes a special pseudo-column, "$path", which holds the S3 location of the file each row was read from.

    SELECT DISTINCT "$path" FROM <table> WHERE <condition matching the rows to delete>
    

    To automate this, iterate over the Athena query results, take each file path, and delete the corresponding object from S3.
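    As a rough illustration of that loop, here is a minimal Python sketch. It assumes boto3-style clients are passed in (for example boto3.client("athena") and boto3.client("s3")); the helper names fetch_paths, paths_from_result_pages, split_s3_uri and delete_paths are made up for this example, and the database and output location are placeholders you would supply yourself.

```python
import time
from collections import defaultdict


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {uri!r}")
    return bucket, key


def paths_from_result_pages(pages):
    """Collect distinct values of the first column from get_query_results pages."""
    paths = set()
    first_row = True
    for page in pages:
        for row in page["ResultSet"]["Rows"]:
            if first_row:
                # For SELECT queries the first row returned holds the column names.
                first_row = False
                continue
            value = row["Data"][0].get("VarCharValue")
            if value:
                paths.add(value)
    return sorted(paths)


def fetch_paths(athena_client, query, database, output_location, poll_seconds=1.0):
    """Run a 'SELECT "$path" ...' query and return the file paths it yields."""
    query_id = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    paginator = athena_client.get_paginator("get_query_results")
    return paths_from_result_pages(paginator.paginate(QueryExecutionId=query_id))


def delete_paths(s3_client, paths, batch_size=1000):
    """Delete the given s3:// objects; delete_objects accepts at most 1000 keys per call."""
    keys_by_bucket = defaultdict(list)
    for path in paths:
        bucket, key = split_s3_uri(path)
        keys_by_bucket[bucket].append(key)
    deleted = 0
    for bucket, keys in keys_by_bucket.items():
        for start in range(0, len(keys), batch_size):
            batch = keys[start:start + batch_size]
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": key} for key in batch]},
            )
            deleted += len(batch)
    return deleted
```

    Keep in mind that this removes whole files, so every row stored in a matching file disappears, not only the rows that matched the condition.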

  • 2021-01-11 12:50

    I would also like to add that, once you have found the files that need updating, you can filter out the rows you want to delete and write the remaining rows to new files using CTAS: https://docs.aws.amazon.com/athena/latest/ug/ctas.html

    Afterwards you can replace the old files with the new ones created by CTAS. I think this is the simplest way to go.
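    To make the CTAS idea concrete, here is a small Python sketch that only builds the statement text. The function name build_ctas and all table names and locations are hypothetical; the values are interpolated directly into the SQL, so it assumes trusted input. The COALESCE wrapper is my own addition: it keeps rows for which the delete condition evaluates to NULL, which a plain NOT (...) would silently drop.

```python
def build_ctas(source_table, new_table, external_location, delete_condition,
               data_format="PARQUET"):
    """Return a CTAS statement that copies every row NOT matching delete_condition.

    Inputs are pasted into the SQL text as-is, so pass trusted values only.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (format = '{data_format}', "
        f"external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}\n"
        # Rows where the condition is NULL are treated as "do not delete".
        f"WHERE NOT COALESCE(({delete_condition}), FALSE)"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_ctas(
        source_table="events",
        new_table="events_cleaned",
        external_location="s3://example-bucket/events_cleaned/",
        delete_condition="user_id = 'abc123'",
    ))
```

    The location given as external_location has to be empty before the query runs, so the new files normally go to a fresh prefix and are swapped in for the old ones afterwards.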

  • 2021-01-11 12:54

    I would just like to add to Dhaval's answer.

    You can find the path of the file that contains the rows you want to delete and, instead of deleting the entire file, remove just those rows from the S3 file, which I am assuming is in JSON format.

    The process is to download the particular file that holds those rows, remove the rows from it, and upload it back to S3 under the same key.

    This simply replaces the original file with the modified one (in your case, without the deleted rows). Athena reads the files at query time, so after the upload the deleted rows no longer show up in query results.
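    The download, edit and upload steps could look roughly like the Python sketch below. It assumes the file holds one JSON object per line and that a boto3-style S3 client is passed in; drop_rows, rewrite_object and the should_delete callback are names invented for this example.

```python
import json


def drop_rows(json_lines_text, should_delete):
    """Return the text without the records for which should_delete(record) is true."""
    kept = []
    for line in json_lines_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not should_delete(json.loads(line)):
            kept.append(line)
    return "".join(line + "\n" for line in kept)


def rewrite_object(s3_client, bucket, key, should_delete):
    """Download one object, remove the matching rows, and upload it under the same key."""
    original = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    cleaned = drop_rows(original, should_delete)
    s3_client.put_object(Bucket=bucket, Key=key, Body=cleaned.encode("utf-8"))
    return cleaned
```

    For example, rewrite_object(s3, "my-bucket", "data/part-0001.json", lambda r: r.get("user_id") == "abc123") would drop that user's rows from a single file; the bucket, key and field name here are placeholders. Compressed or columnar files such as Parquet would need a different reader and writer.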
