I am copying data from Amazon S3 to Redshift. During this process, I need to avoid the same files being loaded again. I don't have any unique constraints on my Redshift table.
Currently there is no built-in way to remove duplicates in Redshift. Redshift lets you declare primary key/unique key constraints but does not enforce them, and deleting duplicates by row number (deleting every row whose row number is greater than 1) is not straightforward either, because Redshift's DELETE statement does not accept window functions such as ROW_NUMBER() in its WHERE clause.
A practical workaround is to run a scheduled cron/Quartz job that copies all the distinct rows into a separate table, drops the original table, and renames the new table to the original name:
CREATE TABLE temp_originalTable (LIKE originalTable);
INSERT INTO temp_originalTable (SELECT DISTINCT * FROM originalTable);
DROP TABLE originalTable;
ALTER TABLE temp_originalTable RENAME TO originalTable;
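If it helps, here is a minimal sketch of what such a scheduled job could look like in Python, run from cron. It assumes the psycopg2 driver and placeholder connection settings (host, database, user, password), and reuses the table names from the statements above; treat it as an outline rather than a drop-in script.

# dedupe_redshift.py: minimal sketch of a cron-driven dedupe job.
# Assumes psycopg2 and placeholder connection details. Redshift runs these
# DDL statements inside a transaction, so the table swap commits or rolls
# back as one unit.
import psycopg2

DEDUPE_STATEMENTS = [
    "CREATE TABLE temp_originalTable (LIKE originalTable);",
    "INSERT INTO temp_originalTable (SELECT DISTINCT * FROM originalTable);",
    "DROP TABLE originalTable;",
    "ALTER TABLE temp_originalTable RENAME TO originalTable;",
]

def dedupe():
    # Connection parameters are placeholders: point them at your cluster.
    conn = psycopg2.connect(
        host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="mydb",
        user="myuser",
        password="mypassword",
    )
    try:
        with conn.cursor() as cur:
            for statement in DEDUPE_STATEMENTS:
                cur.execute(statement)
        conn.commit()  # the swap becomes visible only if every step succeeded
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

if __name__ == "__main__":
    dedupe()

Because psycopg2 wraps the statements in a single transaction until commit() is called, a failure at any step leaves the original table untouched.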