Removing duplicate rows in Hive based on columns

南方客 2021-02-09 18:19

I have a Hive table with 10 columns. The first 9 columns can contain duplicate rows, while the 10th column, CREATE_DATE, does not, because it holds the date each row was created.

3 answers
  •  一生所求
    2021-02-09 19:13

    With this approach, we don't need to list every column name in the SQL:

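    -- Number the rows within each group of identical (col1, col2) values,
    -- then keep only the first row of each group.
    select * from (
      select *, row_number() over (partition by col1, col2 order by col1) as tmp_row_number
      from table_name
    ) t
    where t.tmp_row_number = 1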
    

    The only side effect is an extra column, tmp_row_number, in the query result.
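
    Applied to the table in the question, the same idea might look like the sketch below. It assumes the nine duplicated columns are named col1 through col9 and the tenth is create_date; these names, and table_name, are placeholders that do not come from the question. Ordering by create_date desc keeps the newest row of each group, which is also an assumption about which copy should survive.

    ```sql
    -- Sketch only: table_name, col1..col9 and create_date are placeholder names.
    -- Keep the most recently created row for each distinct combination of col1..col9.
    select col1, col2, col3, col4, col5, col6, col7, col8, col9, create_date
    from (
      select t.*,
             row_number() over (
               partition by col1, col2, col3, col4, col5, col6, col7, col8, col9
               order by create_date desc
             ) as tmp_row_number
      from table_name t
    ) ranked
    where tmp_row_number = 1;
    ```

    Listing the columns in the outer select drops the helper column from the result, at the cost of writing the column names out. Use order by create_date asc instead if the oldest row should be kept.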
