Removing duplicate rows in Hive based on columns

南方客 2021-02-09 18:19

I have a Hive table with 10 columns. The first 9 columns can hold the same values in more than one row, while the 10th column, CREATE_DATE, differs because it holds the date the row was created. How can I remove the rows that are duplicates on the first 9 columns?

3 answers
  •  陌清茗 (original poster)
    2021-02-09 19:03

    You can do the following:

    select col1,col2,dayid,marketid,max(createdate) as createdate
    from tablename
    group by col1,col2,dayid,marketid
    

    This groups the data by every column except the date, so rows with the same values in those columns end up in the same group. You then choose which createdate to keep by using an aggregate function such as max or min.
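
    If the goal is to actually rewrite the table rather than just query it, something along these lines might work. This is only a sketch: it reuses the example names from the query above (tablename, col1, col2, dayid, marketid, createdate) in place of the nine real key columns, and it assumes the table is unpartitioned and that the selected columns are listed in the table's own column order. It is probably worth trying on a copy of the table first.

    ```sql
    -- Sketch only: table and column names are the placeholders from the answer,
    -- not the asker's real schema.

    -- Option 1: overwrite the table with one row per key, keeping the latest date.
    -- The SELECT list must match the table's column order.
    INSERT OVERWRITE TABLE tablename
    SELECT col1, col2, dayid, marketid, MAX(createdate) AS createdate
    FROM tablename
    GROUP BY col1, col2, dayid, marketid;

    -- Option 2: the same result as a plain query using a window function.
    -- rn = 1 marks the newest row within each group of identical key values.
    SELECT col1, col2, dayid, marketid, createdate
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY col1, col2, dayid, marketid
                   ORDER BY createdate DESC
               ) AS rn
        FROM tablename t
    ) ranked
    WHERE rn = 1;
    ```

    The window-function form can be handy when you want to keep one whole row per group, including columns that are not part of the key. Switching to ORDER BY createdate ASC would keep the oldest row instead of the newest.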
