Where Spark SQL optimizes better than Hive SQL (window functions)
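Sometimes a single select statement contains several window functions whose window definitions (OVER clauses) may or may not be identical. For functions that share the same window there is no need to partition and sort again, so they can be merged into a single Window operator.

Take the example from "spark、hive中窗口函数实现原理复盘" (a review of how window functions are implemented in Spark and Hive):

    select id, sq, cell_type, rank,
           row_number() over (partition by id order by rank) naturl_rank,
           rank() over (partition by id order by rank) as r,
           dense_rank() over (partition by cell_type order by id) as dr
    from window_test_table
    group by id, sq, cell_type, rank;

row_number() and rank() use the same window, so both can be computed in a single partition-and-sort pass. In this respect Hive SQL and Spark SQL behave the same way.

The other case is different, however:

    select id, rank, row_number() over (partition by id order by rank) naturl_rank, sum(rank) over (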