Hive number of reducers in group by and count(distinct)
Question
I was told that count(distinct) can cause data skew because only one reducer is used. I ran a test with two queries on a table of 5 billion rows:
Query A: select count(distinct columnA) from tableA
Query B: select count(columnA) from (select columnA from tableA group by columnA) a
Query A takes about 1000-1500 seconds, while query B takes 500-900 seconds. That result seems expected. However, I noticed that both queries use 370 mappers and 1 reducer, and they have almost the