Hive - Is there a way to further optimize a HiveQL query?

梦毁少年i 2021-01-15 11:56

I have written a query to find the 10 busiest airports in the USA from March to April. It produces the desired output; however, I want to try to optimize it further.

Ar

4 answers
  •  滥情空心
    2021-01-15 12:25

    It might help to do the aggregation before the UNION ALL, so that each branch is reduced to one row per airport before the rows are combined and joined:

    SELECT a.airport, SUM(f.cnt) AS Total_Flights
    FROM (-- departures per origin airport, aggregated before the union
          (SELECT Origin AS Airport, COUNT(*) AS cnt
           FROM flights_stats
           WHERE Cancelled = 0 AND Month IN (3, 4)
           GROUP BY Origin
          ) UNION ALL
          -- arrivals per destination airport, aggregated before the union
          (SELECT Dest AS Airport, COUNT(*) AS cnt
           FROM flights_stats
           WHERE Cancelled = 0 AND Month IN (3, 4)
           GROUP BY Dest
          )
         ) f INNER JOIN
         airports a
         ON f.Airport = a.iata AND a.country = 'USA'
    -- add departures and arrivals together per airport
    GROUP BY a.airport
    ORDER BY Total_Flights DESC
    LIMIT 10;
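
    The query above still reads flights_stats twice, once per UNION ALL branch. As a rough sketch of another option (not part of the original answer, and not benchmarked), the two branches could be folded into a single scan by exploding each flight into one row for its origin and one for its destination with LATERAL VIEW explode. The aliases fs, e and iata_code are made up for illustration; table and column names are taken from the query above.

    SELECT a.airport, SUM(f.cnt) AS Total_Flights
    FROM (-- one pass over flights_stats: each flight yields two rows,
          -- one for its Origin and one for its Dest
          SELECT e.iata_code AS Airport, COUNT(*) AS cnt
          FROM flights_stats fs
          LATERAL VIEW explode(array(fs.Origin, fs.Dest)) e AS iata_code
          WHERE fs.Cancelled = 0 AND fs.Month IN (3, 4)
          GROUP BY e.iata_code
         ) f INNER JOIN
         airports a
         ON f.Airport = a.iata AND a.country = 'USA'
    GROUP BY a.airport
    ORDER BY Total_Flights DESC
    LIMIT 10;

    Whether this is actually faster depends on the table's storage format, its size and the execution engine, so it is worth prefixing each variant with EXPLAIN and comparing the plans and run times before choosing one.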
    
