Hive - Is there a way to further optimize a HiveQL query?

梦毁少年i 2021-01-15 11:56

I have written a query to find the 10 busiest airports in the USA from March to April. It produces the desired output; however, I want to try to optimize it further.

Ar

4 answers
  •  北荒 (original poster)
    2021-01-15 12:33

    You can try the query below, but this is a case where a UNION may perform better, so you really need to test both and report back:

    SELECT airports.airport,
    SUM(
      CASE 
         WHEN T1.FlightsNum IS NOT NULL THEN 1
         WHEN T2.FlightsNum IS NOT NULL THEN 1
         ELSE 0
      END 
      ) AS Total_Flights
    FROM airports
    LEFT JOIN (SELECT  Origin AS Airport, FlightsNum 
        FROM flights_stats
       WHERE (Cancelled = 0 AND Month IN (3,4))) t1 
     on t1.Airport = airports.iata
    LEFT JOIN (SELECT Dest AS Airport, FlightsNum 
       FROM flights_stats
       WHERE (Cancelled = 0 AND Month IN (3,4))) t2
     on t2.Airport = airports.iata
    GROUP BY airports.airport
    ORDER BY Total_Flights DESC
    LIMIT 10
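
    For comparison, here is a rough sketch of the UNION variant mentioned above. It assumes the same flights_stats and airports tables and columns as the query above; the aliases f and a are only illustrative. Each non-cancelled flight contributes one row for its origin and one row for its destination, so the result may not match the join version exactly: two LEFT JOINs on the same key can multiply rows per airport. Treat it as something to benchmark, not as a guaranteed improvement.

    ```sql
    -- Sketch only: table and column names are taken from the query above.
    SELECT a.airport,
           COUNT(*) AS Total_Flights
    FROM (
        -- One row per departure in March or April
        SELECT Origin AS iata
        FROM flights_stats
        WHERE Cancelled = 0 AND Month IN (3, 4)
        UNION ALL
        -- One row per arrival in March or April
        SELECT Dest AS iata
        FROM flights_stats
        WHERE Cancelled = 0 AND Month IN (3, 4)
    ) f
    JOIN airports a
      ON a.iata = f.iata
    GROUP BY a.airport
    ORDER BY Total_Flights DESC
    LIMIT 10;
    ```

    Because this uses an inner join, airports with no matching flights are dropped instead of appearing with a count of 0, which should not matter for a top-10 list.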
    
