Postgres where clause over two columns

五迷三道 提交于 2019-12-11 15:33:47

问题


Database - I am working on in Postgres 9.6.5

I am analyzing the data from US Airport Authority (RITA) about the flights arrival and departures. This link (http://stat-computing.org/dataexpo/2009/the-data.html) lists all the columns in the table.

The table has following 29 columns

No Name Description

1 Year 1987-2008

2 Month 1-12

3 DayofMonth 1-31

4 DayOfWeek 1 (Monday) - 7 (Sunday)

5 DepTime actual departure time (local, hhmm)

6 CRSDepTime scheduled departure time (local, hhmm)

7 ArrTime actual arrival time (local, hhmm)

8 CRSArrTime scheduled arrival time (local, hhmm)

9 UniqueCarrier unique carrier code

10 FlightNum flight number

11 TailNum plane tail number

12 ActualElapsedTime in minutes

13 CRSElapsedTime in minutes

14 AirTime in minutes

15 ArrDelay arrival delay, in minutes

16 DepDelay departure delay, in minutes

17 Origin origin IATA airport code

18 Dest destination IATA airport code

19 Distance in miles

20 TaxiIn taxi in time, in minutes

21 TaxiOut taxi out time in minutes

22 Cancelled was the flight cancelled?

23 CancellationCode reason for cancellation (A = carrier, B = weather, C = NAS, D = security)

24 Diverted 1 = yes, 0 = no

25 CarrierDelay in minutes

26 WeatherDelay in minutes

27 NASDelay in minutes

28 SecurityDelay in minutes

29 LateAircraftDelay in minutes

There are about a million rows for each year.

I am trying to find out a count the most busy airports when delay is more than 15minutes. column DepDelay - has the delay time. origin - is the origin code for the airport.

All the data has been loaded into a table called 'ontime'

I am forming the query as follows in stages.

  1. select airports where delay is more than 15 minutes

    select origin,year,count(*) as depdelay_count from ontime where depdelay > 15
    group by year,origin order by depdelay_count desc )

  2. Now I wish to pull out only the top 10 airports per year - which I am doing as follows

    select x.origin,x.year from (with subquery as ( select origin,year,count(*) as depdelay_count from ontime where depdelay > 15 group by year,origin order by depdelay_count desc ) select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 10;

  3. Now that I have the top 10 airports by depdelay - I wish to get a count of the total flights out of these airports.

    select origin,count() from ontime where origin in (select x.origin from (with subquery as ( select origin,year,count() as depdelay_count from ontime where depdelay > 15 group by year,origin order by depdelay_count desc ) select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2) group by origin order by origin;

If I modify the Step 3 query by adding the year in the year clause

---- will be any value from (1987 to 2008)

select origin,count(*) from ontime where year = (<YEAR>) origin in  
(select x.origin from (with subquery as (
    select origin,year,count(*) as depdelay_count from ontime 
    where 
    depdelay > 15
    group by year,origin 
    order by depdelay_count desc 
    )
    select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2)
    group by origin
    order by origin;

But I have to do this manually for all years from 1987 to 2008 which I want to avoid.

Please can you help refine the query so that I can select the data for all the years without having to select each year manually.


回答1:


I find CTEs int he middle of queries to e confusing. You can basically do this with one CTE/subquery:

with oy as (
      select origin, year, count(*) as numflights,
             sum( (depdelay > 15)::int ) as depdelay_count,
             row_number() over (partition by year order by sum( (depdelay > 15)::int ) desc) as seqnum
      from ontime
      group by origin, year
     ) 
select oy.*
from oy
where seqnum <= 10;

Note the use of conditional aggregation and using window functions with aggregation functions.



来源:https://stackoverflow.com/questions/46425141/postgres-where-clause-over-two-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!