Oracle: select missing dates

后端 未结 4 658
耶瑟儿~
耶瑟儿~ 2021-01-03 05:09

I have a table with (among other things) dates in a field.

I need to get a list of all dates that are more recent than the oldest date, older than the most recent

相关标签:
4条回答
  • 2021-01-03 05:29

    We can use simple hierarchical query as shown below:

    WITH CTE AS
    (SELECT (SELECT MIN(COL1) FROM T)+LEVEL-1 AS OUT FROM DUAL
    CONNECT BY (LEVEL-1) <= (SELECT MAX(COL1) - MIN(COL1) FROM T))
    SELECT OUT FROM CTE WHERE OUT NOT IN (SELECT COL1 FROM T);
    
    0 讨论(0)
  • 2021-01-03 05:30

    Something like this (assuming your table is named your_table and the date column is named the_date):

    with date_range as (
          select min(the_date) as oldest, 
                 max(the_date) as recent, 
                 max(the_date) - min(the_date) as total_days
          from your_table
    ),
    all_dates as (
       select oldest + level - 1 as a_date
       from date_range
       connect by level <= (select total_days from date_range)
    )
    select ad.a_date
    from all_dates ad
      left join your_table yt on ad.a_date = yt.the_date
    where yt.the_date is null
    order by ad.a_date;  
    

    Edit:
    the WITH clause is called a "common table expression" and is equivalent to a derived table ("inline view").

    It's similar to

    select * 
    from ( 
         ..... 
    ) all_dates
    join your_table ...
    

    The second CTE simply creates a list of dates "on-the-fly" using a undocumented feature of Oracle's connect by implementation.

    Re-using a select (like I did with calculating the first and last date) is a bit easier (and IMHO more readable) than using derived tables.

    Edit 2:

    This can be done with a recursive CTE as well:

    with date_range as (
          select min(the_date) as oldest, 
                 max(the_date) as recent, 
                 max(the_date) - min(the_date) as total_days
          from your_table
    ),
    all_dates (a_date, lvl) as (
       select oldest as a_date, 1 as lvl
       from date_range 
       union all
       select (select oldest from date_range) + lvl, lvl + 1
       from all_dates 
       where lvl < (select total_days from date_range)
    )
    select ad.a_date, lvl
    from all_dates ad    
      left join your_table yt on ad.a_date = yt.the_date
    where yt.the_date is null
    order by ad.a_date;  
    

    Which should work in all DBMS supporting recursive CTEs (PostgreSQL and Firebird - being more standard compliant - do need the recursive keyword though).

    Note the hack select (select oldest from date_range) + lvl, lvl + 1 in the recursive part. This should not be necessary, but Oracle still has some bugs with regards to DATEs in a recursive CTE. In PostgreSQL the following works without problems:

    ....
    all_dates (a_date, lvl) as (
       select oldest as a_date, 0 as lvl
       from date_range 
       union all
       select a_date + 1, lvl + 1
       from all_dates 
       where lvl < (select total_days from date_range)
    )
    ....
    
    0 讨论(0)
  • 2021-01-03 05:30

    I'd opt for this variant, because it's more efficient:

    with all_dates_wo_boundary_values as
    ( select oldest + level the_date
        from ( select min(the_date) oldest
                    , max(the_date) recent
                 from your_table
             )
     connect by level <= recent - oldest - 1
    )
    select the_date
      from all_dates_wo_boundary_values
     minus
    select the_date
      from your_table
    

    And here is some proof.
    First the setup:

    SQL> create table your_table (the_date)
      2  as
      3  select date '2012-01-02' from dual union all
      4  select date '2012-01-02' from dual union all
      5  select date '2012-01-03' from dual union all
      6  select date '2012-01-05' from dual union all
      7  select date '2012-01-05' from dual union all
      8  select date '2012-01-07' from dual union all
      9  select date '2012-01-08' from dual
     10  /
    
    Table created.
    
    SQL> exec dbms_stats.gather_table_stats(user,'your_table')
    
    PL/SQL procedure successfully completed.
    
    SQL> alter session set statistics_level = all
      2  /
    
    Session altered.
    

    Horse's query:

    SQL> with date_range as
      2  ( select min(the_date) as oldest
      3         , max(the_date) as recent
      4         , max(the_date) - min(the_date) as total_days
      5      from your_table
      6  )
      7  , all_dates as
      8  ( select ( select oldest from date_range) + level as a_date
      9      from dual
     10   connect by level <= (select total_days from date_range)
     11  )
     12  select ad.a_date
     13    from all_dates ad
     14         left join your_table yt on ad.a_date = yt.the_date
     15   where yt.the_date is null
     16   order by ad.a_date
     17  /
    
    A_DATE
    -------------------
    04-01-2012 00:00:00
    06-01-2012 00:00:00
    
    2 rows selected.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  gaqx49vb9gz9k, child number 0
    -------------------------------------
    with date_range as ( select min(the_date) as oldest        , max(the_date) as recent        , max(the_date) - min(the_date) as total_d
    ays     from your_table )
    
    , all_dates as ( select ( select oldest from date_range) + level as a_date     from dual  connect by level <= (select total_days from
    date_range) ) select
    
    ad.a_date   from all_dates ad        left join your_table yt on ad.a_date = yt.the_date  where yt.the_date is null  order by ad.a_date
    
    Plan hash value: 1419150012
    
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
    | Id  | Operation                         | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   1 |  TEMP TABLE TRANSFORMATION        |                             |      1 |        |      2 |00:00:00.01 |      22 |      1 |    1 |       |       |          |
    |   2 |   LOAD AS SELECT                  |                             |      1 |        |      1 |00:00:00.01 |       7 |      0 |    1 |   262K|   262K|  262K (0)|
    |   3 |    SORT AGGREGATE                 |                             |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
    |   4 |     TABLE ACCESS FULL             | YOUR_TABLE                  |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
    |   5 |   SORT ORDER BY                   |                             |      1 |      1 |      2 |00:00:00.01 |      12 |      1 |    0 |  2048 |  2048 | 2048  (0)|
    |*  6 |    FILTER                         |                             |      1 |        |      2 |00:00:00.01 |      12 |      1 |    0 |       |       |          |
    |*  7 |     HASH JOIN OUTER               |                             |      1 |      1 |      7 |00:00:00.01 |      12 |      1 |    0 |  1048K|  1048K|  707K (0)|
    |   8 |      VIEW                         |                             |      1 |      1 |      6 |00:00:00.01 |       9 |      1 |    0 |       |       |          |
    |   9 |       CONNECT BY WITHOUT FILTERING|                             |      1 |        |      6 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
    |  10 |        FAST DUAL                  |                             |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |    0 |       |       |          |
    |  11 |        VIEW                       |                             |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
    |  12 |         TABLE ACCESS FULL         | SYS_TEMP_0FD9D660C_81240964 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
    |  13 |      TABLE ACCESS FULL            | YOUR_TABLE                  |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       6 - filter("YT"."THE_DATE" IS NULL)
       7 - access("YT"."THE_DATE"=INTERNAL_FUNCTION("AD"."A_DATE"))
    
    
    32 rows selected.
    

    And my suggestion:

    SQL> with all_dates_wo_boundary_values as
      2  ( select oldest + level the_date
      3      from ( select min(the_date) oldest
      4                  , max(the_date) recent
      5               from your_table
      6           )
      7   connect by level <= recent - oldest - 1
      8  )
      9  select the_date
     10    from all_dates_wo_boundary_values
     11   minus
     12  select the_date
     13    from your_table
     14  /
    
    THE_DATE
    -------------------
    04-01-2012 00:00:00
    06-01-2012 00:00:00
    
    2 rows selected.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  7aavxmzkj7zq7, child number 0
    -------------------------------------
    with all_dates_wo_boundary_values as ( select oldest + level the_date     from ( select min(the_date) oldest
      , max(the_date) recent              from your_table          )  connect by level <= recent - oldest - 1 ) select
    the_date   from all_dates_wo_boundary_values  minus select the_date   from your_table
    
    Plan hash value: 2293301832
    
    -----------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                       | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
    -----------------------------------------------------------------------------------------------------------------------------------
    |   1 |  MINUS                          |            |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
    |   2 |   SORT UNIQUE                   |            |      1 |      1 |      5 |00:00:00.01 |       3 |  9216 |  9216 | 8192  (0)|
    |   3 |    VIEW                         |            |      1 |      1 |      5 |00:00:00.01 |       3 |       |       |          |
    |   4 |     CONNECT BY WITHOUT FILTERING|            |      1 |        |      5 |00:00:00.01 |       3 |       |       |          |
    |   5 |      VIEW                       |            |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
    |   6 |       SORT AGGREGATE            |            |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
    |   7 |        TABLE ACCESS FULL        | YOUR_TABLE |      1 |      7 |      7 |00:00:00.01 |       3 |       |       |          |
    |   8 |   SORT UNIQUE                   |            |      1 |      7 |      5 |00:00:00.01 |       3 |  9216 |  9216 | 8192  (0)|
    |   9 |    TABLE ACCESS FULL            | YOUR_TABLE |      1 |      7 |      7 |00:00:00.01 |       3 |       |       |          |
    -----------------------------------------------------------------------------------------------------------------------------------
    
    
    22 rows selected.
    

    Regards,
    Rob.

    0 讨论(0)
  • 2021-01-03 05:33

    You need a Calendar table (either permanent or created on the fly). Then you could do a simple:

    SELECT c.my_date
    FROM 
            calendar c
        JOIN
            ( SELECT MIN(date_column) AS min_date 
                   , MAX(date_column) AS max_date 
              FROM tableX
            ) mm
          ON c.mydate BETWEEN min_date AND max_date
    WHERE
        c.my_date NOT IN
        ( SELECT date_column
          FROM tableX
        )
    
    0 讨论(0)
提交回复
热议问题