Duplicating records to fill gap between dates

我在风中等你 2020-12-31 17:43

I need to do something really weird, which is to create fake records in a view to fill the gap between posted dates of product prices.

Actually, my

4 Answers
  • 2020-12-31 18:18

    I just realized that @Wolf's and @jonearles's improvements do not return the exact results I needed, because the row generator that lists all dates does not generate the ranges per product. Even if the first price of product A is later than every price of product B, the first listed date for product A must still be the date of its own first price. But they really helped me work further and get the expected results.

    I started by changing @Wolf's date range selector from this:

    select min(price_date) beg_date, sysdate end_date from prices_test
    

    to this:

    select min(PRICE_DATE) START_DATE, sysdate as END_DATE, PRODUCT 
    from PRICES_TEST group by sysdate, PRODUCT
    

    But, somehow, the number of rows per product grows exponentially with each level. I just added a DISTINCT in the outer query. The final select was this:

    select
      DP.PRICE_DATE,
      DP.PRODUCT,
      LAST_VALUE(PT.PRICE ignore nulls) over (order by DP.PRODUCT, DP.PRICE_DATE) PRICE
    from (
      select distinct START_DATE + DAYS as PRICE_DATE, PRODUCT 
      from 
      (
        -- Row generator to list all dates from first date of each product to today
        with DATES as (select min(PRICE_DATE) START_DATE, sysdate as END_DATE, PRODUCT from PRICES_TEST group by sysdate, PRODUCT)
        select START_DATE, level - 1 as DAYS, PRODUCT
        from DATES
        connect by level < END_DATE - START_DATE + 1
        order by 3, 2
      ) d order by 2, 1
    ) DP
    left outer join prices_test pt on pt.price_date = dp.price_date and pt.product = dp.product;
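
    The row multiplication comes from CONNECT BY having no condition that ties a child row to its parent, so at every level each row connects to the rows of every product. Below is a minimal sketch of one common Oracle idiom that keeps the hierarchy within a single product, which should make the DISTINCT unnecessary. It assumes the PRICES_TEST(PRODUCT, PRICE_DATE, PRICE) table from this thread; the TRUNC calls are an added assumption to drop any time-of-day component.

    ```sql
    -- Sketch only: one row per product per day, from that product's first price date to today.
    select product,
           start_date + level - 1 as price_date
    from (
      -- first price date of each product (TRUNC is an assumption, see above)
      select product, trunc(min(price_date)) as start_date
      from prices_test
      group by product
    )
    connect by level <= trunc(sysdate) - start_date + 1
           and prior product = product        -- a child row must belong to its parent's product
           and prior sys_guid() is not null   -- always true; avoids a false CONNECT BY loop error
    order by product, price_date;
    ```

    The PRIOR SYS_GUID() IS NOT NULL predicate does not filter anything. It is there only so Oracle does not report a loop when a product's single starting row connects to itself.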
    

    @Mellamokb's solution is actually what I really need and is certainly better than my newbie solution.

    Thanks, everyone, not only for helping me with this but also for introducing me to features such as "with" and "connect by".

  • 2020-12-31 18:29

    You can create a row generator using the CONNECT BY LEVEL syntax, cross join it with the distinct products in your table, and then outer join the result to your prices table. The final touch is to use the LAST_VALUE function with IGNORE NULLS to repeat the price until a new value is encountered. Since you wanted a view, the query is wrapped in a CREATE VIEW statement:

    create view dense_prices_test as
    select
        dp.price_date
      , dp.product
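      -- partition by product so one product's last price never carries over into the next product
      , last_value(pt.price ignore nulls) over (partition by dp.product order by dp.price_date) price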
    from (
          -- Cross join with the distinct product set in prices_test
          select d.price_date, p.product
          from (
                -- Row generator to list all dates from first date in prices_test to today
                with dates as (select min(price_date) beg_date, sysdate end_date from prices_test)
                select dates.beg_date + level - 1 price_date 
                from dual
                cross join dates
                connect by level <= dates.end_date - dates.beg_date + 1
                ) d
          cross join (select distinct product from prices_test) p
         ) dp
    left outer join prices_test pt on pt.price_date = dp.price_date and pt.product = dp.product;
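
    To see the fill-forward step in isolation, here is a small self-contained sketch that runs against DUAL only. The SAMPLE_ROWS name and its four rows are made up for illustration and are not part of the original data.

    ```sql
    -- Sketch only: LAST_VALUE ... IGNORE NULLS repeats the most recent non-null price.
    with sample_rows as (
      select date '2020-01-01' as d, 10 as price from dual union all
      select date '2020-01-02', cast(null as number) from dual union all
      select date '2020-01-03', cast(null as number) from dual union all
      select date '2020-01-04', 12 from dual
    )
    select d,
           last_value(price ignore nulls) over (order by d) as filled_price
    from sample_rows
    order by d;
    -- filled_price comes back as 10, 10, 10, 12
    ```

    With an ORDER BY in the window, the default frame runs from the first row to the current row, so each gap picks up the latest price seen so far.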
    
  • 2020-12-31 18:33

    I think I have a solution that builds toward the final result step by step with CTEs:

    with mindate as
    (
      select min(price_date) as mindate from PRICES_TEST
    )
    ,dates as
    (
      select mindate.mindate + row_number() over (order by 1) - 1 as thedate from mindate,
        dual d connect by level <= floor(SYSDATE - mindate.mindate) + 1
    )
    ,productdates as
    (
      select p.product, d.thedate
      from (select distinct product from PRICES_TEST) p, dates d
    )
    ,ranges as
    (
      select
        pd.product,
        pd.thedate,
        (select max(PRICE_DATE) from PRICES_TEST p2
         where p2.product = pd.product and p2.PRICE_DATE <= pd.thedate) as mindate
        from productdates pd
    )
    select 
        r.thedate,
        r.product,
        p.price
    from ranges r
    inner join PRICES_TEST p on r.mindate = p.price_date and r.product = p.product
    order by r.product, r.thedate
    
    • mindate retrieves the earliest price date in the data set.
    • dates generates a calendar of dates from that earliest date to today.
    • productdates cross joins all products with all dates.
    • ranges determines which price date applies on each calendar date.
    • The final query joins each applicable price date to its actual price; the inner join filters out dates that precede a product's first price.

    Demo: http://www.sqlfiddle.com/#!4/e528f/126

  • 2020-12-31 18:41

    I made a few changes to Wolf's excellent answer.

    I replaced the subquery factoring (WITH) with a regular subquery in the connect by. This makes the code a little simpler. (Although this type of code looks weird at first either way, so there may not be a huge gain here.)

    Most significantly, I used a partition outer join instead of a cross join and outer join. Partition outer joins are also kind of strange, but they are meant for exactly this type of situation. This makes the code simpler, and should improve performance.

    select
        price_dates.price_date
        ,product
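        -- partition by product so one product's last price never carries over into the next product
        ,last_value(price ignore nulls) over (partition by product order by price_dates.price_date) price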
    from
    (
        select trunc(sysdate) - level + 1 price_date
        from dual
        connect by level <= trunc(sysdate) -
            (select min(trunc(price_date)) from prices_test) + 1
    ) price_dates
    left outer join prices_test
        partition by (prices_test.product)
        on price_dates.price_date = prices_test.price_date;
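
    For readers new to partition outer joins, here is a small self-contained sketch of the densification on its own. The DAYS and PRICES names and their rows are invented for illustration; only the join syntax mirrors the query above.

    ```sql
    -- Sketch only: every product gets a row for every day, even days with no price.
    with days as (
      select date '2020-01-01' + level - 1 as d
      from dual
      connect by level <= 3
    ),
    prices as (
      select 'A' as product, date '2020-01-01' as price_date, 10 as price from dual union all
      select 'B', date '2020-01-02', 20 from dual
    )
    select days.d, prices.product, prices.price
    from days
    left outer join prices
        partition by (prices.product)
        on days.d = prices.price_date
    order by prices.product, days.d;
    -- 6 rows: 3 days for A and 3 days for B; price is null on days without a match
    ```

    The outer join is applied once per product partition, so the product column is filled in on the generated rows and only the price stays null.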
    