Filling missing dates by group

盖世英雄少女心 2020-11-28 12:55

I have a data set that looks like this:

shop_id,item_id,time,value
150,1,2015-07-10,3
150,1,2015-07-11,5
150,1,2015-07-13,2
150,2,2015-07-10,15
150,2,2015-07-12,12

3 Answers
  • 2020-11-28 13:40

    You can use data.table in R. Assuming that the 'time' column is of class 'Date':

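    library(data.table) # v1.9.5+
    # one row per day between each group's first and last date
    DT1 <- setDT(df1)[, list(time = seq(min(time), max(time), by = 'day')),
                      by = .(shop_id, item_id)]
    # keyed join onto the full grid, set missing values to 0, then print ([])
    setkeyv(df1, names(df1)[1:3])[DT1][is.na(value), value := 0L][]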
    #   shop_id item_id       time value
    #1:     150       1 2015-07-10     3
    #2:     150       1 2015-07-11     5
    #3:     150       1 2015-07-12     0
    #4:     150       1 2015-07-13     2
    #5:     150       2 2015-07-10    15
    #6:     150       2 2015-07-11     0
    #7:     150       2 2015-07-12    12
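    If data.table isn't available, the same two steps (build each group's full daily sequence, then left-join the observed rows and replace NA with 0) can be sketched in base R with split() and merge(). This is a swapped-in base-R technique, not the answer's data.table code; the sample data is the question's.

```r
# Base-R sketch: per-group daily sequence + left join, NA -> 0
df1 <- read.csv(text = "shop_id,item_id,time,value
150,1,2015-07-10,3
150,1,2015-07-11,5
150,1,2015-07-13,2
150,2,2015-07-10,15
150,2,2015-07-12,12")
df1$time <- as.Date(df1$time)   # the answer assumes a Date column

# Step 1: every date between each group's first and last date
groups <- split(df1, df1[c("shop_id", "item_id")], drop = TRUE)
full <- do.call(rbind, lapply(groups, function(g) {
  data.frame(shop_id = g$shop_id[1],
             item_id = g$item_id[1],
             time    = seq(min(g$time), max(g$time), by = "day"))
}))

# Step 2: left join the observed values and replace the gaps with 0
res <- merge(full, df1, by = c("shop_id", "item_id", "time"), all.x = TRUE)
res$value[is.na(res$value)] <- 0L
res <- res[order(res$shop_id, res$item_id, res$time), ]
rownames(res) <- NULL
print(res)
```

    merge(all.x = TRUE) plays the role of the data.table join here; it is simpler to read but copies data, so it will be slower on large tables.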
    

    In the devel version at the time of writing, you can also do this without setting a key, by passing the join columns to 'on' (the 'on' argument has since been released in CRAN versions of data.table):

    df1[DT1, on = c('shop_id', 'item_id', 'time')][is.na(value), value := 0L][]
    #   shop_id item_id       time value
    #1:     150       1 2015-07-10     3
    #2:     150       1 2015-07-11     5
    #3:     150       1 2015-07-12     0
    #4:     150       1 2015-07-13     2
    #5:     150       2 2015-07-10    15
    #6:     150       2 2015-07-11     0
    #7:     150       2 2015-07-12    12
    

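    Or, as @Arun suggested, a more efficient option is to initialise 'value' to 0 in DT1 and then update it by reference from df1 during the join, without building a joined copy: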

     DT1[, value := 0L][df1, value := i.value, on = c('shop_id', 'item_id', 'time')]
     DT1 
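    Arun's variant is an update join: every row of DT1 starts at 0, and only the rows that find a match in df1 are overwritten. A rough base-R sketch of the same idea uses match() on a composite key. The paste()-based key, the helper name key(), and the hand-written grid standing in for DT1 are assumptions of this sketch, not part of data.table.

```r
# Base-R sketch of an update join: initialise every row to 0, then
# overwrite only the rows whose (shop_id, item_id, time) key exists in df1
df1 <- data.frame(
  shop_id = 150L,
  item_id = c(1L, 1L, 1L, 2L, 2L),
  time    = as.Date(c("2015-07-10", "2015-07-11", "2015-07-13",
                      "2015-07-10", "2015-07-12")),
  value   = c(3L, 5L, 2L, 15L, 12L)
)

# Stand-in for DT1: the full daily grid per group, written out by hand
full <- data.frame(
  shop_id = 150L,
  item_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  time    = as.Date("2015-07-10") + c(0:3, 0:2)
)

# Hypothetical composite-key helper: one string per row
key <- function(d) paste(d$shop_id, d$item_id, format(d$time), sep = "|")

full$value <- 0L                         # like DT1[, value := 0L]
idx <- match(key(full), key(df1))        # matching df1 row, or NA
hit <- !is.na(idx)
full$value[hit] <- df1$value[idx[hit]]   # like value := i.value
print(full)
```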
    
  • 2020-11-28 13:43

    This is a SQL-based solution (T-SQL syntax, for SQL Server).

    First, you need a date table.

    Date table query (note that this creates a physical table in your database):

    ;with cte as
    (
    select cast('2000-01-01' as date) as Dates -- Start date
    union all
    select dateadd(DAY, 1, Dates) -- one row per day
    from cte
    where Dates < '2099-12-31' -- End date
    )
    select *
    INTO Date_table
    from cte
    option (maxrecursion 0) -- lift the default limit of 100 recursion levels
    

    Then cross join Date_table with the (shop_id, item_id) pairs in your table, and left outer join your table onto the result to get the missing dates.

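    SELECT A.shop_id,
           A.item_id,
           DT.Dates AS [time],
           ISNULL(Y.value, 0) AS value
    FROM   Date_table DT
           CROSS JOIN (SELECT shop_id,
                              item_id,
                              MIN([time]) AS min_time,
                              MAX([time]) AS max_time
                       FROM   yourtable
                       GROUP  BY shop_id, item_id) A
           LEFT OUTER JOIN yourtable Y
                        ON Y.[time] = DT.Dates
                           AND A.shop_id = Y.shop_id
                           AND A.item_id = Y.item_id
    WHERE  DT.Dates BETWEEN A.min_time AND A.max_time -- each group's own range
    ORDER  BY A.shop_id, A.item_id, DT.Dates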
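    For comparison, the calendar-table idea can be sketched in base R: merge(..., by = NULL) produces the cross join, and merge(..., all.x = TRUE) the left outer join. The calendar range (July 2015) is an assumption chosen to cover the sample data, and the sketch keeps only dates between each pair's first and last observation so that it matches the output shown in the other answers.

```r
# Base-R analogue of the SQL calendar-table approach
df1 <- data.frame(
  shop_id = 150L,
  item_id = c(1L, 1L, 1L, 2L, 2L),
  time    = as.Date(c("2015-07-10", "2015-07-11", "2015-07-13",
                      "2015-07-10", "2015-07-12")),
  value   = c(3L, 5L, 2L, 15L, 12L)
)

# "Date table": one row per day (range assumed to cover the data)
calendar <- data.frame(time = seq(as.Date("2015-07-01"),
                                  as.Date("2015-07-31"), by = "day"))

# CROSS JOIN of the distinct (shop_id, item_id) pairs with the calendar
pairs <- unique(df1[c("shop_id", "item_id")])
grid  <- merge(pairs, calendar, by = NULL)

# Keep only dates inside each pair's first..last observed date
gkey  <- function(d) paste(d$shop_id, d$item_id)
first <- sapply(split(as.numeric(df1$time), gkey(df1)), min)
last  <- sapply(split(as.numeric(df1$time), gkey(df1)), max)
t_num <- as.numeric(grid$time)
grid  <- grid[t_num >= first[gkey(grid)] & t_num <= last[gkey(grid)], ]

# LEFT OUTER JOIN plus ISNULL(value, 0)
res <- merge(grid, df1, by = c("shop_id", "item_id", "time"), all.x = TRUE)
res$value[is.na(res$value)] <- 0L
res <- res[order(res$shop_id, res$item_id, res$time), ]
rownames(res) <- NULL
print(res)
```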
    
  • 2020-11-28 13:47

    Here's a solution with fill_by_value from padr:

    library(dplyr)
    library(tidyr)
    library(padr)
    
    df %>%
      mutate(time = as.Date(time)) %>%
      group_by(item_id) %>%
      pad() %>%              # from padr
      fill(shop_id) %>%      # from tidyr
      fill_by_value(value)   # from padr
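    pad() fills in the date and, here, the grouping column item_id, but the inserted rows still have NA for shop_id and value. That is why the pipeline calls fill() (carry the last value down) and fill_by_value() (replace NA with a constant, 0 by default). A base-R sketch of those two steps on a hand-built stand-in for pad()'s output; the helper name locf is hypothetical.

```r
# `locf` is a hypothetical helper: last observation carried forward,
# i.e. what tidyr::fill() does in its default "down" direction.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  last <- which(!is.na(x))         # positions of the non-NA values
  out  <- x
  ok   <- seen > 0                 # leading NAs have nothing to carry
  out[ok] <- x[last[seen[ok]]]     # copy the most recent non-NA value
  out
}

# Hand-built stand-in for pad()'s output: inserted rows have NA
padded <- data.frame(
  shop_id = c(150L, 150L, NA, 150L, 150L, NA, 150L),
  item_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  time    = as.Date("2015-07-10") + c(0:3, 0:2),
  value   = c(3L, 5L, NA, 2L, 15L, NA, 12L)
)

# fill(shop_id) within each item_id group
padded$shop_id <- ave(padded$shop_id, padded$item_id, FUN = locf)
# fill_by_value(value): replace the remaining NAs with 0
padded$value[is.na(padded$value)] <- 0L
print(padded)
```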
    

    Result:

    # A tibble: 7 x 4
    # Groups:   item_id [2]
      shop_id item_id       time value
        <int>   <int>     <date> <dbl>
    1     150       1 2015-07-10     3
    2     150       1 2015-07-11     5
    3     150       1 2015-07-12     0
    4     150       1 2015-07-13     2
    5     150       2 2015-07-10    15
    6     150       2 2015-07-11     0
    7     150       2 2015-07-12    12
    

    Data:

    df = read.table(text = "shop_id,item_id,time,value
    150,1,2015-07-10,3
    150,1,2015-07-11,5
    150,1,2015-07-13,2
    150,2,2015-07-10,15
    150,2,2015-07-12,12", header = TRUE, sep = ",")
    