How to add only missing Dates in Dataframe

别那么骄傲 2020-12-11 18:48

I have the data frame shown below:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-03  458      2.99
2018-04-05  354      1.25

3 Answers
  • 2020-12-11 19:35

    Here's a correction of your approach, in base R.

    Replace max(t1$Date) with Sys.Date() in your real application:

    t2 <- merge(data.frame(Date = as.Date(min(t1$Date):max(t1$Date), origin = "1970-01-01")),
                t1, by = "Date", all = TRUE)
    t2[is.na(t2)] <- 0
    
    #         Date Val1 Val2
    # 1 2018-04-01  125 0.05
    # 2 2018-04-02    0 0.00
    # 3 2018-04-03  458 2.99
    # 4 2018-04-04    0 0.00
    # 5 2018-04-05  354 1.25
    

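    Data:

    t1 <- read.table(text="Date        Val1     Val2
    '2018-04-01'  125 0.05
    '2018-04-03'  458 2.99
    '2018-04-05'  354 1.25", header = TRUE, stringsAsFactors = FALSE)
    # Convert the character column to Date so it can be merged with a date sequence
    t1$Date <- as.Date(t1$Date)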
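
The same idea can also be sketched with seq() on the dates directly, which avoids the integer round trip. This is only a minimal sketch: end_date is a hypothetical fixed date standing in for Sys.Date() so that the result is reproducible, and val_cols is an illustrative helper name, neither of which comes from the answer above.

```r
# Example data from the question
t1 <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-03", "2018-04-05")),
  Val1 = c(125, 458, 354),
  Val2 = c(0.05, 2.99, 1.25)
)

# Hypothetical end date standing in for Sys.Date(), so the result is reproducible
end_date <- as.Date("2018-04-07")

# One row per calendar day between the first date and end_date
all_days <- data.frame(Date = seq(min(t1$Date), end_date, by = "day"))

# Outer join keeps every day; days absent from t1 get NA in Val1 and Val2
t2 <- merge(all_days, t1, by = "Date", all = TRUE)

# Replace NA with 0 in the value columns only, leaving Date untouched
val_cols <- setdiff(names(t2), "Date")
t2[val_cols][is.na(t2[val_cols])] <- 0

t2
```

Restricting the replacement to the value columns is a design choice here: it keeps the zero fill away from the Date column even if that column were ever to contain NA.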
    
  • 2020-12-11 19:36

    This can be done with complete() from tidyr:

    library(tidyverse)
    df1 %>%
        complete(Date = seq(Date[1], Sys.Date(), by = "1 day"),
                    fill = list(Val1 = 0, Val2 = 0))
    

    If many columns need to be filled, build the fill list programmatically from the column names instead of typing each one:

    nm1 <- setdiff(names(df1), "Date") #in this example excluding the Date
    nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)
    

    and then pass that list as the fill argument:

    df1 %>% 
         complete(Date = seq(Date[1], Sys.Date(), by = "1 day"), fill = nm2)
    # A tibble: 35 x 3
    #   Date        Val1  Val2
    #   <date>     <dbl> <dbl>
    # 1 2018-04-01   125  0.05
    # 2 2018-04-02     0  0   
    # 3 2018-04-03   458  2.99
    # 4 2018-04-04     0  0   
    # 5 2018-04-05   354  1.25
    # 6 2018-04-06     0  0   
    # 7 2018-04-07     0  0   
    # 8 2018-04-08     0  0   
    # 9 2018-04-09     0  0   
    #10 2018-04-10     0  0   
    # ... with 25 more rows
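
To make the fill list concrete, here is a small base R sketch of what nm2 contains for the example data. It assumes, as a variation on the setdiff() call above, that only numeric columns should be filled with 0; the vapply() selection is that assumption and not part of the original answer.

```r
# Example data with the same columns as in the question
df1 <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-03", "2018-04-05")),
  Val1 = c(125, 458, 354),
  Val2 = c(0.05, 2.99, 1.25)
)

# Assumption: only numeric columns get a 0 fill; Date columns are not numeric
nm1 <- names(df1)[vapply(df1, is.numeric, logical(1))]

# Named list with one 0 per selected column, the shape that fill expects
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)

# Inspect the structure of the list
str(nm2)
```

For this data the list is equivalent to list(Val1 = 0, Val2 = 0), so it can be passed as fill in the complete() call in the same way.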
    
  • 2020-12-11 19:36

    You could use padr, which is made for filling in missing date values. First you add the missing dates based on the interval; then, if you do not want NAs, you fill them with a value (or with a function result, or with the most frequently occurring value).

    Edit: added end_val so that the padding runs until Sys.Date().

    library(padr)
    # Specify end_val to go all the way to Sys.Date(); 1 is added to include Sys.Date()
    padded_df <- pad(df, interval = "day", end_val = Sys.Date()+1)
    padded_df <- fill_by_value(padded_df, value = 0)
    padded_df
    
            Date Val1 Val2
    1 2018-04-01  125 0.05
    2 2018-04-02    0 0.00
    3 2018-04-03  458 2.99
    4 2018-04-04    0 0.00
    5 2018-04-05  354 1.25
    .....
    
    31 2018-05-01    0    0
    32 2018-05-02    0    0
    33 2018-05-03    0    0
    34 2018-05-04    0    0
    35 2018-05-05    0    0
    36 2018-05-06    0    0
    