Fill missing date values in column by adding delivery interval to another date column

為{幸葍}努か 提交于 2019-12-01 12:58:08

问题


Data:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected Outcome:

   DB1 <- data.frame(orderItemID  = 1:10,     
 orderDate= c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Hey guys, it´s me again ;) and unfortunately (I think) I have a pretty difficult question... As you can see above I have some missing values in the delivery dates and I want to replace them by another date. That date should be the order date of the specific item + the average delivery time, in (full) days. (In the example it's 1.75 days so that rounds to 2 days) The average delivery time is the time calculated from the average value of all samples that do not contain Missing values = (2days+1day+3days+2days+1day+2days+1day+2days):8=1,75

so in a first step the average delivery time needs to be calculated an in the second step the order date + the average delivery time (in full days) needs to be entered instead of the NA´s

I tried already a little with [is.na(DB1$deliveryDate)] but unfortunately I have no good idea how to solve the problem...

Hope somebody got an idea


回答1:


You want to do date-arithmetic, and fill NAs in deliveryDate column by adding a date-interval of two days to orderDate column. lubridate supplies convenience functions for time intervals like days(), weeks(), months(), years(), hours(), minutes(), seconds() for exactly that purpose. And first, you have to parse your (European-format) datestrings into R date objects.

Something like the following, using lubridate for date-arithmetic and dplyr for dataframe manipulation:

require(dplyr)

DB1$orderDate    = as.POSIXct(DB1$orderDate, format="%d.%m.%y", tz='UTC')
DB1$deliveryDate = as.POSIXct(DB1$deliveryDate, format="%d.%m.%y", tz='UTC')

DB1 %>% group_by(orderDate) %>%
        summarize(delivery_time = (deliveryDate - orderDate)) %>%
        ungroup() %>% summarize(median(delivery_time, na.rm=T))

# median(delivery_time, na.rm = T)
#                         1.5 days
# so you round up to 2 days
delivery_days = 2.0

require(lubridate)
DB1 <- DB1 %>% filter(is.na(deliveryDate)) %>%
                mutate(deliveryDate = orderDate + days(2))

# orderItemID  orderDate deliveryDate
#           3 2013-04-12   2013-04-14
#           6 2014-02-19   2014-02-21


来源:https://stackoverflow.com/questions/31485858/fill-missing-date-values-in-column-by-adding-delivery-interval-to-another-date-c

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!