问题
Data:
DB1 <- data.frame(orderItemID = 1:10,
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
Expected Outcome:
DB1 <- data.frame(orderItemID = 1:10,
orderDate= c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
Hey guys, it´s me again ;) and unfortunately (I think) I have a pretty difficult question... As you can see above I have some missing values in the delivery dates and I want to replace them by another date. That date should be the order date of the specific item + the average delivery time, in (full) days. (In the example it's 1.75 days so that rounds to 2 days) The average delivery time is the time calculated from the average value of all samples that do not contain Missing values = (2days+1day+3days+2days+1day+2days+1day+2days):8=1,75
so in a first step the average delivery time needs to be calculated an in the second step the order date + the average delivery time (in full days) needs to be entered instead of the NA´s
I tried already a little with [is.na(DB1$deliveryDate)] but unfortunately I have no good idea how to solve the problem...
Hope somebody got an idea
回答1:
You want to do date-arithmetic, and fill NAs in deliveryDate
column by adding a date-interval of two days to orderDate
column. lubridate
supplies convenience functions for time intervals like days(), weeks(), months(), years(), hours(), minutes(), seconds()
for exactly that purpose.
And first, you have to parse your (European-format) datestrings into R date objects.
Something like the following, using lubridate for date-arithmetic and dplyr for dataframe manipulation:
require(dplyr)
DB1$orderDate = as.POSIXct(DB1$orderDate, format="%d.%m.%y", tz='UTC')
DB1$deliveryDate = as.POSIXct(DB1$deliveryDate, format="%d.%m.%y", tz='UTC')
DB1 %>% group_by(orderDate) %>%
summarize(delivery_time = (deliveryDate - orderDate)) %>%
ungroup() %>% summarize(median(delivery_time, na.rm=T))
# median(delivery_time, na.rm = T)
# 1.5 days
# so you round up to 2 days
delivery_days = 2.0
require(lubridate)
DB1 <- DB1 %>% filter(is.na(deliveryDate)) %>%
mutate(deliveryDate = orderDate + days(2))
# orderItemID orderDate deliveryDate
# 3 2013-04-12 2013-04-14
# 6 2014-02-19 2014-02-21
来源:https://stackoverflow.com/questions/31485858/fill-missing-date-values-in-column-by-adding-delivery-interval-to-another-date-c