Question
I need to write a function that will count the number of working days (minus weekends, and a vector of other local bank holidays), but the problem I'm coming up against is more simply illustrated with just counting the number of weekdays.
Here is a function that will give the number of weekdays between two dates:
removeWeekends <- function(end, start){
range <- as.Date(start:end, "1970-01-01")
range<- range[sapply(range, function(x){
if(!chron::is.weekend(x)){
return(TRUE)
}else{
return(FALSE)
}
})]
return(NROW(range))
}
Which works when it is given a single date for each argument:
removeWeekends(as.Date("2018-05-08"), as.Date("2018-06-08"))
#[1] 24
But when it is given two vectors from a data frame it fails:
one <- as.Date("2017-01-01"):as.Date("2017-01-08")
two <- as.Date("2018-06-08"):as.Date("2018-06-15")
df <- data.frame(one, two)
removeWeekends(df$two, df$one)
#[1] 375
#Warning messages:
#1: In start:end : numerical expression has 8 elements: only the first used
#2: In start:end : numerical expression has 8 elements: only the first used
I've also tried (which I guessed would not work as the syntax seems off):
lapply(df, removeWeekends, df$two, df$one)
#Error in FUN(X[[i]], ...) : unused argument (17167:17174)
And:
lapply(df[,c("two", "one")], removeWeekends)
#Error in as.Date(start:end, "1970-01-01") : argument "start" is missing,
# with no default
I'm assuming it is me misunderstanding the concept of vectorization.
The only other idea I have is to nest the function within a conditional that checks whether the input is a vector and, if it is, calls an apply function on it, although I'm not quite sure how I would structure that either.
Answer 1:
You have a couple of options for supporting vectorized arguments in a function. Since you have already written your function, the easiest option is to use Vectorize, which converts it into a function that accepts vectorized arguments. Another option is to rewrite the function so that it handles vector arguments itself.
Option #1: Using Vectorize
# A single call to Vectorize makes the function accept vector arguments
vremoveWeekends <- Vectorize(removeWeekends)
# Try vremoveWeekends function
df$dayswithoutweekends <- vremoveWeekends(df$two, df$one)
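As a rough illustration of what Option #1 does: Vectorize wraps the function in a call to mapply, so the scalar function is still called once per pair of elements. The sketch below is not the code from the question. It uses a hypothetical scalar helper, count_weekdays_scalar, written with base R only (as.POSIXlt()$wday in place of chron::is.weekend) so that it runs without extra packages, and it assumes start is not later than end.

```r
# Hypothetical scalar helper: counts Monday-Friday days between two single
# dates, inclusive. Assumes start <= end.
count_weekdays_scalar <- function(end, start) {
  days <- seq(start, end, by = "day")
  wday <- as.POSIXlt(days)$wday        # 0 = Sunday, 6 = Saturday
  sum(wday != 0L & wday != 6L)
}

# Vectorize returns a wrapper that loops over the arguments via mapply
v_count_weekdays <- Vectorize(count_weekdays_scalar)

start <- as.Date(c("2017-01-01", "2017-01-02", "2017-01-03"))
end   <- as.Date(c("2018-06-08", "2018-06-09", "2018-06-10"))

via_vectorize <- v_count_weekdays(end, start)
via_mapply    <- mapply(count_weekdays_scalar, end, start)

print(via_vectorize)   # 375 375 374
print(via_mapply)      # 375 375 374
```

Because the wrapper still makes one R-level call per row, this is convenient rather than fast.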
Option #2: Rewrite the function to support vectorized arguments. I prefer this option because the OP's function takes two arguments that are expected to have the same length, so rewriting it makes it easier to add error checking on the arguments.
# Modified function
removeWeekendsNew <- function(end, start){
if(length(start) != length(end)){
return(0L) #Error condition
}
result <- rep(0L, length(start)) #store the result for each row
# mapply could be used instead of the for loop, but the for loop is likely to be faster
for(i in seq_along(start)){
range = seq(start[i], end[i], by="day")
result[i] = length(range[!chron::is.weekend(range)])
}
return(result)
}
#Use new function:
df$dayswithoutweekends <- removeWeekendsNew(df$two, df$one)
Result: it is the same for both options above.
df
#          one        two dayswithoutweekends
# 1 2017-01-01 2018-06-08                 375
# 2 2017-01-02 2018-06-09                 375
# 3 2017-01-03 2018-06-10                 374
# 4 2017-01-04 2018-06-11                 374
# 5 2017-01-05 2018-06-12                 374
# 6 2017-01-06 2018-06-13                 374
# 7 2017-01-07 2018-06-14                 374
# 8 2017-01-08 2018-06-15                 375
Data:
one <- seq(as.Date("2017-01-01"),as.Date("2017-01-08"), by="day")
two <- seq(as.Date("2018-06-08"),as.Date("2018-06-15"), by="day")
df <- data.frame(one, two)
df
#          one        two
# 1 2017-01-01 2018-06-08
# 2 2017-01-02 2018-06-09
# 3 2017-01-03 2018-06-10
# 4 2017-01-04 2018-06-11
# 5 2017-01-05 2018-06-12
# 6 2017-01-06 2018-06-13
# 7 2017-01-07 2018-06-14
# 8 2017-01-08 2018-06-15
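On the error checking mentioned under Option #2: returning 0L for mismatched lengths can be mistaken for a genuine count. A minimal sketch of the same per-row loop that raises an error instead is shown below. The function name count_weekdays_checked is hypothetical, and weekends are detected with integer arithmetic on the day number (1970-01-01, day 0, was a Thursday) rather than with chron::is.weekend, so the sketch needs base R only and accepts either Date or integer day-number inputs.

```r
# Hypothetical variant of the rewritten function: stops on a length mismatch
# instead of returning 0L.
count_weekdays_checked <- function(end, start) {
  if (length(start) != length(end)) {
    stop("'start' and 'end' must have the same length")
  }
  start <- as.integer(start)   # days since 1970-01-01
  end   <- as.integer(end)
  vapply(seq_along(start), function(i) {
    days <- start[i]:end[i]
    # (day + 3) %% 7 maps Monday..Sunday to 0..6, so values below 5 are weekdays
    sum((days + 3L) %% 7L < 5L)
  }, integer(1))
}

starts <- as.Date(c("2017-01-01", "2017-01-08"))
ends   <- as.Date(c("2018-06-08", "2018-06-15"))
print(count_weekdays_checked(ends, starts))   # 375 375
```

Using vapply with integer(1) also guarantees an integer vector of the same length as the inputs, including for zero-length input.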
Answer 2:
If you want to fully vectorize this, you will need to think outside the box. All chron::is.weekend is used for here is to find out how many days in a given time period were Saturdays or Sundays. We can calculate this ourselves in a vectorized way because each full week contains exactly two weekend days, and the only tricky part is the leftover days.
I wrote the following function to achieve this, though I'm sure it could be improved:
frw <- function(two, one) {
diff_d <- two - one ## difference in days
l_d <- (two + 4L) %% 7L + 1L ## last day of the remainder
weeks <- diff_d %/% 7L ## number of weeks between
days <- diff_d %% 7L ## days left
## calculate how many work days left
diff_d -
((weeks * 2L) + ((l_d - days < 1) + ((l_d - days < 2) - (l_d == 1L))) +
(l_d %in% c(1L, 7L))) + 1L
}
You can run it as follows:
frw(two, one)
## [1] 375 375 374 374 374 374 374 375
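The same "full weeks plus leftover days" reasoning can also be written as a difference of two cumulative counts, which some readers may find easier to check by hand. This is an alternative sketch, not the frw function above: count_weekdays_closed_form and cum_weekdays are hypothetical names, the code relies on 1970-01-01 (day number 0) being a Thursday, and it assumes start is not later than end.

```r
# Hypothetical closed-form count of Monday-Friday days in [start, end], inclusive.
count_weekdays_closed_form <- function(end, start) {
  s <- as.numeric(start)   # days since 1970-01-01
  e <- as.numeric(end)
  # Weekdays among the first x days of a sequence that starts on a Monday:
  # 5 per full week, plus at most 5 of the leftover days.
  cum_weekdays <- function(x) 5 * (x %/% 7) + pmin(x %% 7, 5)
  # day + 3 is a Monday-based index, so the range covers indices s+3 .. e+3
  cum_weekdays(e + 4) - cum_weekdays(s + 3)
}

one <- seq(as.Date("2017-01-01"), as.Date("2017-01-08"), by = "day")
two <- seq(as.Date("2018-06-08"), as.Date("2018-06-15"), by = "day")
result <- count_weekdays_closed_form(two, one)
print(result)   # 375 375 374 374 374 374 374 375

# Day-by-day cross-check of the closed form
check <- vapply(seq_along(one), function(i) {
  sum((as.numeric(one[i]):as.numeric(two[i]) + 3) %% 7 < 5)
}, integer(1))
print(all(result == check))   # TRUE
```

Everything here is elementwise arithmetic on whole vectors, so no per-row function call is needed.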
frw is far faster than the mapply version (almost instant). Here is a benchmark on a bigger data set:
one <- as.Date("2017-01-01"):as.Date("2030-01-08")
two <- as.Date("2017-05-01"):as.Date("2030-05-08")
df <- data.frame(one, two)
system.time(res_mapply <- vremoveWeekends(df$two, df$one)) # taken from the other answer
#    user  system elapsed
#   76.46    0.06   77.25
system.time(res_vectorized <- frw(df$two, df$one))
#    user  system elapsed
#       0       0       0
identical(res_mapply, res_vectorized)
# [1] TRUE
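The question's end goal also excludes a vector of local bank holidays. One possible way to keep that step vectorized is to count, for each range, the holidays that fall on a weekday and subtract them from the weekday count. The sketch below is only an assumption about how that could look: working_days_between and weekdays_between are hypothetical names, the holiday dates are made up for illustration, start is assumed not to be later than end, and findInterval is used to count the sorted holidays up to a given day.

```r
# Hypothetical helper: Monday-Friday days in [start, end], inclusive
# (1970-01-01, day number 0, was a Thursday).
weekdays_between <- function(start, end) {
  cum <- function(x) 5 * (x %/% 7) + pmin(x %% 7, 5)
  cum(as.numeric(end) + 4) - cum(as.numeric(start) + 3)
}

# Hypothetical helper: weekdays minus holidays that fall on a weekday.
working_days_between <- function(start, end, holidays) {
  h <- sort(unique(as.numeric(holidays)))
  h <- h[(h + 3) %% 7 < 5]                     # drop holidays on Sat/Sun
  # findInterval(x, h) is the number of holidays on or before day x
  n_hol <- findInterval(as.numeric(end), h) -
    findInterval(as.numeric(start) - 1, h)
  weekdays_between(start, end) - n_hol
}

# Made-up holiday list: two Mondays and one Saturday
holidays <- as.Date(c("2018-05-07", "2018-05-26", "2018-05-28"))
start <- as.Date(c("2018-05-08", "2018-05-07"))
end   <- as.Date(c("2018-06-08", "2018-05-11"))
print(working_days_between(start, end, holidays))   # 23 4
```

The first range contains 24 weekdays and one weekday holiday (2018-05-28); the second contains 5 weekdays and one weekday holiday (2018-05-07).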
Source: https://stackoverflow.com/questions/50766836/r-vectorization-of-a-user-defined-function