R ~ Vectorization of a user defined function

Posted by 混江龙づ霸主 on 2021-02-02 09:23:45

Question


I need to write a function that will count the number of working days (minus weekends, and a vector of other local bank holidays), but the problem I'm coming up against is more simply illustrated with just counting the number of weekdays.

Here is a function that will give the number of weekdays between two dates:

removeWeekends <- function(end, start){

  range <- as.Date(start:end, "1970-01-01")

  range<- range[sapply(range, function(x){
                                if(!chron::is.weekend(x)){
                                  return(TRUE)
                                }else{
                                  return(FALSE)
                                }
                              })]

  return(NROW(range))

}

Which works when it is given a single date for each argument:

removeWeekends(as.Date("2018-05-08"), as.Date("2018-06-08"))
#[1] 24

But when it is given two vectors from a data frame, it fails:

one <- as.Date("2017-01-01"):as.Date("2017-01-08")
two <- as.Date("2018-06-08"):as.Date("2018-06-15")
df <- data.frame(one, two)
removeWeekends(df$two, df$one)
#[1] 375
#Warning messages:
#1: In start:end : numerical expression has 8 elements: only the first used
#2: In start:end : numerical expression has 8 elements: only the first used

I've also tried (which I guessed would not work as the syntax seems off):

lapply(df, removeWeekends, df$two, df$one)
#Error in FUN(X[[i]], ...) : unused argument (17167:17174)

And:

lapply(df[,c("two", "one")], removeWeekends)
#Error in as.Date(start:end, "1970-01-01") :   argument "start" is missing,
# with no default 

I'm assuming it is me misunderstanding the concept of vectorization.

The only other idea I have is to nest the function within a conditional that checks whether the input is a vector and, if so, calls an apply function on it, although I'm not quite sure how I would structure that either.


Answer 1:


You have a couple of options for supporting vectorized arguments in a function. Since you have already written your function, the easiest option is to use Vectorize to convert it so that it accepts vector arguments. The other option is to rewrite the function itself to support vectorized arguments.
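For context, the warnings in the question say that the `:` operator uses only the first element of each operand, which is why a single count came back. A small sketch with illustrative day numbers (the values below are made up for the example, not taken from the post) shows the difference between `:` and pairing the elements up with Map:

```r
# Illustrative day numbers (days since 1970-01-01); not data from the post
start <- c(17167, 17168)
end   <- c(17169, 17175)

# `:` keeps only start[1] and end[1]; the warning is suppressed here for brevity
first_only <- suppressWarnings(start:end)
length(first_only)
# [1] 3

# Map() pairs the elements up instead, giving one range per row
per_row <- lengths(Map(seq, start, end))
per_row
# [1] 3 8
```

Both options below amount to doing this pairing, either automatically or by hand.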

Option #1: Using Vectorize

# A single statement makes the function accept vector arguments
vremoveWeekends <- Vectorize(removeWeekends)

# Try the vremoveWeekends function
df$dayswithoutweekends <- vremoveWeekends(df$two, df$one)
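Vectorize is documented as a wrapper around mapply, so the call above amounts to applying the scalar function to each pair of dates in turn. The sketch below illustrates this with a hypothetical base-R helper, count_weekdays, which is not part of the original post and is used only so the example runs without chron:

```r
# Hypothetical scalar helper (not from the post): counts Mon-Fri days
# from `start` to `end`, inclusive, for a single pair of dates
count_weekdays <- function(end, start) {
  days <- seq(start, end, by = "day")
  wday <- as.POSIXlt(days)$wday      # 0 = Sunday, 6 = Saturday
  sum(!wday %in% c(0L, 6L))
}

start <- as.Date(c("2017-01-01", "2017-01-03"))
end   <- as.Date(c("2018-06-08", "2018-06-10"))

# Two ways of applying the scalar helper pair by pair
via_vectorize <- Vectorize(count_weekdays)(end, start)
via_mapply    <- mapply(count_weekdays, end, start)

via_vectorize
# [1] 375 374
```

Both calls still run the scalar function once per row, so this is a convenience rather than a speed-up.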

Option #2: Rewrite the function to support vectorized arguments. I prefer this option because the OP has two arguments that are expected to be of the same length, so it is easier to perform error checking on the arguments if we rewrite the function.

# Modified function
removeWeekendsNew <- function(end, start){
  if(length(start) != length(end)){
    return(0L)  # error condition: the arguments differ in length
  }
  result <- rep(0L, length(start)) # one result per row

  # mapply could be used instead of the for loop, but the loop should be at least as fast
  for(i in seq_along(start)){
    range     <- seq(start[i], end[i], by="day")
    result[i] <- sum(!chron::is.weekend(range)) # count the non-weekend days
  }

  return(result)
}

# Use the new function:
df$dayswithoutweekends <- removeWeekendsNew(df$two, df$one)
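If a silent 0 is not the desired way to signal mismatched lengths, the check can raise an error instead. The following is a sketch of that variation, not the answer author's code: count_weekdays_checked is a hypothetical name, and it uses as.POSIXlt from base R in place of chron::is.weekend so that it runs without extra packages.

```r
# Hypothetical variant (not from the original answer): fail loudly on bad input
count_weekdays_checked <- function(end, start) {
  if (length(start) != length(end)) {
    stop("`start` and `end` must have the same length")
  }
  vapply(seq_along(start), function(i) {
    days <- seq(start[i], end[i], by = "day")
    wday <- as.POSIXlt(days)$wday      # 0 = Sunday, 6 = Saturday
    sum(!wday %in% c(0L, 6L))          # weekdays in this row's range
  }, integer(1))
}

one <- seq(as.Date("2017-01-01"), as.Date("2017-01-08"), by = "day")
two <- seq(as.Date("2018-06-08"), as.Date("2018-06-15"), by = "day")

count_weekdays_checked(two, one)
# [1] 375 375 374 374 374 374 374 375

# Mismatched lengths now raise an error instead of returning 0
msg <- tryCatch(count_weekdays_checked(two, one[1:3]),
                error = function(e) conditionMessage(e))
```

Using vapply with integer(1) also guarantees that each row yields exactly one integer.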

Result: both options above give the same output.

df
#          one        two dayswithoutweekends
# 1 2017-01-01 2018-06-08                 375
# 2 2017-01-02 2018-06-09                 375
# 3 2017-01-03 2018-06-10                 374
# 4 2017-01-04 2018-06-11                 374
# 5 2017-01-05 2018-06-12                 374
# 6 2017-01-06 2018-06-13                 374
# 7 2017-01-07 2018-06-14                 374
# 8 2017-01-08 2018-06-15                 375

Data:

one <- seq(as.Date("2017-01-01"),as.Date("2017-01-08"), by="day")
two <- seq(as.Date("2018-06-08"),as.Date("2018-06-15"), by="day")
df <- data.frame(one, two)
df
#          one        two
# 1 2017-01-01 2018-06-08
# 2 2017-01-02 2018-06-09
# 3 2017-01-03 2018-06-10
# 4 2017-01-04 2018-06-11
# 5 2017-01-05 2018-06-12
# 6 2017-01-06 2018-06-13
# 7 2017-01-07 2018-06-14
# 8 2017-01-08 2018-06-15



Answer 2:


If you want to fully vectorize this, you will need to think outside the box. All that chron::is.weekend is being used for here is to find out how many days in a given time period were Saturdays or Sundays. We can calculate this ourselves in a vectorized way, because each full week has two weekend days, and the only tricky part is the leftover days.

I wrote the following function to achieve this, though I'm sure it could be improved:

frw <- function(two, one) {

  diff_d <- two - one ## difference in days
  l_d <- (two + 4L) %% 7L + 1L ## last day of the remainder 
  weeks <- diff_d %/% 7L ## number of weeks between
  days <- diff_d %% 7L ## days left

  ## calculate how many work days left
  diff_d - 
    ((weeks * 2L) + ((l_d - days < 1) + ((l_d - days < 2) - (l_d == 1L))) +
    (l_d %in% c(1L, 7L))) + 1L

}

You can run it as follows:

frw(two, one)
## [1] 375 375 374 374 374 374 374 375

frw is far faster than the mapply version (almost instant). Here is a benchmark on a bigger data set:

one <- as.Date("2017-01-01"):as.Date("2030-01-08")
two <- as.Date("2017-05-01"):as.Date("2030-05-08")
df <- data.frame(one, two)

system.time(res_mapply <- vremoveWeekends(df$two, df$one)) # taken from the other answer
#  user  system elapsed 
# 76.46    0.06   77.25 

system.time(res_vectorized <- frw(df$two, df$one))
# user  system elapsed 
#    0       0       0

identical(res_mapply, res_vectorized)
# [1] TRUE
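The "full weeks plus leftover days" idea can also be written as a difference of two cumulative counts. The sketch below is a different formulation from frw, not the answer author's code, and the names cum_weekdays and count_weekdays_vec are hypothetical. It relies on two facts: R stores dates as day numbers counted from 1970-01-01, and that day was a Thursday, so Monday 1970-01-05 is day number 4.

```r
# Weekdays (Mon-Fri) among all days from Monday 1970-01-05 through day `d`,
# inclusive, where `d` is a day number counted from 1970-01-01
cum_weekdays <- function(d) {
  m <- d - 3                         # m = 1 on Monday 1970-01-05
  5 * (m %/% 7) + pmin(m %% 7, 5)    # full weeks, plus leftovers capped at Friday
}

# Weekdays from `start` to `end`, inclusive; works element-wise on vectors
count_weekdays_vec <- function(end, start) {
  end   <- as.numeric(end)
  start <- as.numeric(start)
  cum_weekdays(end) - cum_weekdays(start - 1)
}

one <- seq(as.Date("2017-01-01"), as.Date("2017-01-08"), by = "day")
two <- seq(as.Date("2018-06-08"), as.Date("2018-06-15"), by = "day")

count_weekdays_vec(two, one)
# [1] 375 375 374 374 374 374 374 375
```

Because it uses only arithmetic on whole vectors, this runs without any per-row loop, and it accepts either Date vectors or the integer day numbers used in the benchmark data above.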


Source: https://stackoverflow.com/questions/50766836/r-vectorization-of-a-user-defined-function
