Trying to avoid for loop with sapply (for gsub)

本秂侑毒 submitted on 2019-12-10 21:22:14

Question


I'm trying to avoid a for loop in the following code by using sapply, if that is possible at all. The loop solution works perfectly fine for me; I'm just trying to learn more R and explore as many methods as possible.

Objective: I have a vector i and two vectors, sf (search for) and rp (replace with). For each element of i, I need to loop over sf and replace every match with the corresponding element of rp.

i  = c("1 6 5 4","7 4 3 1")
sf = c("1","2","3")
rp = c("one","two","three")

funn <- function(i) {
  for (j in seq_along(sf)) i = gsub(sf[j],rp[j],i,fixed=TRUE)
  return(i)
}
print(funn(i))

Result (correct):

[1] "one 6 5 4"     "7 4 three one"

I'd like to do the very same, but with sapply

#Trying to avoid a for loop in a fun
#funn1 <- function(i) {
#  i = gsub(sf,rp,i,fixed=T)
#  return(i)
#}
#print(sapply(i,funn1))

Apparently, the commented-out code above will not work, because gsub only uses the first element of sf. This is my first time using sapply, so I'm not sure how to convert an "inner" implicit loop into a vectorized solution. Any help (even a statement that this is not possible) is appreciated!
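
To make the failure concrete, here is a small sketch of what the commented-out attempt runs into. As far as I know, gsub is vectorized over its input x but not over pattern and replacement: given longer vectors it warns and falls back to the first element of each. The variable names vectorized_attempt and first_pair_only are only illustrative.

```r
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

# Passing the whole sf/rp vectors: gsub warns and (in the R versions I know)
# uses only sf[1] and rp[1]. The warnings are suppressed here for readability.
vectorized_attempt <- suppressWarnings(gsub(sf, rp, i, fixed = TRUE))

# Explicitly using only the first search/replace pair.
first_pair_only <- gsub(sf[1], rp[1], i, fixed = TRUE)

print(vectorized_attempt)
print(identical(vectorized_attempt, first_pair_only))
```

Wrapping the call in sapply(i, ...) does not change this, because sapply iterates over the elements of i, not over the search/replace pairs.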

(I'm aware of mgsub, but that is not the solution I'm after here; I would like to keep gsub.)

EDIT: full code with packages, the solutions offered below, and timings:

#timing
library(microbenchmark)
library(functional)

i  = rep(c("1 6 5 4","7 4 3 1"),10000)
sf = rep(c("1","2","3"),100)
rp = rep(c("one","two","three"),100)

#Loop
funn <- function(i) {
  for (j in seq_along(sf)) i = gsub(sf[j],rp[j],i,fixed=TRUE)
  return(i)
}
t1 = proc.time()
k = funn(i)
t2 = proc.time()

#print(k)

print(microbenchmark(funn(i),times=10))

#mapply
t3 = proc.time()
mapply(function(u,v) i<<-gsub(u,v,i), sf, rp)
t4 = proc.time()

#print(i)

print(microbenchmark(mapply(function(u,v) i<<-gsub(u,v,i), sf, rp),times=10))

#Curry
t5 = proc.time()
Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp))(i)
t6 = proc.time()

print(microbenchmark(Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp))(i), times=10))

#4th option
n <- length(sf)
sf <- setNames(sf,1:n)
rp <- setNames(rp,1:n)

t7 = proc.time()
Reduce(function(x,j) gsub(sf[j],rp[j],x,fixed=TRUE),c(list(i),as.list(1:n)))
t8 = proc.time()

print(microbenchmark(Reduce(function(x,j) gsub(sf[j],rp[j],x,fixed=TRUE),c(list(i),as.list(1:n))),times=10))

#Usual proc.time
print(t2-t1)
print(t4-t3)
print(t6-t5)
print(t8-t7)

Times:

Unit: milliseconds
    expr min  lq mean median  uq max neval
 funn(i) 143 143  149    145 147 165    10
Unit: seconds
                                               expr min  lq mean median  uq max neval
 mapply(function(u, v) i <<- gsub(u, v, i), sf, rp) 4.1 4.2  4.4    4.3 4.4 4.9    10
Unit: seconds
                                                                                           expr min  lq mean median  uq max neval
 Reduce(Compose, Map(function(u, v) Curry(gsub, pattern = u, replacement = v),      sf, rp))(i) 1.6 1.6  1.7    1.7 1.7 1.7    10
Unit: milliseconds
                                                                                      expr min  lq mean median  uq max neval
 Reduce(function(x, j) gsub(sf[j], rp[j], x, fixed = TRUE), c(list(i),      as.list(1:n))) 141 144  147    145 146 162    10
   user  system elapsed 
   0.15    0.00    0.15 
   user  system elapsed 
   4.49    0.03    4.52 
   user  system elapsed 
   1.68    0.02    1.68 
   user  system elapsed 
   0.19    0.00    0.18 

So indeed, in this case the for loop offers the best timing and is (in my opinion) the most straightforward, simple, and possibly elegant option. I'm sticking with the loop.
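
One caveat when reading these timings: the mapply and Curry variants above call gsub without fixed=TRUE, so they use regular-expression matching, while the loop and the fourth option match literally. The mapply call also collects every intermediate result. Either may contribute to the gap, though I have not measured how much. The sketch below isolates just the fixed=TRUE difference on a smaller input; loop_fixed and loop_regex are hypothetical names, and the timings will vary by machine.

```r
i  <- rep(c("1 6 5 4", "7 4 3 1"), 1000)
sf <- rep(c("1", "2", "3"), 10)
rp <- rep(c("one", "two", "three"), 10)

# Same loop twice; the only difference is literal vs. regex matching.
loop_fixed <- function(x) {
  for (j in seq_along(sf)) x <- gsub(sf[j], rp[j], x, fixed = TRUE)
  x
}
loop_regex <- function(x) {
  for (j in seq_along(sf)) x <- gsub(sf[j], rp[j], x)
  x
}

# The patterns contain no regex metacharacters, so both give the same result.
same_result <- identical(loop_fixed(i), loop_regex(i))
print(same_result)

# Time ten repetitions of each variant.
print(system.time(for (k in 1:10) loop_fixed(i)))
print(system.time(for (k in 1:10) loop_regex(i)))
```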

Thanks to all. All suggestions accepted and upvoted.


Answer 1:


One approach: its advantage is conciseness, but it is clearly not functional-programming oriented, since it has the side effect of modifying i:

mapply(function(u,v) i<<-gsub(u,v,i), sf, rp)
#> i
#[1] "one 6 5 4"     "7 4 three one"
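
If the side effect on the global i is a concern, one option is to confine it to a function's own frame. This is only a sketch: replace_local is a hypothetical wrapper name, and fixed=TRUE is added to match the loop in the question.

```r
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

replace_local <- function(x) {
  # `<<-` looks for x in the enclosing environment of the anonymous function,
  # which is this function's frame, so the caller's variables are untouched.
  mapply(function(u, v) x <<- gsub(u, v, x, fixed = TRUE), sf, rp)
  x
}

result <- replace_local(i)
print(result)
print(i)  # the original vector is left as it was
```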

Or here is a pure functional programming approach:

library(functional)
Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp))(i)
#[1] "one 6 5 4"     "7 4 three one"

What it does: Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp) builds a list of functions that respectively replace 1 with one, 2 with two, etc. These functions are then composed and applied to i, giving the desired result.
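
The same idea can be sketched in base R without the functional package, using plain closures in place of Curry and a two-function composer in place of Compose. The helpers make_replacer and compose2 are hypothetical names introduced for this illustration, and fixed=TRUE is an addition to match the loop in the question.

```r
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

# Build a function that performs one fixed search/replace pair.
make_replacer <- function(u, v) {
  force(u); force(v)  # evaluate now so each closure keeps its own pair
  function(x) gsub(u, v, x, fixed = TRUE)
}

# Compose two functions: apply f first, then g.
compose2 <- function(f, g) {
  force(f); force(g)
  function(x) g(f(x))
}

replacers   <- Map(make_replacer, sf, rp)   # list of three functions
replace_all <- Reduce(compose2, replacers)  # one function applying all three
result      <- replace_all(i)
print(result)
```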




Answer 2:


sapply(seq_along(sf),function(x)i<-gsub(sf[x],rp[x],i))
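
A note of caution on this one, shown as a small check: as far as I can tell, the `<-` inside the anonymous function only assigns to a local copy, so each replacement starts again from the original i and the results are not chained. The name res is only illustrative.

```r
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

# Each call creates its own local i; nothing carries over to the next call.
res <- sapply(seq_along(sf), function(x) i <- gsub(sf[x], rp[x], i))

print(dim(res))  # one column per search/replace pair
print(res)
print(i)         # the outer i is unchanged
```

Each column holds the result of a single replacement applied to the original vector, so this does not reproduce the output of the loop.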



Answer 3:


This is sequential, so a loop seems natural. Here's a solution that is almost as bad as <<-:

n  <- length(sf)
Reduce(function(x,j) gsub(sf[j],rp[j],x,fixed=TRUE),c(list(i),as.list(1:n)))
# [1] "one 6 5 4"     "7 4 three one"

Really, you should use a loop.
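
For readers who still want the fold, a slightly tidier sketch passes i as the initial value through Reduce's third argument (init) instead of prepending it to the list of indices. This is a variation for illustration, not the answer's original code.

```r
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

# Fold over the indices of sf, starting from i and threading the
# partially replaced vector through each step.
result <- Reduce(function(x, j) gsub(sf[j], rp[j], x, fixed = TRUE),
                 seq_along(sf), i)
print(result)
```

This is still the same sequential computation as the loop, just expressed as a fold.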



Source: https://stackoverflow.com/questions/30241461/trying-to-avoid-for-loop-with-sapply-for-gsub
