Speeding up a function that uses which() within an sapply call in R

Submitted by 醉酒当歌 on 2019-11-29 14:35:04

Question


I have two vectors, e and g. For each element of e, I want to know the percentage of elements in g that are smaller than or equal to it. One way to implement this in R is:

set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)
mf <- function(p,v) {100*length(which(v<=p))/length(v)}
mf.out <- sapply(X=e, FUN=mf, v=g)

With large e or g, this takes a lot of time to run. How can I change or adapt this code to make this run faster?

Note: The mf function above is based on code from the mess function in the dismo package.
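To make the definition concrete, here is a toy run of the question's mf on hand-picked vectors. The names e_toy and g_toy and their values are hypothetical, chosen only for illustration. Because mf uses <=, elements of g that are equal to p are counted.

```r
# Hypothetical toy inputs, chosen only to illustrate what mf computes.
e_toy <- c(1, 3)
g_toy <- c(0, 2, 3, 5)

# The question's function: percentage of v that is <= p.
mf <- function(p, v) {100 * length(which(v <= p)) / length(v)}

# p = 1: only 0 qualifies, 1 of 4 -> 25
# p = 3: 0, 2 and 3 qualify (the tie counts because of <=), 3 of 4 -> 75
toy <- sapply(X = e_toy, FUN = mf, v = g_toy)
print(toy)  # [1] 25 75
```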


Answer 1:


This is slow because sapply calls your function length(e) times, and every call compares p against all of g. For small vectors that doesn't matter much, but the total work grows as length(e) * length(g) comparisons plus length(e) R function calls, and the overhead of those calls really starts to add up with larger vectors.

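Normally you would need to move this to compiled code, but luckily you can use findInterval. Once g is sorted, findInterval(e, sort(g)) returns, in a single vectorised call, the number of elements of g that are less than or equal to each element of e. Dividing by length(g) gives a fraction (multiply by 100 to get the percentage that mf returns):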

set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)
O <- findInterval(e, sort(g)) / length(g)  # fraction of g that is <= each element of e

# Now for some timings:
f <- function(p,v) mean(v<=p)
system.time(o <- sapply(e, f, g))
#   user  system elapsed 
#   0.95    0.03    0.98
system.time(O <- findInterval(e,sort(g))/length(g))
#   user  system elapsed 
#      0       0       0 
identical(o,O)  # may be FALSE
all.equal(o,O)  # should be TRUE

# How fast is this on large vectors?
set.seed(21)
e <- rnorm(1e7)
g <- rnorm(1e7)
system.time(O <- findInterval(e,sort(g))/length(g))
#   user  system elapsed 
#  22.08    0.08   22.31

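The answer's code returns fractions, while the question's mf returns percentages. The sketch below wraps findInterval in a helper that returns percentages and checks it against the original sapply version on smaller vectors (1e3 elements, so the slow path finishes quickly). The name pct_le is hypothetical and not part of any package.

```r
set.seed(21)
e <- rnorm(1e3)
g <- rnorm(1e3)

# The question's per-element function (returns a percentage).
mf <- function(p, v) {100 * length(which(v <= p)) / length(v)}

# pct_le is a hypothetical helper name. With vec = sort(g) and default
# arguments, findInterval(x, vec) returns the largest i with vec[i] <= x
# (0 when x < vec[1]), i.e. the count of elements of g that are <= x.
pct_le <- function(e, g) {
  100 * findInterval(e, sort(g)) / length(g)
}

slow <- sapply(X = e, FUN = mf, v = g)
fast <- pct_le(e, g)
print(all.equal(slow, fast))  # TRUE

# Ties are counted, matching the <= in mf.
print(pct_le(c(1, 3), c(0, 2, 3, 5)))  # [1] 25 75
```

Sorting g once costs O(m log m), and each lookup is then a search in the sorted vector rather than a full scan of g, which is why this scales so much better than the sapply loop.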

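A related option, not part of the answer above, is stats::ecdf, which builds the empirical cumulative distribution function of g; evaluating it at e gives the fraction of g that is <= each element. The sketch below assumes g has no missing values: ecdf drops NAs from its denominator, whereas mf and the findInterval version divide by length(g), so the results would differ if g contained NA.

```r
library(stats)

set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)

# ecdf() returns a step function Fn where Fn(x) is the fraction of the
# non-missing values of g that are <= x.
Fn <- ecdf(g)
via_ecdf <- 100 * Fn(e)

# Same quantity via findInterval, for comparison.
via_findInterval <- 100 * findInterval(e, sort(g)) / length(g)
print(all.equal(via_ecdf, via_findInterval))
```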
Source: https://stackoverflow.com/questions/12982152/speeding-up-function-that-uses-which-within-a-sapply-call-in-r
