Question
My question is whether my method and conclusion are correct.
While learning regular expressions, I wanted to decide in which order to learn the various alternatives (base R and packages), and I thought the relative speeds of the alternative functions might help me choose. So I created a character vector and called what I hope are equivalent expressions.
library(stringr) # str_replace_all()
library(stringi) # stri_replace_all_regex()
library(qdap)    # genX()

sites <- c("http://grand.test.com/", "https://example.com/",
           "http://.big.time.bhfs.com/", "http://test.blogs.mvalaw.com/")
vec <- rep(x = sites, times = 1000) # creating a longish vector
base <- gsub("http:", "", vec, perl = TRUE)
stringr <- str_replace_all(vec, "http:", replacement = "")
stringi <- stri_replace_all_regex(str = vec, pattern = "http:", replacement = "")
qdap <- genX(text.var = vec, "http:", "")
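Since I am only hoping the four calls are equivalent, a quick check of the outputs seems worthwhile before timing anything. The following is a minimal sketch, not a definitive test: the names `results` and `same_as_base` are my own, and each package is only used if it happens to be installed.

```r
sites <- c("http://grand.test.com/", "https://example.com/",
           "http://.big.time.bhfs.com/", "http://test.blogs.mvalaw.com/")
vec <- rep(x = sites, times = 1000)

# Reference result from base R
results <- list(base = gsub("http:", "", vec, perl = TRUE))

# Add each package's result only if that package can be loaded
if (requireNamespace("stringr", quietly = TRUE)) {
  results$stringr <- stringr::str_replace_all(vec, "http:", "")
}
if (requireNamespace("stringi", quietly = TRUE)) {
  results$stringi <- stringi::stri_replace_all_regex(vec, "http:", "")
}
if (requireNamespace("qdap", quietly = TRUE)) {
  results$qdap <- qdap::genX(text.var = vec, "http:", "")
}

# TRUE means the output matches base R's gsub() element for element
same_as_base <- vapply(
  results,
  function(x) identical(as.character(x), results$base),
  logical(1)
)
print(same_as_base)
```

Any `FALSE` in `same_as_base` would mean that call is doing different work from `gsub`, in which case its timing is not directly comparable.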
Then I benchmarked the four methods using the microbenchmark package.
library(microbenchmark)

test <- microbenchmark(base <- gsub("http:", "", vec, perl = TRUE),
                       stringr <- str_replace_all(vec, "http:", replacement = ""),
                       stringi <- stri_replace_all_regex(str = vec, pattern = "http:", replacement = ""),
                       qdap <- genX(text.var = vec, "http:", ""),
                       times = 100)
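A variation on the same benchmark, sketched under a few assumptions: the expressions are given names so the output rows do not need shortening by hand, the assignments inside the timed expressions are dropped, and the summary is printed in relative units. The list name `exprs` is my own, and packages that are not installed are simply skipped.

```r
sites <- c("http://grand.test.com/", "https://example.com/",
           "http://.big.time.bhfs.com/", "http://test.blogs.mvalaw.com/")
vec <- rep(x = sites, times = 1000)

# Named, unevaluated expressions; the names become the labels in the output
exprs <- list(base = quote(gsub("http:", "", vec, perl = TRUE)))
if (requireNamespace("stringr", quietly = TRUE)) {
  exprs$stringr <- quote(stringr::str_replace_all(vec, "http:", ""))
}
if (requireNamespace("stringi", quietly = TRUE)) {
  exprs$stringi <- quote(stringi::stri_replace_all_regex(vec, "http:", ""))
}
if (requireNamespace("qdap", quietly = TRUE)) {
  exprs$qdap <- quote(qdap::genX(text.var = vec, "http:", ""))
}

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  test <- microbenchmark::microbenchmark(list = exprs, times = 100)
  # "relative" reports each timing as a multiple of the fastest expression
  print(summary(test, unit = "relative"))
}
```

Relative units make the comparison with the fastest method readable at a glance instead of requiring mental division.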
Am I correct that base R's gsub is by far the fastest (I shortened the expr names)?
expr           min         lq     median         uq        max neval
base      1.697001   1.739393   1.765051   1.833770   2.976780   100
stringr   3.814348   3.928360   3.979453   4.123138   7.032091   100
stringi   5.888857   6.172212   6.276407   6.500412   7.634943   100
qdap    120.670037 124.624946 127.493293 130.923663 173.155253   100
The median times differ substantially, especially for qdap.
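To put numbers on that, the medians from the table can be divided by the base R median. This is only arithmetic on the values reported above; the vector name `medians` is my own.

```r
# Median timings copied from the benchmark table above
medians <- c(base = 1.765051, stringr = 3.979453,
             stringi = 6.276407, qdap = 127.493293)

# Each median as a multiple of the base R median
ratios <- round(medians / medians[["base"]], 1)
print(ratios)
#    base stringr stringi    qdap
#     1.0     2.3     3.6    72.2
```

On these figures stringr and stringi take a small multiple of the base R time, while qdap is slower by well over an order of magnitude.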
Source: https://stackoverflow.com/questions/24846611/microbenchmarking-base-r-and-three-packages-on-string-pattern-substitution