Microbenchmarking base R and three packages on string pattern substitution

Question

My question is whether my method and conclusion are correct.

While learning regular expressions, I wanted to decide in which order to learn the various alternatives (base R and packages), and I thought knowing their relative speeds might help. So I created a string vector and called what I hope are equivalent expressions in base R, stringr, stringi and qdap.

library(stringr)  # str_replace_all()
library(stringi)  # stri_replace_all_regex()
library(qdap)     # genX()

sites <- c("http://grand.test.com/", "https://example.com/",
           "http://.big.time.bhfs.com/", "http://test.blogs.mvalaw.com/")
vec <- rep(x = sites, times = 1000) # creating a longish vector (4,000 strings)

base <- gsub("http:", "", vec, perl = TRUE)
stringr <- str_replace_all(vec, "http:", replacement = "")
stringi <- stri_replace_all_regex(str = vec, pattern = "http:", replacement = "")
qdap <- genX(text.var = vec, "http:", "")
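Because the question hinges on whether the four calls really are equivalent, it may be worth confirming that their outputs match before timing them. The sketch below is a minimal, base-R-only illustration assuming only the `sites`/`vec` setup above: it compares a few gsub() variants against the `perl = TRUE` result with identical(). The list name `variants` and the extra variants (the default TRE engine, `fixed = TRUE`, and an anchored `sub()`) are illustrative additions, not part of the original benchmark.

```r
# Hypothetical sanity check (base R only): do these gsub()/sub() variants agree?
sites <- c("http://grand.test.com/", "https://example.com/",
           "http://.big.time.bhfs.com/", "http://test.blogs.mvalaw.com/")
vec <- rep(x = sites, times = 1000)

variants <- list(
  perl     = gsub("http:", "", vec, perl = TRUE),   # PCRE engine
  tre      = gsub("http:", "", vec),                # default (TRE) regex engine
  fixed    = gsub("http:", "", vec, fixed = TRUE),  # literal match, no regex
  anchored = sub("^http:", "", vec)                 # strips only a leading "http:"
)

# TRUE for each variant whose output is identical to the perl = TRUE result
same_as_perl <- vapply(variants, identical, logical(1), y = variants$perl)
print(same_as_perl)
```

With the packages loaded, the same check applies to the objects created above, for example identical(base, stringr) or identical(base, qdap); a FALSE there would mean the benchmark is not comparing like with like.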

Then I benchmarked the four methods with the microbenchmark package.

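library(microbenchmark)

# Naming each expression gives short labels in the printed summary
test <- microbenchmark(base    = gsub("http:", "", vec, perl = TRUE),
                       stringr = str_replace_all(vec, "http:", replacement = ""),
                       stringi = stri_replace_all_regex(str = vec, pattern = "http:", replacement = ""),
                       qdap    = genX(text.var = vec, "http:", ""),
                       times = 100)
test  # printing the object shows the summary table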
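For readers without microbenchmark installed, a rough comparison can be sketched in base R by timing repeated calls with system.time(). The helper `median_ms` is hypothetical and much cruder than microbenchmark: it measures wall-clock elapsed time, includes loop overhead, and system.time() has coarse resolution, so very fast calls may register as 0. The `fixed = TRUE` variant is included only because the pattern here is a literal string; no result is claimed for it.

```r
# Hypothetical helper: median elapsed time (ms) over `times` calls of `f`.
median_ms <- function(f, times = 100) {
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(f())[["elapsed"]]   # wall-clock seconds for one call
  }, numeric(1))
  median(elapsed) * 1000            # convert seconds to milliseconds
}

sites <- c("http://grand.test.com/", "https://example.com/",
           "http://.big.time.bhfs.com/", "http://test.blogs.mvalaw.com/")
vec <- rep(x = sites, times = 1000)

timings <- c(
  perl  = median_ms(function() gsub("http:", "", vec, perl = TRUE)),
  fixed = median_ms(function() gsub("http:", "", vec, fixed = TRUE))
)
print(round(timings, 3))
```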

Am I correct that base R's gsub is by far the fastest (I shortened the expr names)?

Unit: milliseconds
    expr        min         lq     median         uq        max neval
    base   1.697001   1.739393   1.765051   1.833770   2.976780   100
 stringr   3.814348   3.928360   3.979453   4.123138   7.032091   100
 stringi   5.888857   6.172212   6.276407   6.500412   7.634943   100
    qdap 120.670037 124.624946 127.493293 130.923663 173.155253   100

The median times differ substantially, especially for qdap, whose median is more than 70 times that of gsub.
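To put the gap in numbers, the reported medians can be expressed as multiples of the gsub() median. This sketch only reuses the medians printed in the table; the ratio does not depend on the time unit, and it says nothing about statistical significance, which microbenchmark does not test.

```r
# Median times copied from the microbenchmark output
medians <- c(base = 1.765051, stringr = 3.979453,
             stringi = 6.276407, qdap = 127.493293)

# Slowdown relative to base R's gsub(): each median divided by the base median
relative <- medians / medians[["base"]]
print(round(relative, 1))
# base 1.0, stringr 2.3, stringi 3.6, qdap 72.2
```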

Source: https://stackoverflow.com/questions/24846611/microbenchmarking-base-r-and-three-packages-on-string-pattern-substitution

Tags

regex

benchmarking