I've been using gsub extensively lately, and I noticed that short patterns run faster than long ones, which is not surprising. Here's a fully reproducible cod
The kinks might be related to the bits required to hold patterns of that length.
There is another solution that scales much better: use the repetition operator {} to specify how many repeats you want to find. To match more than 255 repeats (the 8-bit integer maximum), you'll have to specify perl = TRUE.
patt2 <- paste0('a{',rpt[n],'}')
timeRF <- microbenchmark(gsub(patt2, "b", inp, perl = TRUE), times = 10)
I get speeds of around 2.1 ms per search with no penalty for pattern length. That's about 8x faster than fixed = FALSE for small pattern lengths and about 60x faster for large pattern lengths.
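As a quick sanity check, here is a minimal sketch showing that the quantified pattern replaces the same run as the spelled-out one. The input string and the names n_rep, toy_inp, patt_literal and patt_quant are made up for illustration and are not part of the benchmark above; strrep() assumes R 3.3.0 or later.

```r
# Hypothetical toy input (not the benchmark data from the post).
n_rep   <- 300L   # more than 255 repeats, so perl = TRUE is used below
toy_inp <- paste0("x", strrep("a", n_rep), "y", strrep("a", 10), "z")

# The same run of 'a' written two ways.
patt_literal <- strrep("a", n_rep)          # "aaa...a", 300 characters long
patt_quant   <- paste0("a{", n_rep, "}")    # "a{300}", 6 characters long

# Literal search versus the repetition operator with the PCRE engine.
out_literal <- gsub(patt_literal, "b", toy_inp, fixed = TRUE)
out_quant   <- gsub(patt_quant,   "b", toy_inp, perl  = TRUE)

# Both calls replace only the 300-character run and leave the short run alone.
identical(out_literal, out_quant)
```

The pattern string stays a handful of characters long no matter how large n_rep gets, which may be part of why the timing stops depending on the pattern length.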