gsub speed vs pattern length

南方客 2021-01-07 17:15

I've been using gsub extensively lately, and I noticed that short patterns run faster than long ones, which is not surprising. Here's a fully reproducible cod

1 Answer
  • 2021-01-07 17:44

    The kinks might be related to the bits required to hold patterns of that length.

    Another solution scales much better: use the repetition operator {} to specify how many repeats you want to match. To match more than 255 repeats (the 8-bit integer maximum), you'll have to specify perl = TRUE.

    # rpt, n, and inp are assumed to come from the question's code
    patt2 <- paste0('a{', rpt[n], '}')
    timeRF <- microbenchmark(gsub(patt2, "b", inp, perl = TRUE), times = 10)
    

    I get speeds of around 2.1 ms per search with no penalty for pattern length. That's about 8x faster than fixed = FALSE for small pattern lengths and about 60x faster for large pattern lengths.
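
As a rough check that the quantified pattern matches the same text as a long literal one, here is a minimal sketch in base R. The names n_rep, patt_literal, and patt_quantified are made up for illustration, and inp here is stand-in data, not the input from the question.

```r
# Hypothetical stand-in data; the question's own inp and rpt are not reproduced here.
n_rep <- 300                                   # above 255, so perl = TRUE is needed
inp   <- paste0("x", strrep("a", n_rep), "y")  # a single run of 300 'a' characters

patt_literal    <- strrep("a", n_rep)          # long literal pattern: "aaa...a"
patt_quantified <- paste0("a{", n_rep, "}")    # short pattern: "a{300}"

# Replace the run using each pattern.
out_literal    <- gsub(patt_literal, "b", inp, fixed = TRUE)
out_quantified <- gsub(patt_quantified, "b", inp, perl = TRUE)

# Both patterns should produce the same replacement result.
identical(out_literal, out_quantified)
```

The quantified pattern stays a few characters long no matter how many repeats it asks for, which fits the observation above that its timing does not grow with the repeat count.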
