How to get possibly overlapping matches in a string

别那么骄傲 2020-12-03 17:13

I'm looking for a way, in either Ruby or JavaScript, to get all matches of a regexp within a string, including matches that overlap.


Let's say I have

8 answers
  • 2020-12-03 17:53

    This JavaScript approach has one advantage over Wiktor's answer: it iterates over the substrings of the input lazily, using a generator function. You can consume one match at a time in a for...of loop, even for very large input strings, instead of building the whole array of matches up front. That matters because the number of substrings grows quadratically with the length of the string, so collecting them all at once can run out of memory:

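    // Lazily yield every substring of str, shortest first.
    function* substrings(str) {
      for (let length = 1; length <= str.length; length++) {
        for (let i = 0; i <= str.length - length; i++) {
          yield str.slice(i, i + length);
        }
      }
    }

    // Lazily yield every substring that re matches in full.
    function* matchSubstrings(str, re) {
      // Group the source so an alternation is anchored as a whole, and drop
      // the g/y flags, which would make test() stateful through lastIndex.
      const flags = re.flags.replace(/[gy]/g, '');
      const subre = new RegExp(`^(?:${re.source})$`, flags);

      for (const substr of substrings(str)) {
        if (subre.test(substr)) yield substr;
      }
    }

    for (const match of matchSubstrings('abcabc', /a.*c/)) {
      console.log(match); // logs abc, abc, abcabc
    }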

  • 2020-12-03 17:55
    ▶ str = "abcadc"
    ▶ from = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'a' }.compact
    ▶ to   = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'c' }.compact
    ▶ from.product(to).select { |f,t| f < t }.map { |f,t| str[f..t] }
    #⇒ [
    #  [0] "abc",
    #  [1] "abcadc",
    #  [2] "adc"
    # ]
    

    I believe there is a fancier way to find all indices of a character in a string, but I was unable to find it :( Any ideas?
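
    One possible way to collect those indices directly, as a minimal sketch: walk the characters with their positions and keep the positions that match. The helper name indices_of is made up for this illustration and is not part of the answer above.

```ruby
# Hypothetical helper (not from the original answer): returns every
# character index at which `char` occurs in `str`.
def indices_of(str, char)
  str.each_char.with_index        # enumerator of [character, index] pairs
     .select { |c, _i| c == char } # keep the pairs whose character matches
     .map { |_c, i| i }            # keep only the index
end

str  = "abcadc"
from = indices_of(str, "a")
to   = indices_of(str, "c")

# Same pairing step as in the answer: every "a" index before a "c" index.
result = from.product(to).select { |f, t| f < t }.map { |f, t| str[f..t] }
p result
```

    Because each_char yields whole characters, the indices line up with what str[f..t] expects.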

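    Splitting on a “unicode char boundary” is meant to make it work with strings like 'ábĉ' or 'Üve Østergaard'. Note that from and to hold indices into the split pieces, so str[f..t] selects the right range only when every piece is exactly one character; a space or a combining mark that stays attached to a piece shifts all later indices.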

    For a more generic solution that accepts arbitrary “from” and “to” sequences, only a small modification is needed: find all indices of “from” and “to” in the string.
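
    A rough sketch of that generalization, assuming “from” and “to” are given as regexps. The helper names match_offsets and spans_between are invented for this example, and the rule that a “to” match must start at or after the end of the “from” match is an assumption, not something the answer specifies.

```ruby
# Hypothetical helper: character offsets of every match of `pattern` in
# `str`, as [start, end_exclusive] pairs.
def match_offsets(str, pattern)
  offsets = []
  # Inside the scan block, Regexp.last_match is the MatchData of the
  # current match; offset(0) gives its [begin, end] character offsets.
  str.scan(pattern) { offsets << Regexp.last_match.offset(0) }
  offsets
end

# Hypothetical helper: every substring that starts with a `from` match and
# ends with a `to` match that begins at or after the end of that `from` match.
def spans_between(str, from, to)
  match_offsets(str, from)
    .product(match_offsets(str, to))
    .select { |(_fs, fe), (ts, _te)| fe <= ts }
    .map { |(fs, _fe), (_ts, te)| str[fs...te] }
end

p spans_between("abcadc", /a/, /c/)
```

    Keep in mind that scan itself only reports non-overlapping matches of each pattern, so a delimiter that can overlap itself (for example /aa/ in "aaa") would need a lookahead-based pattern instead.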
