How to get possibly overlapping matches in a string

别那么骄傲 2020-12-03 17:13

I'm looking for a way, in either Ruby or JavaScript, to get all matches, possibly overlapping, of a regexp within a string.


Let's say I have

8 Answers
  • 2020-12-03 17:31

    The RegExp /(a.c)|(a.*c)/g has two alternatives: a.c matches "a" followed by any single character followed by "c", and a.*c matches "a" followed by zero or more of any character followed by "c". Note that the (a.*c) part could probably be improved. The if condition checks whether the last character of the input string is "c"; if it is, the full input string is pushed onto the res results array.

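    var str = "abcadc";
    // match() returns null when nothing matches, so fall back to an empty array
    var res = str.match(/(a.c)|(a.*c)/g) || [];
    // if the input ends with "c", also report the whole string
    if (str[str.length - 1] === "c") res.push(str);

    document.body.textContent = res.join(" ");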

  • 2020-12-03 17:35

    Here's an approach that is similar to @ndn's and @Mark's that works with any string and regex. I've implemented this as a method of String because that's where I'd like to see it. Wouldn't it be a great complement to String#[] and String#scan?

    class String
      def all_matches(regex)
        return [] if empty?
        r = /\A#{regex}\z/  # \A and \z anchor to the whole substring; ^ and $ are per-line
        1.upto(size).with_object([]) { |i,a|
          a.concat(each_char.each_cons(i).map(&:join).select { |s| s =~ r }) }
      end
    end
    
    'abcadc'.all_matches(/a.*c/)
      # => ["abc", "adc", "abcadc"]
    'aaabaaa'.all_matches(/a.*a/)
      #=> ["aa", "aa", "aa", "aa", "aaa", "aba", "aaa", "aaba", "abaa", "aaaba",
      #    "aabaa", "abaaa", "aaabaa", "aabaaa", "aaabaaa"] 
    
  • 2020-12-03 17:43

    You want all possible matches, including overlapping ones. As you've noted, the lookahead trick from "How to find overlapping matches with a regexp?" doesn't work for your case.
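
    For context, here is a minimal sketch of the lookahead trick and where it falls short. The method name lookahead_matches is made up for this illustration. A zero-width lookahead lets a match start at every position, but the engine still reports only one match per starting position, so shorter matches that share a start with a longer one are lost.

```ruby
# Lookahead trick: scan with a zero-width lookahead so that a match may start
# at every position. The capture group holds what the regex matched there.
# (lookahead_matches is a hypothetical name used only for this sketch.)
def lookahead_matches(str, regex)
  str.scan(/(?=(#{regex}))/).map(&:first)
end

p lookahead_matches("abcadc", /a.*c/)
# => ["abcadc", "adc"]
```

    The greedy .* consumes as much as it can from index 0, so "abc" never shows up in the result.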

    The only thing I can think of that will work in the general case is to generate all of the possible substrings of the string and check each one against an anchored version of the regex. This is brute-force, but it works.

    Ruby:

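    def all_matches(str, regex)
      # Collect every substring str[i, j - i], then keep the unique ones that
      # match the regex from start (\A) to end (\z) of the substring.
      (n = str.length).times.reduce([]) do |subs, i|
        subs + [*i..n].map { |j| str[i, j - i] }
      end.uniq.grep(/\A#{regex}\z/)
    end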
    
    all_matches("abcadc", /a.*c/) 
    #=> ["abc", "abcadc", "adc"]
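
    A note on anchoring an interpolated regex in Ruby: ^ and $ match at line boundaries, while \A and \z match only at the very start and end of the string. When the input can contain newlines, the choice changes which substrings pass the check. The sketch below compares the two; the helper names line_anchored and string_anchored and the sample string are assumptions made for this illustration.

```ruby
# Hypothetical helpers, named only for this comparison.
def line_anchored(regex)
  /^#{regex}$/      # ^ and $ match at every line boundary
end

def string_anchored(regex)
  /\A#{regex}\z/    # \A and \z match only at the ends of the string
end

candidate = "abc\nxyz"  # only the first line matches a.*c

p line_anchored(/a.*c/).match?(candidate)    # => true
p string_anchored(/a.*c/).match?(candidate)  # => false
p string_anchored(/a.*c/).match?("abc")      # => true
```

    With line anchors, the two-line substring is accepted because one of its lines matches; with string anchors, it is rejected.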
    

    JavaScript:

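    function allMatches(str, regex) {
      var i, j, len = str.length, subs = {};
      // Wrap the source in a non-capturing group so alternations (a|b) are anchored
      // as a whole; keep the flags except g and y, which make matching stateful.
      var anchored = new RegExp('^(?:' + regex.source + ')$', regex.flags.replace(/[gy]/g, ''));
      for (i = 0; i < len; ++i) {
        for (j = i; j <= len; ++j) {
          subs[str.slice(i, j)] = true;  // object keys de-duplicate the substrings
        }
      }
      return Object.keys(subs).filter(function (s) { return anchored.test(s); });
    }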
    
  • 2020-12-03 17:48

    In JS:

    function doit(r, s) {
      var res = [], cur;
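      // Anchor the pattern and drop the g/y flags. RegExp#global is read-only, and a
      // global regex keeps lastIndex between test() calls, which can skip matches.
      r = new RegExp('^(?:' + r.source + ')$', r.flags.replace(/[gy]/g, ''));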
      for (var q = 0; q < s.length; ++q)
        for (var w = q; w <= s.length; ++w)
          if (r.test(cur = s.substring(q, w)))
            res.push(cur);
      return res;
    }
    document.body.innerHTML += "<pre>" + JSON.stringify(doit( /a.*c/g, 'abcadc' ), 0, 4) + "</pre>";

  • 2020-12-03 17:51

    In Ruby you could achieve the expected result using:

    str = "abcadc"
    [/(a[^c]*c)/, /(a.*c)/].flat_map{ |pattern| str.scan(pattern) }.reduce(:+)
    # => ["abc", "adc", "abcadc"]
    

    Whether this way works for you is highly dependent on what you really want to achieve.
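
    To make that caveat concrete, here is a small sketch comparing the two-pattern scan with a check of every substring. The method names and the sample string "acacac" are assumptions chosen for this illustration. For "abcadc" both give the same three strings (in a different order), but on "acacac" the scan misses the middle-length matches.

```ruby
# The two-pattern scan from this answer, wrapped in a method
# (two_pattern_matches is a hypothetical name).
def two_pattern_matches(str)
  [/(a[^c]*c)/, /(a.*c)/].flat_map { |pattern| str.scan(pattern) }.reduce(:+)
end

# Hypothetical reference: test every substring against /\Aa.*c\z/.
def every_substring_match(str)
  (0...str.size).flat_map do |i|
    (i...str.size).map { |j| str[i..j] }
  end.grep(/\Aa.*c\z/)
end

p two_pattern_matches("acacac")
# => ["ac", "ac", "ac", "acacac"]
p every_substring_match("acacac")
# => ["ac", "acac", "acacac", "ac", "acac", "ac"]
```

    The scan finds only the shortest match at each start plus the single longest one, so both occurrences of "acac" are missing.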

    I tried to put this into a single expression but couldn't make it work. I would really like to know whether there is a theoretical reason this cannot be done with a single regular expression, or whether I just don't know enough about Oniguruma, Ruby's regex engine, to do it.

  • 2020-12-03 17:53
    def matching_substrings(string, regex)
      anchored = /\A#{regex}\z/  # must match the whole substring
      string.size.times.each_with_object([]) do |start_index, matches|
        start_index.upto(string.size.pred) do |end_index|
          substring = string[start_index..end_index]
          matches.push(substring) if substring =~ anchored
        end
      end
    end
    
    matching_substrings('abcadc', /a.*c/) # => ["abc", "abcadc", "adc"]
    matching_substrings('foobarfoo', /(\w+).*\1/) 
      # => ["foobarf",
      #     "foobarfo",
      #     "foobarfoo",
      #     "oo",
      #     "oobarfo",
      #     "oobarfoo",
      #     "obarfo",
      #     "obarfoo",
      #     "oo"]
    matching_substrings('why is this downvoted?', /why.*/)
      # => ["why",
      #     "why ",
      #     "why i",
      #     "why is",
      #     "why is ",
      #     "why is t",
      #     "why is th",
      #     "why is thi",
      #     "why is this",
      #     "why is this ",
      #     "why is this d",
      #     "why is this do",
      #     "why is this dow",
      #     "why is this down",
      #     "why is this downv",
      #     "why is this downvo",
      #     "why is this downvot",
      #     "why is this downvote",
      #     "why is this downvoted",
      #     "why is this downvoted?"]
    