JavaScript Regex - Find all possible matches, even in already captured matches

余生分开走 2020-12-03 14:26

I'm trying to obtain all possible matches from a string using a regex in JavaScript. It appears that my method of doing this is not matching parts of the string

3 Answers
  • 2020-12-03 14:52

    Without modifying your regex, you can make each search resume at the start of the second half of the previous match by using .exec and adjusting the regex object's lastIndex property.

    var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
    var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
    var matches = [], found;
    while (found = reg.exec(string)) {
        matches.push(found[0]);
        reg.lastIndex -= found[0].split(':')[1].length;
    }
    
    console.log(matches);
    //["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
    



    As per Bergi's comment, you can also take the index of the last match and increment it by 1. Instead of resuming from the second half of the match, the regex then resumes from the second character of each match:

    reg.lastIndex = found.index+1;
    


    The final outcome is the same, though Bergi's version needs a little less code and performs slightly faster. =]
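
    If you need this in more than one place, the lastIndex trick can be wrapped in a small helper. This is only a sketch: findOverlappingMatches is a hypothetical name, not something from the answer above, and it assumes the pattern is not sticky (no y flag).

```javascript
// Hypothetical helper: collects every match of `pattern` in `text`,
// including matches that overlap an earlier match.
function findOverlappingMatches(text, pattern) {
    // Work on a copy so the caller's lastIndex is untouched, and make sure
    // the g flag is set, since exec ignores lastIndex on non-global regexes.
    var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
    var regex = new RegExp(pattern.source, flags);
    var matches = [], found;
    while ((found = regex.exec(text)) !== null) {
        matches.push(found[0]);
        // Resume one character after the START of this match, not after its end.
        regex.lastIndex = found.index + 1;
    }
    return matches;
}

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var overlapping = findOverlappingMatches(string, /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/);
console.log(overlapping);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```

    Because lastIndex always moves at least one character forward, the loop also terminates for patterns that can match the empty string.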

  • 2020-12-03 15:02

    Unfortunately, it's not quite as simple as a single string.match.

    The reason is that you want overlapping matches, which the /g flag doesn't give you.

    You could use lookahead:

    var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
    

    But now you get:

    string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
    

    The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
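
    A quick way to see the zero-width behaviour is to check where the regex resumes after a match. The snippet below is a small sketch on a shortened sample string (the variable names are just for illustration): lastIndex ends up right after the consumed part, not after the text the lookahead inspected.

```javascript
var lookaheadRe = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
var sample = 'A1B1Y:A1B2Y:A1B3Y';

var first = lookaheadRe.exec(sample);
var resumeAt = lookaheadRe.lastIndex;
console.log(first[0]);  // "A1B1Y" (the lookahead text is not part of the match)
console.log(resumeAt);  // 5 (the next search starts at ":A1B2Y...")

// Since ":A1B2Y" was never consumed, "A1B2Y" can still start the next match.
var second = lookaheadRe.exec(sample);
console.log(second[0], second.index); // A1B2Y 6
```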

    You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:

    // using re from above to get the overlapping matches
    
    var m;
    var matches = [];
    var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
    
    while ((m = re.exec(string)) !== null) {
      // m is a match object, which has the index of the current match
      matches.push(string.substring(m.index).match(re2)[0]);
    }
    
    // matches now contains:
    // [
    //   "A1B1Y:A1B2Y",
    //   "A1B2Y:A1B3Y",
    //   "A1B5Y:A1B6Y",
    //   "A1B6Y:A1B7Y",
    //   "A1B9Y:A1B10Y",
    //   "A1B10Y:A1B11Y"
    // ]
    


    Alternatively, you could split the original string on :, then loop through the resulting array and pull out each pair where array[i] and array[i+1] both match the pattern you want.
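
    The split approach could look roughly like this. It is a sketch that assumes every segment between colons has to match the pattern in full, which is why the regex is anchored with ^ and $; the names token, parts and pairs are made up for the example.

```javascript
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// Assumption: a segment qualifies only if the whole segment is A<digits>B<digits>Y.
var token = /^A\d+B\d+Y$/;
var parts = string.split(':');
var pairs = [];

// Stop one short of the end, since each step looks at parts[i] and parts[i + 1].
for (var i = 0; i < parts.length - 1; i++) {
    if (token.test(parts[i]) && token.test(parts[i + 1])) {
        pairs.push(parts[i] + ':' + parts[i + 1]);
    }
}

console.log(pairs);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```

    The token regex deliberately has no /g flag, so test does not carry lastIndex from one call to the next.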

  • 2020-12-03 15:13

    You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:

    var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
    var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
    var arr;
    var results = [];
    
    while ((arr = regex.exec(input)) !== null) {
        results.push(arr[0] + arr[1]);
    }
    

    I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.

    Actually, it is also possible to abuse the replace method to achieve the same result:

    var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
    var results = [];
    
    input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
        results.push($0 + $1);
        return '';
    });
    

    However, since this is replace, it also builds a new string that is thrown away, which is extra useless work.
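
    In environments that support String.prototype.matchAll (ES2020), the same lookahead regex can be iterated without the throwaway replacement. This is a different technique from the two shown above, offered as a sketch; pairRegex and matchAllResults are illustrative names.

```javascript
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// Same pattern as above; matchAll throws a TypeError if the g flag is missing.
var pairRegex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;

// m[0] is the consumed part, m[1] is the text captured inside the lookahead.
var matchAllResults = Array.from(input.matchAll(pairRegex), function (m) {
    return m[0] + m[1];
});

console.log(matchAllResults);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```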
