How can I match overlapping strings with regex?

猫巷女王i 2020-11-22 03:27

Let's say I have the string

"12345"

If I .match(/\d{3}/g), I only get one match, "123". Why don't I get

6 answers
  •  感情败类
    2020-11-22 03:46

    String#match with a global-flag regex returns an array of the matched substrings. The /\d{3}/g regex matches and consumes a 3-digit sequence: it reads the characters and advances its index to the position right after the last matched character. Thus, after "eating up" 123, the index sits right after 3, and the only substring left to parse is 45, which contains no match.
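
    To see that consumption step by step, here is a minimal sketch that runs the same pattern through exec and reads lastIndex along the way. The variable names (consuming, input, first, indexAfterFirst, second) are illustrative only and not from the original answer.

```javascript
// Sketch: watch lastIndex move as a consuming pattern matches.
const consuming = /\d{3}/g;
const input = '12345';

// First call matches "123" starting at index 0.
const first = consuming.exec(input);

// The regex has consumed three characters, so the next search starts at 3.
const indexAfterFirst = consuming.lastIndex;

// From index 3 only "45" is left, which is too short for \d{3}.
const second = consuming.exec(input);

console.log(first[0], indexAfterFirst, second);
console.log(input.match(/\d{3}/g));
```

    The second exec call returns null because the search resumes after the consumed characters, which is the same reason match yields a single element.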

    I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead containing a capturing group) to test every position inside the input string. Because a lookahead consumes no characters, each match is empty and the matched digits are read from the capturing group. After each test, the regex's lastIndex (a read/write integer property of a regular expression object that specifies the index at which to start the next match) is advanced "manually" to avoid an infinite loop.

    Note that this technique is implemented in .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all) and Ruby (String#scan), and it can be used in Java, too. Here is a demo using matchAll:

    var re = /(?=(\d{3}))/g;
    console.log( Array.from('12345'.matchAll(re), x => x[1]) );

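    Here is an ES5-compliant demo:

    var re = /(?=(\d{3}))/g; // zero-width lookahead; group 1 captures the 3 digits
    var str = '12345';
    var m, res = [];

    while ((m = re.exec(str)) !== null) {
        // A zero-width match leaves lastIndex at m.index, so step forward by hand
        if (m.index === re.lastIndex) {
            re.lastIndex++;
        }
        res.push(m[1]);
    }

    console.log(res); // ["123", "234", "345"]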

    Here is a regex101.com demo
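
    As a small illustration of why the lastIndex check in the loop above matters, the sketch below calls exec twice without advancing lastIndex and then once after bumping it. The names lookahead, text, a, stuckAt, b and c are made up for this example.

```javascript
// Sketch: a zero-width match does not move lastIndex on its own.
const lookahead = /(?=(\d{3}))/g;
const text = '12345';

// The lookahead succeeds at index 0 and the match itself is empty.
const a = lookahead.exec(text);
const stuckAt = lookahead.lastIndex; // still 0, nothing was consumed

// Without intervention the next call finds the same position again.
const b = lookahead.exec(text);

// Advancing lastIndex by hand lets the search move on to index 1.
lookahead.lastIndex++;
const c = lookahead.exec(text);

console.log(a.index, stuckAt, b.index, c.index, c[1]);
```

    Without the manual increment, a while loop built on exec would keep returning the match at index 0 and never finish.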

    Note that the same result can be obtained with a "regular" consuming \d{3} pattern by manually setting re.lastIndex to m.index + 1 after each successful match:

    var re = /\d{3}/g;
    var str = '12345';
    var m, res = [];
    
    while (m = re.exec(str)) {
        res.push(m[0]);
        re.lastIndex = m.index + 1; // <- Important
    }
    console.log(res);
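
    If this is needed in more than one place, the loop above can be wrapped in a small helper. The following is only a sketch: findOverlapping is a hypothetical name, not a built-in, and it assumes the caller passes a RegExp without the sticky (y) flag.

```javascript
// Hypothetical helper: collect overlapping matches of a consuming pattern.
function findOverlapping(pattern, str) {
    // Work on a copy so the caller's regex state is untouched,
    // and make sure the global flag is set so lastIndex is honored.
    const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
    const re = new RegExp(pattern.source, flags);
    const results = [];
    let m;

    while ((m = re.exec(str)) !== null) {
        results.push(m[0]);
        // Restart one character after the start of this match,
        // not after its end, so overlapping matches are found.
        re.lastIndex = m.index + 1;
    }
    return results;
}

console.log(findOverlapping(/\d{3}/, '12345'));
console.log(findOverlapping(/\d{3}/, 'a1234b56'));
```

    Because the search restarts at m.index + 1, the loop also terminates for patterns that can match the empty string. It reports at most one match per starting position, namely the one the regex engine prefers there.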
