JavaScript Regex Match Capture is returning whole match, not group

心在旅途 2020-12-18 20:24
re = /\s{1,}(male)\.$/gi

"A girl is a female, and a boy is a male.".match(re);

This results in " male."

What I want is "male"

3 Answers
  • 2020-12-18 20:55

    You need to take out the 'g' option on your regexp:

    re = /\s{1,}(male)\.$/i
    

    yields

    [" male.", "male"]
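    To get only the captured text, index into that array. A minimal sketch; the names `sentence`, `result`, and `captured` are illustrative and not from the question:

```javascript
// Without the g flag, match() returns the whole match followed by the capture groups.
const sentence = "A girl is a female, and a boy is a male.";
const result = sentence.match(/\s{1,}(male)\.$/i);

// match() returns null when nothing matches, so guard before indexing.
const captured = result ? result[1] : null;
console.log(captured); // "male"
```

    Element 0 is always the whole match, so the first capture group sits at index 1.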
    
  • 2020-12-18 20:58

    When the regex has the g flag, String.prototype.match() does not return captured groups.

    If you need the capture groups, use RegExp.prototype.exec(). It returns an array: the first element is the whole match, and the following elements are the capture groups.

    var regexObj = /\s{1,}(male)\.$/gi;
    
    console.log(regexObj.exec('A girl is a female, and a boy is a male.'));
    

    Will output:

    [' male.', 'male'] // Second element is your capture group
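    One thing to watch for when keeping the g flag with exec(): the regex object is stateful. The sketch below (variable names are illustrative) shows the lastIndex property moving after the first call, which is why a second call on the same input returns null:

```javascript
// exec() always returns capture groups, but with the g flag the regex
// object remembers where the last match ended (lastIndex).
const genderRe = /\s{1,}(male)\.$/gi;
const text = "A girl is a female, and a boy is a male.";

const first = genderRe.exec(text);
console.log(first[1]);           // "male"
console.log(genderRe.lastIndex); // 40, the position just past the match

// A second call resumes from lastIndex, finds nothing, and resets it to 0.
const second = genderRe.exec(text);
console.log(second);             // null
console.log(genderRe.lastIndex); // 0
```

    If only one match is expected, dropping the g flag avoids this state entirely.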

  • 2020-12-18 21:12

    I know that this question is very old but all the answers here are just plain wrong. What really bugs me is that the answers don't add anything useful to the community.

    First

    Question: Why does the regex result in " male."?

    re = /\s{1,}(male)\.$/gi
    
    "A girl is a female, and a boy is a male.".match(re);
    

    Answer: Because " male." is the only match.

    Question: Why didn't (male) get returned?

    Answer: Because captured groups are not returned by match() when the g flag is used.

    From the documentation:

    If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.
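    If both the g flag and the capture groups are needed, String.prototype.matchAll() (ES2020) is one option. A rough sketch follows; the two-sentence input and the pattern without $ are assumptions made so that there is more than one match to find:

```javascript
// Hypothetical input with two sentences, so the g flag has more than one match to find.
const input = "A boy is a male. A man is also a male.";
const pattern = /\s+(male)\./gi;

// match() with g: whole matches only, capture groups dropped.
const wholeMatches = input.match(pattern);
console.log(wholeMatches); // [" male.", " male."]

// matchAll() with g: one match array per hit, capture groups included.
const groups = Array.from(input.matchAll(pattern), (m) => m[1]);
console.log(groups); // ["male", "male"]
```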

    Second

    Let's break down the regex and figure out what pattern it's really matching.

    patterns

    • \s{1,} means match at least one white-space character. This is the same as \s+.
    • (male) means match male and capture it.
    • \.$ means match a period at the end of the input.

    flags

    • g means find all matches rather than stopping after the first match
    • i means ignore case

    However, all of those patterns are stuck together. Those patterns do not stand by themselves.

    What the regex is matching is: one or more white-space characters followed by "male" followed by a . at the end of the input. In the example the only portion of the input that matches is " male.".
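    A quick way to check that reading is to feed both spellings of the quantifier an input with several white-space characters. This is only a sketch; the input with two spaces and a tab is made up for the test:

```javascript
// \s{1,} and \s+ are interchangeable: both mean "one or more white-space characters".
const braces = /\s{1,}(male)\.$/i;
const plus = /\s+(male)\.$/i;

// Hypothetical input with two spaces and a tab before "male".
const spaced = "a boy is a  \tmale.";

console.log(spaced.match(braces)[0] === spaced.match(plus)[0]); // true
console.log(JSON.stringify(spaced.match(plus)[0]));             // "  \tmale."
console.log(spaced.match(plus)[1]);                             // "male"
```

    The whole match grows with the white-space, while the capture group stays "male".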

    Third

    So, what happens when we remove the g flag?

    If the string matches the expression, it will return an Array containing the entire matched string as the first element, followed by any results captured in parentheses. If there were no matches, null is returned.

    If the regular expression does not include the g flag, str.match() will return the same result as RegExp.exec(). The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.

    re = /\s{1,}(male)\.$/i
    
    "A girl is a female, and a boy is a male.".match(re);
    

    The new result is an array with some extra properties: index and input.

    res: Array(2)
        0 : " male."
        1 : "male"
        groups : undefined
        index : 34
        input : "A girl is a female, and a boy is a male."
        length : 2
    

    It's easy to manipulate that result to get what you wanted. However ....
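    For completeness, here is roughly what that manipulation could look like. It is a sketch with illustrative names (`phrase`, `found`, `word`, `wordStart`), not the only way to do it:

```javascript
const phrase = "A girl is a female, and a boy is a male.";
const found = phrase.match(/\s{1,}(male)\.$/i);

// match() may return null, so fall back to an empty array before destructuring.
const [whole, word] = found || [];
console.log(word); // "male"

// index points at the start of the whole match, which includes the leading space.
console.log(found.index); // 34

// The capture group itself starts one character later.
const wordStart = found.index + whole.indexOf(word);
console.log(wordStart); // 35
```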

    Fourth

    I really, really, really wanted the regex to only return "male". Guess what, you can really, really, really do that with pure regex.

    re = /(?<=\s)male(?=\.$)/gi
    
    
    "A girl is a female, and a boy is a male.".match(re);
    

    This results in ["male"]; exactly what the questioner asked for.

    Notice that the g flag is back? It makes no difference to what is matched in this example, but it will later.

    Let's break it down:

    • (?<=\s) is a lookbehind: match the following pattern only if it's preceded by a white-space character.
    • male matches male; duh.
    • (?=\.$) is a lookahead: match the previous pattern only if it's followed by a . at the end of the input.

    Put it all together and (?<=\s)male(?=\.$) means match male if it's preceded by a white-space character and followed by a period at the end of the input. Lookarounds don't consume characters, so neither the white-space nor the period ends up in the result. Lookbehind requires an engine that supports ES2018.
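    To see what each lookaround contributes, the sketch below compares a lookahead-only pattern with one that also has a white-space lookbehind, written `(?<=\s)` (ES2018). The second input, which ends in "female.", is hypothetical and exists only to show the difference:

```javascript
// Lookahead only: "male" followed by a period at the end of the input.
const lookaheadOnly = /male(?=\.$)/gi;
// Lookbehind added: "male" must also be preceded by a white-space character.
const withLookbehind = /(?<=\s)male(?=\.$)/gi;

const boy = "A girl is a female, and a boy is a male.";
// Hypothetical input that ends in "female." instead.
const girl = "A boy is a male, and a girl is a female.";

console.log(boy.match(lookaheadOnly));   // ["male"]
console.log(boy.match(withLookbehind));  // ["male"]
console.log(girl.match(lookaheadOnly));  // ["male"], taken from inside "female"
console.log(girl.match(withLookbehind)); // null
```

    Without the lookbehind, nothing stops the pattern from matching the tail of "female".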

    FINALLY

    What about that g flag? Can we see it do something?

    As previous users said, the \.$ makes the g flag irrelevant for matching, because the input has only one end. It is not irrelevant for the result, though: we have seen that it changes the output of match().

    What if we changed the input to A girl is a female, and a boy is a male. A female likes a good male.

    Get rid of the $ and see the g flag work its magic. The lookbehind (?<=\s) keeps the requirement that male follows a white-space character.

    re = /(?<=\s)male(?=\.)/ig
    
    res = "A girl is a female, and a boy is a male. A female likes a good male.".match(re);
    

    Now, the output is an array with just matches! ['male','male'].
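    To make the effect of the g flag explicit, the following sketch runs the same pattern with and without it on the two-sentence input. The white-space lookbehind `(?<=\s)` is an assumption here and needs ES2018; `matchAll()` needs ES2020:

```javascript
const twoSentences =
  "A girl is a female, and a boy is a male. A female likes a good male.";

// With g: every "male" that follows white-space and precedes a period.
const all = twoSentences.match(/(?<=\s)male(?=\.)/gi);
console.log(all); // ["male", "male"]

// Without g: only the first match, plus its position in the input.
const firstOnly = twoSentences.match(/(?<=\s)male(?=\.)/i);
console.log(firstOnly[0], firstOnly.index); // male 35

// matchAll() keeps the g behaviour and still reports each position.
const positions = Array.from(
  twoSentences.matchAll(/(?<=\s)male(?=\.)/gi),
  (m) => m.index
);
console.log(positions); // [35, 63]
```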

    I feel better now.
