What is the purpose of the passive (non-capturing) group in a JavaScript regex?

渐次进展 2021-01-12 11:05

What is the purpose of the passive group in a JavaScript regex?

The passive group is prefaced by a question mark colon: (?:group)

In other words

5 answers
  • 2021-01-12 11:44

    In addition to the answers above, if you're using String.prototype.split() and the separator regex contains a capturing group, the output array also contains the captured results (see MDN). If you use a non-capturing group, that doesn't happen.

    var myString = 'Hello 1 word. Sentence number 2.';
    var splits = myString.split(/(\d)/);
    
    console.log(splits);
    

    Outputs:

    ["Hello ", "1", " word. Sentence number ", "2", "."]
    

    Whereas swapping /(\d)/ for /(?:\d)/ results in:

    ["Hello ", " word. Sentence number ", "."]
    
  • 2021-01-12 11:47

    When you want to apply quantifiers to the group as a whole.

    /hello (?:world)?/
    /hello (?:world)*/
    /hello (?:world)+/
    /hello (?:world){3,6}/
    

    etc.
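
    As a minimal sketch of why the group matters here (the test strings and variable names are made up for illustration): without a group, a quantifier binds only to the character immediately before it.

    ```javascript
    // The quantifier applies to the whole group, so "world" may repeat.
    const grouped = /^hello (?:world)+$/;
    // Without a group, "+" applies only to the final "d".
    const ungrouped = /^hello world+$/;

    const groupedRepeat = grouped.test('hello worldworld');     // true
    const ungroupedRepeat = ungrouped.test('hello worldworld'); // false
    const groupedExtraD = grouped.test('hello worlddd');        // false
    const ungroupedExtraD = ungrouped.test('hello worlddd');    // true

    console.log(groupedRepeat, ungroupedRepeat, groupedExtraD, ungroupedExtraD);
    ```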

  • 2021-01-12 11:52

    Two use cases for capturing groups

    A capturing group in a regex actually serves two distinct purposes (as the name "capturing group" itself suggests):

    1. Grouping: if you need a group to be treated as a single entity so that an operator (such as a quantifier) applies to the whole group.

      Probably the most trivial example is an optional sequence of characters, e.g. "foo" optionally followed by "bar". In regex terms: /foo(bar)?/ (capturing group) or /foo(?:bar)?/ (non-capturing group). Note that the trailing ? applies to the whole group (bar), which in this case consists of the simple character sequence bar. If you just want to check whether the input matches your regex, it doesn't matter whether you use a capturing or a non-capturing group: they act the same (except that a non-capturing group may be slightly faster).

    2. Capturing: if you need to extract a part of the input.

      For example, you want to get the number of rabbits from an input like "The farm contains 8 cows and 89 rabbits" (not very good English, I know). The regex could be /(\d+)\s*rabbits\b/. On a successful match, you can get the value matched by the capturing group from JavaScript code (or any other programming language).

      In this example, you have a single capturing group, so you access it via its index 1 (index 0 holds the entire match; see this answer for details).

      Now imagine you want to ensure that the "place" is called "farm" or "ranch". If it's not the case, then you don't want to extract the number of rabbits (in regex terms, you don't want the regex to match).

      So you rewrite your regex as follows: /(farm|ranch).*\b(\d+)\s*rabbits\b/. The regex works by itself, but your JavaScript is broken: there are two capturing groups now and you must change your code to get the contents of the second capturing group for the number of rabbits (i.e. change the index from 1 to 2). The first group now contains the string "farm" or "ranch", which you didn't intend to extract.

      A non-capturing group comes to the rescue: /(?:farm|ranch).*\b(\d+)\s*rabbits\b/. It still matches either "farm" or "ranch", but doesn't capture it, so the indexes of subsequent capturing groups don't shift. And your JavaScript code works fine without changes.
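
    The rabbit example can be sketched in runnable form as follows. The variable names are illustrative only. Note that in the array returned by String.prototype.match(), index 0 is the whole match, so the first capturing group sits at index 1.

    ```javascript
    const farmInput = 'The farm contains 8 cows and 89 rabbits';

    // One capturing group: the number is at index 1 (index 0 is the whole match).
    const simple = farmInput.match(/(\d+)\s*rabbits\b/);

    // Capturing the place as well shifts the number to index 2.
    const shifted = farmInput.match(/(farm|ranch).*\b(\d+)\s*rabbits\b/);

    // A non-capturing group checks the place but keeps the number at index 1.
    const stable = farmInput.match(/(?:farm|ranch).*\b(\d+)\s*rabbits\b/);

    console.log(simple[1]);              // "89"
    console.log(shifted[1], shifted[2]); // "farm" "89"
    console.log(stable[1]);              // "89"
    ```

    Code that reads index 1 keeps working when the place check is added as a non-capturing group, which is the point of the third regex.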


    The example may be oversimplified, but consider that you have a very complex regex with many groups, and you want to capture only a few of them. Non-capturing groups are really helpful then: you only have to count the capturing groups, not all of them.

    Besides, non-capturing groups serve documentation purposes: for someone who reads your code, a non-capturing group is an indication that you are not interested in extracting its contents, you just want to ensure that it matches.


    A few words on separation of concerns

    Capturing groups are a typical example of violating the separation of concerns (SoC) principle. This syntax construct serves two distinct purposes. Once the resulting problems became evident, an additional construct, (?:), was introduced to disable one of the two features.

    It was just a design mistake. Maybe a lack of "free special characters" played its role... but it was still a poor design.

    Regex is a very old, powerful and widely used concept. For reasons of backward compatibility, this flaw is now unlikely to be fixed. It's just a lesson in how important the separation of concerns is.

  • 2021-01-12 12:01

    Use them when you need a conditional (alternation) and don't care which of the choices causes the match.

    Non-capturing groups can simplify the result of matching a complex expression. Here, group 1 is always the text that follows "said". If the optional group were capturing instead, it would take index 1 (undefined when it does not participate in the match) and that text would move to group 2.

    /hello (?:world|foobar )?said (.+)/
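
    A small sketch of how the group numbering works out for this regex (the input strings are invented for illustration, and the capturing variant is added only for comparison):

    ```javascript
    const nonCapturing = /hello (?:world|foobar )?said (.+)/;
    const capturing = /hello (world|foobar )?said (.+)/;

    // With the non-capturing group, the trailing text is always group 1,
    // whether or not the optional part is present.
    const withOptional = 'hello foobar said hi'.match(nonCapturing);
    const withoutOptional = 'hello said hi'.match(nonCapturing);

    // With a capturing group, the optional part occupies group 1
    // and the trailing text moves to group 2.
    const capturedVariant = 'hello said hi'.match(capturing);

    console.log(withOptional[1], withoutOptional[1]);       // "hi" "hi"
    console.log(capturedVariant[1], capturedVariant[2]);    // undefined "hi"
    ```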

  • 2021-01-12 12:04

    Non-capturing groups have just one difference from "normal" (capturing) groups: they don't require the regex engine to remember what they matched.

    The use case is that sometimes you must (or should) use a group not because you are interested in what it captures but for syntactic reasons. In these situations it makes sense to use a non-capturing group instead of a "standard" capturing one because it is less resource intensive -- but if you don't care about that, a capturing group will behave in the exact same manner.

    Your specific example does not make a good case for using non-capturing groups exactly because the two expressions are identical. A better example might be:

    input.match(/hello (?:world|there)/)
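
    To make that concrete, here is a rough sketch with a hypothetical input string named greeting. The group is needed for syntactic reasons (it limits the scope of the alternation), and the only visible difference between the two kinds of group is the extra entry in the result array.

    ```javascript
    const greeting = 'oh, hello there!';

    // Non-capturing: the result holds only the whole match.
    const plain = Array.from(greeting.match(/hello (?:world|there)/));
    // Capturing: the result also holds what the group matched.
    const captured = Array.from(greeting.match(/hello (world|there)/));

    // Without any group, "|" splits the entire pattern:
    // this means "hello world" OR "there".
    const noGroupMatchesThere = /^(?:hello world|there)$/.test('there');
    const groupedMatchesThere = /^hello (?:world|there)$/.test('there');

    console.log(plain);    // [ 'hello there' ]
    console.log(captured); // [ 'hello there', 'there' ]
    console.log(noGroupMatchesThere, groupedMatchesThere); // true false
    ```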
    