Commenting Regular Expressions

忘掉有多难 2020-11-29 11:14

I'm trying to comment regular expressions in JavaScript.

There seem to be many resources on how to remove comments from code using regex, but not on how to actually comment a regular expression itself.

4 Answers
  • 2020-11-29 11:50

    Unfortunately, JavaScript doesn't have a verbose mode for regular expression literals like some other languages do. You may find this interesting, though.

    Without any external libraries, your best bet is to build the pattern from an ordinary string and comment each piece:

    var r = new RegExp(
        '('      + // start capture group
        '[0-9]+' + // match one or more digits
        ')'        // end capture group
    );
    r.test('9'); //true
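The concatenation approach works as long as no piece needs a backslash, which may be why this example spells digits as [0-9]+ rather than \d+: in an ordinary string literal, '\d+' silently becomes 'd+'. A minimal sketch of one workaround, assuming you still want one commented line per piece, is to write each piece as a regex literal and join their .source strings. The name numberPattern is purely illustrative.

```javascript
// Each piece is a regex literal, so \d and \. need no double-escaping.
const pieces = [
  /\+?/,          // optional plus sign
  /(\d+)/,        // capture group 1: one or more digits
  /(?:\.(\d+))?/, // optional decimal part; capture group 2: the digits after the dot
];

// RegExp.prototype.source is the pattern text without the slashes or flags.
const numberPattern = new RegExp(pieces.map((piece) => piece.source).join(''));

console.log(numberPattern.source);                 // \+?(\d+)(?:\.(\d+))?
console.log(numberPattern.exec('+12.5').slice(1)); // [ '12', '5' ]
```

Flags set on the individual literals are ignored (pass them to the final new RegExp call instead), and capture groups are numbered across the whole joined pattern, not per piece.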
    
  • 2020-11-29 11:51

    In several other languages (notably Perl), there's a special x flag. When it is set, the regexp ignores any whitespace and comments inside the pattern. Sadly, JavaScript regexps do not support the x flag.

    Lacking that syntax, the only way to improve readability is by convention. Mine is to add a comment before the tricky regular expression, containing the pattern written as if the x flag were available. Example:

    /*
      \+?     #optional + sign
      (\d*)   #the integer part
      (       #begin decimal portion
         \.
         \d+  #decimal part
      )
     */
    var re = /\+?(\d*)(\.\d+)/;
    

    For more complex examples, you can see what I've done with the technique here and here.
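Because a comment and its literal can drift apart, it helps to run the literal against a few inputs. This is a quick check of the pattern exactly as written above; note that the decimal portion is not marked optional, so a bare integer does not match:

```javascript
// The pattern from the answer, unchanged.
const re = /\+?(\d*)(\.\d+)/;

// exec returns [fullMatch, group1, group2, ...] on success, or null on failure.
const signedDecimal = re.exec('+12.5');
const bareDecimal = re.exec('.75');
const bareInteger = re.exec('42');

console.log(signedDecimal.slice(0, 3)); // [ '+12.5', '12', '.5' ]
console.log(bareDecimal.slice(0, 3));   // [ '.75', '', '.75' ]
console.log(bareInteger);               // null: there is no decimal part
```

If integers should match too, adding a ? after the decimal group, in both the literal and the comment, would make that part optional.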

  • 2020-11-29 11:59

    While JavaScript doesn't natively support multi-line, commented regular expressions, it's easy enough to build something that accomplishes the same thing: a function that takes a (multi-line, commented) string and returns a regular expression built from that string, without the comments and newlines.

    The following snippet imitates the x ("extended") flag found in other regex flavors, which ignores all whitespace characters in a pattern as well as comments, denoted with #:

    function makeExtendedRegExp(inputPatternStr, flags) {
      // Remove everything between the first unescaped `#` and the end of a line
      // and then remove all unescaped whitespace
      const cleanedPatternStr = inputPatternStr
        .replace(/(^|[^\\])#.*/g, '$1')
        .replace(/(^|[^\\])\s+/g, '$1');
      return new RegExp(cleanedPatternStr, flags);
    }
    
    
    // The following switches the first word with the second word:
    const input = 'foo bar baz';
    const pattern = makeExtendedRegExp(String.raw`
      ^       # match the beginning of the line
      (\w+)   # 1st capture group: match one or more word characters
      \s      # match a whitespace character
      (\w+)   # 2nd capture group: match one or more word characters
    `);
    console.log(input.replace(pattern, '$2 $1'));

    Ordinarily, to represent a backslash in a JavaScript string, you must double each literal backslash, e.g. str = 'abc\\def'. Regular expressions often contain many backslashes, and doubling them all makes the pattern much harder to read. So when a JavaScript string needs many backslashes, it's a good idea to use a String.raw template literal, in which a single typed backslash represents a literal backslash without any additional escaping.
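A side-by-side comparison makes the difference concrete. This sketch builds the same \d+ pattern from an ordinary string, a double-escaped string, and a String.raw template literal:

```javascript
// In an ordinary string literal, an unrecognized escape like \d just drops the backslash.
const plain = '\d+';
// Doubling the backslash keeps it, at the cost of readability.
const doubled = '\\d+';
// String.raw keeps every typed backslash as-is.
const raw = String.raw`\d+`;

console.log(plain);                         // d+
console.log(doubled === raw);               // true
console.log(new RegExp(plain).test('123')); // false: the pattern is really "d+"
console.log(new RegExp(raw).test('123'));   // true
```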

    Just as with the standard x modifier, to match an actual # in the string, escape it first, e.g.:

    foo\#bar     # comments go here
    

    // this function is exactly the same as the one in the first snippet
    
    function makeExtendedRegExp(inputPatternStr, flags) {
      // Remove everything between the first unescaped `#` and the end of a line
      // and then remove all unescaped whitespace
      const cleanedPatternStr = inputPatternStr
        .replace(/(^|[^\\])#.*/g, '$1')
        .replace(/(^|[^\\])\s+/g, '$1');
      return new RegExp(cleanedPatternStr, flags);
    }
    
    
    // The following switches the first word with the second word:
    const input = 'foo#bar baz';
    const pattern = makeExtendedRegExp(String.raw`
      ^       # match the beginning of the line
      (\w+)   # 1st capture group: match one or more word characters
      \#      # match a hash character
      (\w+)   # 2nd capture group: match one or more word characters
    `);
    console.log(input.replace(pattern, '$2 $1'));
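Because this helper treats every unescaped # as the start of a comment, the rule applies inside character classes too, so the escaping advice above matters there as well. A small check, reusing the helper's logic unchanged:

```javascript
// Same logic as the helper above.
function makeExtendedRegExp(inputPatternStr, flags) {
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}

// An unescaped # inside [...] still starts a comment, so "[#]" is cut down to "["
// and the RegExp constructor throws a SyntaxError.
let unescapedError = null;
try {
  makeExtendedRegExp(String.raw`[#]`);
} catch (error) {
  unescapedError = error;
}
console.log(unescapedError instanceof SyntaxError); // true

// Escaping the # keeps the character class intact.
const hashClass = makeExtendedRegExp(String.raw`[\#]   # a literal hash`);
console.log(hashClass.source);    // [\#]
console.log(hashClass.test('#')); // true
```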

    Note that to match a literal space character (rather than any whitespace character) while using the x flag in any environment (including the helper above), you have to escape the space with a \ first, e.g.:

    ^(\S+)\ (\S+)   # capture the first two words
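Escapes such as \ (backslash-space) and \# rely on JavaScript's identity escapes, which are only allowed when the u (Unicode) flag is off. With u, only syntax characters and / may be escaped that way, so both forms are rejected. A quick check with the built-in RegExp constructor, no helper involved:

```javascript
// Without the u flag, "\ " is an identity escape that matches a literal space.
const twoWords = new RegExp(String.raw`^(\S+)\ (\S+)`);
console.log(twoWords.exec('foo bar baz').slice(1)); // [ 'foo', 'bar' ]

// With the u flag, only syntax characters and "/" may be escaped this way,
// so both "\ " and "\#" are rejected when the RegExp is constructed.
const unicodeResults = [String.raw`^(\S+)\ (\S+)`, String.raw`foo\#bar`].map((source) => {
  try {
    new RegExp(source, 'u');
    return 'accepted';
  } catch (error) {
    return error.name;
  }
});
console.log(unicodeResults); // [ 'SyntaxError', 'SyntaxError' ]
```

So if an extended-pattern helper is combined with the u flag, a hex escape such as \x20 for a space or \x23 for # is one alternative, since it contains neither whitespace nor a # for the helper to strip.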
    

    If you need to match space characters often, escaping each one gets tedious and makes the pattern harder to read, much as double-escaping backslashes does. One possible (non-standard) modification that permits unescaped spaces is to strip only the spaces at the beginning and end of each line, plus any spaces before a # comment:

    function makeExtendedRegExp(inputPatternStr, flags) {
      // Remove the first unescaped `#`, any preceding unescaped spaces, and everything that follows,
      // and then remove leading and trailing whitespace on each line, including line breaks
      const cleanedPatternStr = inputPatternStr
        .replace(/(^|[^\\]) *#.*/g, '$1')
        .replace(/^\s+|\s+$|\n/gm, '');
      console.log(cleanedPatternStr);
      return new RegExp(cleanedPatternStr, flags);
    }
    
    
    // The following switches the first word with the second word:
    const input = 'foo bar baz';
    const pattern = makeExtendedRegExp(String.raw`
      ^             # match the beginning of the line
      (\w+) (\w+)   # capture the first two words
    `);
    console.log(input.replace(pattern, '$2 $1'));
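If wrapping every pattern in String.raw feels noisy, one option (a sketch, not something from the answer) is to turn the helper into a tag function. The name xre and the curried xre(flags) form are made up for this example; the cleaning rules are the space-preserving ones from the snippet above. Any ${...} values are inserted verbatim, without escaping.

```javascript
// Hypothetical helper: xre(flags) returns a tag function, so a pattern can be
// written as xre('g')`...` without wrapping it in String.raw by hand.
function xre(flags = '') {
  return (strings, ...values) => {
    // String.raw reads strings.raw, which keeps every backslash exactly as typed,
    // and splices in any ${...} values as-is.
    const inputPatternStr = String.raw(strings, ...values);
    // Same cleaning rules as the space-preserving version above.
    const cleanedPatternStr = inputPatternStr
      .replace(/(^|[^\\]) *#.*/g, '$1')
      .replace(/^\s+|\s+$|\n/gm, '');
    return new RegExp(cleanedPatternStr, flags);
  };
}

const swapFirstTwo = xre()`
  ^             # match the beginning of the line
  (\w+) (\w+)   # capture the first two words
`;
console.log(swapFirstTwo.source);                          // ^(\w+) (\w+)
console.log('foo bar baz'.replace(swapFirstTwo, '$2 $1')); // bar foo baz

const words = xre('g')`\w+   # one word at a time`;
console.log('one two three'.match(words)); // [ 'one', 'two', 'three' ]
```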

  • 2020-11-29 12:02

    I would suggest putting a regular comment above the line containing the regular expression to explain it.

    You will have much more freedom.
