Commenting Regular Expressions

忘掉有多难

I'm trying to comment regular expressions in JavaScript.

There seem to be many resources on how to remove comments from code using regex, but not actuall

4 Answers

独厮守ぢ

2020-11-29 11:50
Unfortunately, JavaScript doesn't have a verbose mode for regular expression literals like some other languages do. You may find this interesting, though.

In lieu of any external libraries, your best bet is just to use a normal string and comment that:
```
var r = new RegExp(
    '('      + // start capture
    '[0-9]+' + // match one or more digits
    ')'        // end capture
);
r.test('9'); // true
```
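One caveat with the string approach is that backslashes have to be doubled inside ordinary string literals (`'\\d+'` rather than `'\d+'`). A minimal sketch of one way around that, assuming you are willing to write each piece as its own regex literal and join the pieces through the standard `source` property. The helper name `joinPatterns` is hypothetical, not part of any library:

```javascript
// Hypothetical helper: concatenates the sources of several RegExp pieces
// into one pattern, so each piece can carry its own trailing comment.
function joinPatterns(pieces, flags) {
  var source = pieces.map(function (piece) { return piece.source; }).join('');
  return new RegExp(source, flags);
}

var r2 = joinPatterns([
  /\+?/,      // optional + sign
  /(\d+)/,    // capture one or more digits
  /(\.\d+)?/  // optional decimal part
]);
r2.test('+12.5'); // true
```

Each piece must be a valid regular expression on its own, so a group cannot be opened in one piece and closed in another.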
天涯浪人

2020-11-29 11:51
In several other languages (notably Perl), there's the special x flag. When set, the regexp ignores any whitespace and comments inside of it. Sadly, JavaScript regexps do not support the x flag.

Lacking syntax support, the only way to improve readability is by convention. Mine is to put a comment before the tricky regular expression, containing the pattern written as if the x flag were available. Example:
```
/*
  \+?     #optional + sign
  (\d*)   #the integer part
  (       #begin decimal portion
     \.
     \d+  #decimal part
  )
 */
var re = /\+?(\d*)(\.\d+)/;
```
For more complex examples, you can see what I've done with the technique here and here.

逝去的感伤

2020-11-29 11:59

While JavaScript doesn't natively support multi-line, commented regular expressions, it's easy enough to build something that accomplishes the same thing: use a function that takes a (multi-line, commented) string and returns a regular expression built from that string, minus the comments and newlines.

The following snippet imitates the behavior of other flavors' x ("extended") flag, which ignores all whitespace characters in a pattern as well as comments, which are denoted with #:

```
function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}

// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \s      # match a whitespace character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));
```

Ordinarily, to represent a backslash in a JavaScript string, you must escape it with a second backslash, e.g. str = 'abc\\def'. Regular expressions often use many backslashes, and doubling each one can make the pattern much less readable. When writing a JavaScript string with many backslashes, it's a good idea to use a String.raw template literal, which lets a single typed backslash represent a literal backslash without additional escaping.
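As a small illustration of the difference (a sketch only; the variable names are made up for this example), the two declarations below produce the same pattern text, but only the first needs doubled backslashes:

```javascript
// Both strings hold the same pattern text: \d\.\d
const escaped = '\\d\\.\\d';     // ordinary string literal: every backslash doubled
const raw = String.raw`\d\.\d`;  // String.raw: each backslash typed once

console.log(escaped === raw);               // true
console.log(new RegExp(raw).test('v1.5'));  // true
```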

Just like with the standard x modifier, to match an actual # in the string, escape it first, e.g.:

```
foo\#bar     # comments go here
```

```
// this function is exactly the same as the one in the first snippet
function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}

// The following swaps the words on either side of the # (and replaces the # with a space):
const input = 'foo#bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \#      # match a hash character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));
```
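One limitation worth knowing about: the function only checks whether a `#` is preceded by a backslash, so a `#` inside a character class is also treated as the start of a comment and needs escaping too. The sketch below repeats the same cleaning steps so that it runs on its own; the variable names `failed` and `ok` are just for illustration:

```javascript
// Same cleaning steps as above, repeated so this snippet is self-contained
function makeExtendedRegExp(inputPatternStr, flags) {
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}

// Unescaped # inside a character class: everything from the # onward is
// stripped, leaving "[", which is not a valid pattern.
let failed = false;
try {
  makeExtendedRegExp(String.raw`[#a]+  # a hash or an "a"`);
} catch (e) {
  failed = e instanceof SyntaxError;
}
console.log(failed); // true

// Escaping the # keeps it in the pattern.
const ok = makeExtendedRegExp(String.raw`[\#a]+  # a hash or an "a"`);
console.log(ok.test('#a#')); // true
```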

Note that to match a literal space character (and not just any whitespace character) while using the x flag in any environment (including the above), you have to escape the space with a \ first, e.g.:

```
^(\S+)\ (\S+)   # capture the first two words
```

If you want to frequently match space characters, this can get a bit tedious and make the pattern harder to read, similar to how double-escaping backslashes isn't very desirable. One possible (non-standard) modification to permit unescaped space characters would be to only strip out spaces at the beginning and end of a line, and spaces before a # comment:

```
function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove the first unescaped `#`, any preceding unescaped spaces, and everything that follows
  // and then remove leading and trailing whitespace on each line, including linebreaks
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\]) *#.*/g, '$1')
    .replace(/^\s+|\s+$|\n/gm, '');
  console.log(cleanedPatternStr);
  return new RegExp(cleanedPatternStr, flags);
}

// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^             # match the beginning of the line
  (\w+) (\w+)   # capture the first two words
`);
console.log(input.replace(pattern, '$2 $1'));
```


深忆病人

2020-11-29 12:02

I would suggest putting a regular comment above the line with the regular expression in order to explain it.

You will have much more freedom.
