I'm trying to comment regular expressions in JavaScript.
There seem to be many resources on how to remove comments from code using regex, but not actuall
Unfortunately, JavaScript doesn't have a verbose mode for regular expression literals like some other languages do. You may find this interesting, though.
In lieu of any external libraries, your best bet is just to use a normal string and comment that:
var r = new RegExp(
    '(' +      // start capture
    '[0-9]+' + // match one or more digits
    ')'        // end capture
);
r.test('9'); // true
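One drawback of plain strings is that every backslash has to be doubled (`'\\d+'`). A minimal sketch of a variation, assuming you are willing to write each piece as its own regex literal: join the pieces through their `source` property. The helper name `joinRegExp` is made up for this illustration and is not a built-in.

```javascript
// Hypothetical helper (not part of any library): concatenates the
// `source` of each part into one RegExp.
function joinRegExp(parts, flags) {
  return new RegExp(parts.map(function (p) { return p.source; }).join(''), flags);
}

var r = joinRegExp([
  /\+?/,      // optional + sign
  /(\d+)/,    // capture one or more digits
  /(\.\d+)?/  // optional decimal part
]);

console.log(r.source);        // \+?(\d+)(\.\d+)?
console.log(r.test('+12.5')); // true
```

The trade-off is that each piece must be a valid regular expression on its own, so a group cannot be opened in one part and closed in another.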
In several other languages (notably Perl), there's the special x flag. When set, the regexp ignores any whitespace and comments inside of it. Sadly, JavaScript regexps do not support the x flag.
Lacking syntax, the only way to improve readability is convention. Mine is to add a comment before the tricky regular expression, containing the pattern written as if the x flag were available. Example:
/*
  \+?            #optional + sign
  (\d*)          #the integer part
  (              #begin decimal portion
     \.
     \d+         #decimal part
  )
*/
var re = /\+?(\d*)(\.\d+)/;
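Since the comment and the literal are maintained separately, it can help to keep a quick check next to them. A small sketch (the sample inputs are my own, chosen only for illustration):

```javascript
var re = /\+?(\d*)(\.\d+)/;

var m = re.exec('+12.5');
console.log(m[0]); // +12.5
console.log(m[1]); // 12
console.log(m[2]); // .5

// The decimal group is not optional in this pattern, so a bare integer fails.
console.log(re.test('42')); // false
```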
For more complex examples, you can see what I've done with the technique here and here.
While JavaScript doesn't natively support multi-line and commented regular expressions, it's easy enough to construct something that accomplishes the same thing: use a function that takes in a (multi-line, commented) string and returns a regular expression from that string, sans comments and newlines.
The following snippet imitates the behavior of other flavors' x ("extended") flag, which ignores all whitespace characters in a pattern as well as comments, which are denoted with #:
function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}
// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \s      # match a whitespace character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));
Ordinarily, to represent a backslash in a JavaScript string, one must escape each literal backslash by doubling it, e.g. str = 'abc\\def'. But regular expressions often use many backslashes, and the doubling can make the pattern much less readable, so when writing a JavaScript string with many backslashes it's a good idea to use a String.raw template literal, which allows a single typed backslash to actually represent a literal backslash, without additional escaping.
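To make the difference concrete, here is a small sketch comparing the two spellings of the same pattern string (the variable names are only for illustration):

```javascript
// Ordinary string literal: every backslash has to be doubled.
const escaped = '^(\\w+)\\s(\\w+)';

// String.raw template literal: backslashes are kept as typed.
const raw = String.raw`^(\w+)\s(\w+)`;

console.log(escaped === raw); // true

const swapped = 'foo bar baz'.replace(new RegExp(raw), '$2 $1');
console.log(swapped); // bar foo baz
```

One caveat: `${...}` substitutions are still evaluated inside a String.raw template, so a pattern that contains that character sequence needs extra care.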
Just like with the standard x modifier, to match an actual # in the string, just escape it first, e.g.
foo\#bar # comments go here
// this function is exactly the same as the one in the first snippet
function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}
// The following switches the first word with the second word:
const input = 'foo#bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \#      # match a hash character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));
Note that to match a literal space character (and not just any whitespace character) while using the x flag in any environment (including the above), you have to escape the space with a \ first, e.g.:
^(\S+)\ (\S+) # capture the first two words
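A quick way to see the effect is to compare the cleaned patterns. This sketch repeats the function from the snippets above so that it runs on its own; the variable names `unescaped` and `escapedSpace` are just for illustration.

```javascript
// Same function as in the earlier snippets, repeated so this example is self-contained.
function makeExtendedRegExp(inputPatternStr, flags) {
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}

const unescaped = makeExtendedRegExp(String.raw`
  ^(\S+) (\S+)   # the space between the groups is stripped
`);
const escapedSpace = makeExtendedRegExp(String.raw`
  ^(\S+)\ (\S+)  # the escaped space survives
`);

console.log(unescaped.source);             // ^(\S+)(\S+)
console.log(escapedSpace.source);          // ^(\S+)\ (\S+)
console.log(escapedSpace.test('foo bar')); // true
```

As far as I can tell, escapes such as `\ ` and `\#` are rejected when the `u` flag is passed, so this sketch assumes the pattern is compiled without it.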
If you want to frequently match space characters, this can get a bit tedious and make the pattern harder to read, similar to how double-escaping backslashes isn't very desirable. One possible (non-standard) modification to permit unescaped space characters would be to only strip out spaces at the beginning and end of a line, and spaces before a # comment:
function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove the first unescaped `#`, any preceding unescaped spaces, and everything that follows
  // and then remove leading and trailing whitespace on each line, including linebreaks
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\]) *#.*/g, '$1')
    .replace(/^\s+|\s+$|\n/gm, '');
  console.log(cleanedPatternStr); // show the cleaned pattern
  return new RegExp(cleanedPatternStr, flags);
}
// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^             # match the beginning of the line
  (\w+) (\w+)   # capture the first two words
`);
console.log(input.replace(pattern, '$2 $1'));
I would suggest putting a regular comment above the line with the regular expression in order to explain it.
You will have much more freedom.