Matching all three kinds of PHP comments with REGEX

Submitted by 青春壹個敷衍的年華 on 2019-12-29 01:45:33

Question


I'm new to REGEX and I need some help.

I need to match all three types of comments that PHP might have:
# Single line comment
// Single line comment
/* Multi-line comments */

/**
 * And all of its possible variations
 */

I should mention why I need this: I want to recognize whether a PHP closing tag (?>) is inside a comment or not. If it is, ignore it; if not, count it as a closing tag. This is going to be used inside an XML document to improve Sublime Text's recognition of the closing tag (because it's driving me nuts!). I tried for a couple of hours but couldn't get it working, so if you could translate it to work with XML I would appreciate it. :)

So if you could also include the if-then-else logic I would really appreciate it. BTW, I really need it to be a pure regular expression, with no language features or anything. :)

As Eicon reminded me, each pattern must match both at the start of a line and after a piece of code, so the following case has to work as well:

<?php
echo 'something'; # this is a comment
?>

Any help would be appreciated. :)


Answer 1:


Parsing a programming language seems too much for regexes to do. You should probably look for a PHP parser.

But these would be the regexes you are looking for. I assume for all of them that you use the DOTALL or SINGLELINE option (although the first two would work without it as well):

~#[^\r\n]*~
~//[^\r\n]*~
~/\*.*?\*/~s

Note that any of these will cause problems if the comment-delimiting characters appear in a string or anywhere else where they do not actually open a comment.
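To make that caveat concrete, here is a minimal sketch in Python (an assumption on my part; the question does not name a host language, and the ~ delimiters are dropped). The sample line and the variable names are made up for illustration.

```python
import re

# The "#" pattern from above, written without the ~ delimiters.
HASH_COMMENT = re.compile(r"#[^\r\n]*")

# Hypothetical PHP line: the first "#" sits inside a string literal.
php_line = '$rank = "You are the #1"; # real comment'

# search() returns the leftmost match, which starts inside the string,
# not at the real comment further to the right.
first = HASH_COMMENT.search(php_line)
false_positive = first.group(0)

print(false_positive)
```

The match begins at the `#` inside the quoted text and runs to the end of the line, so part of the string is treated as a comment.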

You can also combine all of these into one regex:

~(?:#|//)[^\r\n]*|/\*.*?\*/~s

If you use some tool or language that does not require delimiters (like Java or C#), remove those ~. In this case you will also have to apply the DOTALL option differently. But without knowing where you are going to use this, I cannot tell you how.
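As one example of a delimiter-free environment, this is roughly how it could look in Python (chosen only for illustration; it is not mentioned in the question). Python's `re` module accepts the option either as the `re.DOTALL` flag or as an inline `(?s)` prefix. The sample text is made up.

```python
import re

# Combined pattern from above, without the ~ delimiters.
PATTERN = r"(?:#|//)[^\r\n]*|/\*.*?\*/"

# Option 1: pass DOTALL as a flag when compiling.
with_flag = re.compile(PATTERN, re.DOTALL)
# Option 2: switch DOTALL on inside the pattern itself.
with_inline = re.compile(r"(?s)" + PATTERN)
# For comparison: no DOTALL at all, so "." stops at newlines.
no_dotall = re.compile(PATTERN)

SAMPLE = "echo 1; # one\n/* two\n   lines */\n"

found_flag = [m.group(0) for m in with_flag.finditer(SAMPLE)]
found_inline = [m.group(0) for m in with_inline.finditer(SAMPLE)]
found_plain = [m.group(0) for m in no_dotall.finditer(SAMPLE)]

print(found_flag)
print(found_plain)
```

Without the option, the block comment that spans two lines is missed, because `.` does not cross the newline.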

If you cannot/do not want to set the DOTALL option, this would be equivalent (I also left out the delimiters to give an example):

(?:#|//)[^\r\n]*|/\*[\s\S]*?\*/
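The asker's goal of deciding whether a ?> sits inside a comment cannot be done by this pattern alone, but it can be sketched with a little host-language code around it. The following is a rough Python sketch under stated assumptions: the function name and sample are hypothetical, and strings are not handled. One further assumption may not match real PHP: as far as the PHP manual describes it, a ?> on a # or // line ends the comment and leaves PHP mode, whereas this sketch treats such a tag as commented out, as the question asks.

```python
import re

# Flag-free combined pattern from above.
COMMENT = re.compile(r"(?:#|//)[^\r\n]*|/\*[\s\S]*?\*/")

def closing_tags_outside_comments(source):
    """Return the offsets of every '?>' that no comment match covers."""
    spans = [m.span() for m in COMMENT.finditer(source)]
    offsets = []
    start = source.find("?>")
    while start != -1:
        # Keep the tag only if it lies outside every comment span.
        if not any(lo <= start < hi for lo, hi in spans):
            offsets.append(start)
        start = source.find("?>", start + 2)
    return offsets

# Hypothetical input: one tag inside a block comment, one real closing tag.
SAMPLE = "<?php\n/* not the end ?> */\necho 1;\n?>\n"
offsets = closing_tags_outside_comments(SAMPLE)

print(offsets)
```

Only the final ?> is reported; the one inside the block comment is skipped.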

See here for a working demo.

Now if you also want to capture the contents of the comments in a group, then you could do this:

(?|(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/)

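Regardless of the type of comment, the comment's content (without the syntax delimiters) will be found in capture group 1. This relies on the branch reset group (?|...), a PCRE feature that not every regex engine supports.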
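In an engine without branch reset, the same idea can be approximated with two ordinary groups and a small amount of code. The sketch below uses Python, whose `re` module does not support (?|...); the function name and input are made up for illustration.

```python
import re

# Two ordinary groups instead of a branch reset:
# group 1 holds single-line bodies, group 2 holds block-comment bodies.
COMMENT = re.compile(r"(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/")

def comment_bodies(source):
    """Return the text of each comment without its delimiters."""
    bodies = []
    for m in COMMENT.finditer(source):
        # Only one of the two groups takes part in any given match.
        body = m.group(1) if m.group(1) is not None else m.group(2)
        bodies.append(body)
    return bodies

bodies = comment_bodies("a(); # one\nb(); // two\n/* three */")

print(bodies)
```

The surrounding whitespace is kept as part of each body; strip it afterwards if that is not wanted.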

Another working demo.




Answer 2:


Old question, but maybe this would help somebody else...

Single line comments

singleLineComment = /'[^']*'|"[^"]*"|((?:#|\/\/).*$)/gm

With this regex you replace (or remove) only what was captured by group 1, ((?:#|\/\/).*$). The first two alternatives consume quoted strings, so comment-like text inside a string is matched but never captured (e.g. $x = "You are the #1"; or $y = "You can start comments with // or # in PHP, but I'm a code string";).
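A minimal sketch of that replace step, written in Python rather than the JavaScript-style literal above (the function names and sample code are hypothetical). It assumes strings contain no escaped quotes, which the pattern does not handle.

```python
import re

# Same alternatives as above: quoted strings first, then the captured comment.
SINGLE_LINE = re.compile(r"""'[^']*'|"[^"]*"|((?:#|//).*$)""", re.MULTILINE)

def strip_single_line_comments(source):
    def keep_or_drop(match):
        # Group 1 is set only when the comment branch matched;
        # a string match is put back unchanged.
        return "" if match.group(1) is not None else match.group(0)
    return SINGLE_LINE.sub(keep_or_drop, source)

code = '$x = "You are the #1"; // trailing note\n$y = 2; # another\n'
stripped = strip_single_line_comments(code)

print(stripped)
```

The `#1` inside the string survives, while both trailing comments are removed. The space before each removed comment is left in place.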

Multiline comments

multilineComment = /^\s*\/\*\*?[^!][\s\S]*?\*\//gm
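To show what this pattern does and does not match, here is a small Python sketch (the sample source is made up). The character class is written as [\s\S], which matches the same characters as [.\s\t\S\n\r]. Note that ^ with the m flag restricts matches to comments that begin a line, and [^!] skips comments opening with /*!.

```python
import re

# Port of the multiline pattern; re.MULTILINE plays the role of the m flag.
MULTILINE_COMMENT = re.compile(r"^\s*/\*\*?[^!][\s\S]*?\*/", re.MULTILINE)

# Hypothetical input covering four cases.
source = (
    "/**\n * Doc block\n */\n"      # doc block at line start: matched
    "function f() {}\n"
    "    /* indented */\n"          # leading whitespace only: matched
    "/*! keep */\n"                 # starts with /*!: skipped
    "echo 1; /* trailing */\n"      # code before the comment: skipped
)

matches = [m.group(0) for m in MULTILINE_COMMENT.finditer(source)]

print(len(matches))
```

Because of the leading \s*, each match includes the indentation in front of the comment.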


Source: https://stackoverflow.com/questions/13114104/matching-all-three-kinds-of-php-comments-with-regex
