I need to remove all /*...*/
style comments from JSON data. How do I do it with regular expressions so that string values like this
{
\"propName
You must first avoid all the content that is inside double quotes using the backtrack control verbs SKIP and FAIL (or a capture)
$string = <<<'LOD'
{
"propName": "Hello \" /* don't remove **/ there." /*this must be removed*/
}
LOD;
$result = preg_replace('~"(?:[^\\\"]+|\\\.)*+"(*SKIP)(*FAIL)|/\*(?:[^*]+|\*+(?!/))*+\*/~s', '',$string);
// The same with a capture:
$result = preg_replace('~("(?:[^\\\"]+|\\\.)*+")|/\*(?:[^*]+|\*+(?!/))*+\*/~s', '$1',$string);
Pattern details:
"(?:[^\\\"]+|\\\.)*+"
This part describe the possible content inside quotes:
" # literal quote
(?: # open a non-capturing group
[^\\\"]+ # all characters that are not \ or "
| # OR
\\\.)*+ # escaped char (that can be a quote)
"
Then You can make this subpattern fails with (*SKIP)(*FAIL)
or (*SKIP)(?!)
. The SKIP forbid the backtracking before this point if the pattern fails after. FAIL forces the pattern to fail. Thus, quoted part are skipped (and can't be in the result since you make the subpattern fail after).
Or you use a capturing group and you add the reference in the replacement pattern.
/\*(?:[^*]+|\*+(?!/))*+\*/
This part describe content inside comments.
/\* # open the comment
(?:
[^*]+ # all characters except *
| # OR
\*+(?!/) # * not followed by / (note that you can't use
# a possessive quantifier here)
)*+ # repeat the group zero or more times
\*/ # close the comment
The s modifier is used here only when a backslash is before a newline inside quotes.