Question
It turns out that both of these previously working patterns
"`([\n\A;]+)\/\*(.+?)\*\/`ism" => "$1", // error
"`([\n\A;\s]+)//(.+?)[\n\r]`ism" =>"$1\n", // error
now throw an error in PHP 7.3:
Warning: preg_replace(): Compilation failed: escape sequence is invalid in character class offset 4
Context: consider this snippet, which removes CSS comments from a string:
$buffer = ".selector {color:#fff; } /* some comment to remove*/";
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1", // 7.3 error
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n", // 7.3 error
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
//returns cleaned up $buffer value with pure css and no comments
Refer to: https://stackoverflow.com/a/1581063/1293658
Q1 - Any ideas what's wrong with the regex in this case? This thread seems to suggest it's simply a misplaced backslash: https://github.com/thujohn/twitter/issues/250
Q2 - Is this a PHP 7.3 bug or a problem with the regex patterns in this code?
Answer 1:
Do not use zero-width assertions inside character classes.
The anchors and (non-)word boundaries ^, $, \A, \b, \B, \Z, \z and \G do not make sense inside character classes, since they do not match any character. ^ and \b mean something different in a character class: ^ is either the negation mark, if used right after the opening [, or a literal ^ otherwise; \b means a backspace character. You can't use \R (any line break sequence) there, either.
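The character-class meanings described above can be checked directly. This is a minimal sketch that uses only preg_match(); the sample subjects are made up for illustration.

```php
<?php
// Inside a character class, \b is the backspace character (0x08),
// not a word boundary.
var_dump(preg_match('/[\b]/', "\x08")); // int(1)
var_dump(preg_match('/[\b]/', 'a b'));  // int(0)

// A ^ right after the opening [ negates the class;
// anywhere else in the class it is a literal caret.
var_dump(preg_match('/[^a]/', 'a'));    // int(0)
var_dump(preg_match('/[a^]/', '^'));    // int(1)
```

None of these classes contains an assertion, so they behave the same before and after the library change.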
The two patterns with \A inside a character class must be rewritten as a grouping construct, (...), with an alternation operator |:
"`(\A|[\n;]+)/\*.+?\*/`s"=>"$1",
"`(\A|[;\s]+)//.+\R`"=>"$1\n",
I removed the redundant modifiers and the capturing groups you are not using, and replaced [\r\n] with \R. The "`(\A|[\n;]+)/\*.+?\*/`s"=>"$1" rule can also be rewritten in a more efficient way:
"`(\A|[\n;]+)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/`"=>"$1"
Note that in PHP 7.3, according to the "Upgrade history of the bundled PCRE library" table, the regex library is PCRE2 10.32. See "PCRE to PCRE2 migration":
Until PHP 7.2, PHP used the 8.x versions of the legacy PCRE library, and from PHP 7.3, PHP will use PCRE2. Note that PCRE2 is considered to be a new library although it's based on and largely compatible with PCRE (8.x).
According to this resource, the updated library is stricter about regex patterns and now treats user errors that were formerly tolerated as real errors:
- Modifier S is now on by default. PCRE does some extra optimization.
- Option X is disabled by default. It makes PCRE do more syntax validation than before.
- Unicode 10 is used, while it was Unicode 7. This means more emojis, more characters, and more sets. Unicode regex may be impacted.
- Some invalid patterns may be impacted.
In simple words, PCRE2 is more strict in the pattern validations, so after the upgrade, some of your existing patterns could not compile anymore.
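One way to find affected patterns before or after such an upgrade is to try compiling each one and see which are rejected. The sketch below does that for the two patterns from the question and the rewritten rules from this answer, then applies the rewritten rules. The helper name patternCompiles() and the sample CSS are hypothetical, introduced only for illustration.

```php
<?php
// Hypothetical helper: true if PCRE accepts the pattern. The @ operator
// silences the "Compilation failed" warning; preg_match() returns false
// when the pattern cannot be compiled.
function patternCompiles(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

// Patterns from the question: \A sits inside a character class.
$old = [
    "`([\n\A;]+)\/\*(.+?)\*\/`ism",
    "`([\n\A;\s]+)//(.+?)[\n\r]`ism",
];

// Rewritten rules from this answer: \A moved into an alternation.
$new = [
    '`(\A|[\n;]+)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/`' => '$1',
    '`(\A|[;\s]+)//.+\R`' => "$1\n",
];

foreach ($old as $i => $pattern) {
    printf("old #%d: %s\n", $i + 1, patternCompiles($pattern) ? 'compiles' : 'rejected');
}
foreach (array_keys($new) as $i => $pattern) {
    printf("new #%d: %s\n", $i + 1, patternCompiles($pattern) ? 'compiles' : 'rejected');
}

// Made-up sample input with a leading, a mid-file, and a line comment.
$css = "/* header */\n.a { color: #fff; }\n/* note */\n.b { margin: 0; } // trailing\n.c { top: 0; }\n";
$clean = preg_replace(array_keys($new), array_values($new), $css);
echo $clean;
```

On PHP 7.3 or later, both old patterns should be reported as rejected and both new ones as compiling. The cleaned output keeps the three rules and drops all three comments, although some blank lines and trailing spaces remain.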
Source: https://stackoverflow.com/questions/57829977/error-when-removing-css-comments-via-regex