What does \x in PHP PCRE mean?

春和景丽 2021-01-03 04:03

From the manual:

After \x, up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, \x{...

1 answer
  •  一整个雨季
    2021-01-03 04:18

    The syntax is a way to specify a character by value:

    • \xAB specifies a character value in the range 0-FF (a byte value, or a code point in UTF-8 mode).
    • \x{ABCD} specifies a code point with any number of hex digits; in UTF-8 mode the range is 0-10FFFF.
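The two forms can be tried out with a minimal sketch like the one below. The variable names are illustrative only, and the "\u{...}" string escape assumes PHP 7.0 or later. The patterns are single-quoted so that PHP passes \x through to PCRE untouched.

```php
<?php
// Two-digit form: in UTF-8 mode, \xE4 is the code point U+00E4 ("ä").
$short = preg_match('/^\xE4$/u', "\u{E4}");

// Braced form: \x{20AC} is the code point U+20AC (the euro sign).
$braced = preg_match('/^\x{20AC}$/u', "\u{20AC}");

// The two-digit form cannot express U+20AC: \x20 is a space,
// followed by the literal characters "AC".
$notEuro = preg_match('/^\x20AC$/u', "\u{20AC}");

var_dump($short, $braced, $notEuro);
```

The first two calls should report a match and the third should not, since the unbraced pattern only matches the three-character string " AC".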

    The wording in the manual is a bit confusing, perhaps in an attempt to be precise. Code points 128-255 (and everything else up to 7FF) are encoded as two bytes in UTF-8. Thus, a Unicode regular expression (one using the /u modifier) will match 7-bit clean ASCII, but it will not match input in other encodings or code pages (e.g. CP437) that store values in that range as single bytes. The manual is, in a roundabout way, saying that a Unicode regular expression is only suitable for correctly encoded UTF-8 input.
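The encoding point can be illustrated with a short sketch. ISO-8859-1 is used here as the single-byte encoding purely as an assumption for the example (any code page that stores the character as one high byte behaves the same way), and the variable names are made up.

```php
<?php
// "ä" as a single byte 0xE4 (ISO-8859-1) versus as UTF-8 (bytes C3 A4).
$singleByte = "\xE4";
$utf8       = "\u{E4}"; // requires PHP 7.0+

// UTF-8 mode, correctly encoded subject: \xE4 matches the code point.
$okUtf8 = preg_match('/\xE4/u', $utf8);

// UTF-8 mode, lone 0xE4 byte: the subject is not valid UTF-8, so
// preg_match() returns false and records an error instead of matching.
$badInput  = preg_match('/\xE4/u', $singleByte);
$errorCode = preg_last_error();

// Without /u the pattern works on raw bytes and matches the single byte.
$byteMode = preg_match('/\xE4/', $singleByte);

var_dump($okUtf8, $badInput, $errorCode === PREG_BAD_UTF8_ERROR, $byteMode);
```

Checking preg_last_error() right after the call matters, because the next preg_* call resets it.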

    It doesn't mean that \xABCD is parsed as \x{ABCD} (one character). It is parsed as \xAB (one character) followed by the literal characters CD (two characters); see footnote 1. The braces resolve this parsing ambiguity:

    After \x, up to two hexadecimal digits are read .. In UTF-8 mode, \x{...} is allowed ..

    Some other languages use \u instead of \x for the longer form.


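    1. Consider that this matches. The subject is the character U+00C3 followed by the literal text "A4"; it has to be valid UTF-8 because of the /u modifier, and "\u{C3}" needs PHP 7.0 or later:

    preg_match('/\xC3A4/u', "\u{C3}" . "A4");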
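To see the parsing difference side by side, here is a small sketch that runs the unbraced and braced patterns against both candidate subjects. The variable names are hypothetical and the "\u{...}" escape assumes PHP 7.0 or later.

```php
<?php
$threeChars = "\u{C3}" . "A4"; // U+00C3, then "A", then "4"
$oneChar    = "\u{C3A4}";      // the single code point U+C3A4

// Unbraced: \xC3 is read first, then "A4" is taken literally.
$unbracedThree = preg_match('/^\xC3A4$/u', $threeChars); // matches
$unbracedOne   = preg_match('/^\xC3A4$/u', $oneChar);    // no match

// Braced: all four hex digits form one code point.
$bracedOne   = preg_match('/^\x{C3A4}$/u', $oneChar);    // matches
$bracedThree = preg_match('/^\x{C3A4}$/u', $threeChars); // no match

var_dump($unbracedThree, $unbracedOne, $bracedOne, $bracedThree);
```

The anchors ^ and $ are there so that each pattern has to account for the whole subject, which makes the one-character versus three-character split visible.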
