Compilation failed: POSIX collating elements are not supported

情话喂你 2021-01-01 05:00

I've just installed a website & legacy CMS onto our server and I'm getting a POSIX compilation error. Luckily it's only appearing in the backend; however, the client's

2 answers
  • 2021-01-01 05:42

    [...] is a character class: it matches any one of the characters between the brackets, so you don't have to add | between them. See character classes.

    So [abcd] will match a or b or c or d.

    If you want to match alternations of more than one character, for example red or blue or yellow, use a sub pattern:

    "(red|blue|yellow)"
    

    As you may have guessed, [abcd] is equivalent to (a|b|c|d).
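
    As a quick check of that equivalence, here is a minimal sketch using Python's re module rather than PHP's PCRE functions (an assumption made only for illustration; the sample strings are made up). Single-character classes and alternations behave the same way in both engines for this case:

```python
import re

# A character class and an alternation of single characters accept the same input.
char_class = re.compile(r"[abcd]")
alternation = re.compile(r"(a|b|c|d)")

samples = ["a", "b", "c", "d", "e", "|"]
class_hits = [s for s in samples if char_class.fullmatch(s)]
alt_hits = [s for s in samples if alternation.fullmatch(s)]

print(class_hits)  # ['a', 'b', 'c', 'd']
print(alt_hits)    # ['a', 'b', 'c', 'd']
```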


    So here is what you could do for your regex:

    For

    $trenner = "[\040|\n|\t|\r]*";
    

    Write this instead:

    $trenner = "[\040\n\t\r]*";
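
    To see why the | characters should go, here is a small sketch in Python's re module (used here only as a stand-in for PCRE; the test strings are invented). Inside brackets the | is an ordinary character, so the original class also accepts literal pipes:

```python
import re

# \040 is the octal escape for a space; inside [...] the "|" is a literal member.
with_pipes = re.compile(r"[\040|\n|\t|\r]*")
without_pipes = re.compile(r"[\040\n\t\r]*")

whitespace_only = " \n\t\r"
whitespace_and_pipe = " |\t"

print(with_pipes.fullmatch(whitespace_only) is not None)         # True
print(without_pipes.fullmatch(whitespace_only) is not None)      # True
print(with_pipes.fullmatch(whitespace_and_pipe) is not None)     # True: "|" is accepted
print(without_pipes.fullmatch(whitespace_and_pipe) is not None)  # False
```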
    

    And for

    "[=\"|=\'|=\\\\|=]"
    

    You could do

    "(=\"|=\'|=\\\\|=)"
    

    Or

    "=[\"'\\\\]?"
    

    BTW you could use \s* instead of $trenner, since \s matches any whitespace character (see http://www.php.net/manual/en/regexp.reference.escape.php)
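
    The two rewrites of the quote-matching class above should accept exactly the same strings. A minimal sketch to confirm that, again in Python's re module rather than PHP (the patterns are the PHP strings with PHP's string escaping removed; the sample inputs are assumptions):

```python
import re

# PHP "(=\"|=\'|=\\\\|=)" and "=[\"'\\\\]?" after PHP string unescaping.
alternation = re.compile(r"""(="|='|=\\|=)""")
optional_class = re.compile(r"""=["'\\]?""")

samples = ['="', "='", "=\\", "=", '"', "==", "=x"]
alt_hits = [s for s in samples if alternation.fullmatch(s)]
class_hits = [s for s in samples if optional_class.fullmatch(s)]

print(alt_hits == class_hits)  # True
print(alt_hits)
```

    Neither form starts a bracket expression with an equals sign, which is what avoids the collating-element error.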

  • 2021-01-01 06:05

    Your error message that “POSIX collating elements are not supported” deserves some explanation. After all, what in the world is a POSIX collating element anyway, and how can I avoid it?

    The short answer is that you have an equals sign inside your square brackets in a place where its use is reserved for future use, assuming we ever get around to implementing it, which is anything but certain. You can tickle this in Perl on the command line this way, which gives a much better error message than PHP is providing:

    % perl -le 'print "abc" =~ /[=foo=]/ || "Fail"'
    POSIX syntax [= =] is reserved for future extensions in regex; marked by <-- HERE in m/[=foo=] <-- HERE / at -e line 1.
    

    That’s the short answer; the longer answer follows.


    Fancy POSIX Character Classes

    Inside a square-bracketed character class, POSIX admits three different nested bracketed forms, all indicated using an extra symbol inside the brackets in pairs:

    1. Named POSIX character classes, which are basically like Unicode properties, use an extra colon flanking: [:PROPERTY:], as in [:alpha:].
    2. Collating elements intended to be treated as equivalent to each other, use an extra equals sign flanking them: [=ELEMENTS=], as in [=eéèëê=] in English or French, and [=vw=] in Swedish.
    3. Polygraphs (digraphs, trigraphs, tetragraphs, etc), which are multicharacter elements meant to count as a single character, have an extra dot flanking them: [.DIGRAPH.], as in [.ch.] or [.ll.] per the traditional Spanish alphabet. These are sometimes known as contractions because two or more code points count as though that sequence were a single code point.

    Perl supports only the first of these, not the second and third.

    They are all awkward to use, because they must be nested inside an extra set of brackets, as in [[:punct:]] to mean \pP or \p{punct}. You only need extra braces with Unicode properties when you are selecting one of many, as in [\pL\pN\pM\p{Pc}].

    The Intent

    The other two were an attempt to support locale-specific linguistic elements in a pre‐Unicode environment under legacy 8‑bit locales. For example, to express the traditional Spanish alphabet, which counts acute accents over vowels and diaereses over u’s as the same letter yet which counts a tilde over an n as a different letter altogether, and which furthermore has two digraphs each counting as a distinct letter, you would have to write this in POSIX:

    [[=aá=]bc[.ch.]d[=eé=]fgh[=ií=]jkl[.ll.]mnñ[=oó=]pqrst[=uúü=]vwxyz]
    

    You can and sometimes must combine these. For example, in German phonebooks where the three i‑mutated vowels can be spelt without diacritics by inserting a following e:

    [a[=ä[.ae.]=]bcdefghijklmno[=ö[.oe.]=]pqrs[=ß[.ss.]=]tu[=ü[.ue.]=]vwxyz]
    

    That way, assuming $ES and $DE are those languages’ respective alphabets, you could say something like

    [$ES]{4}
    

    and have it match words like guía, niño, llave, and choco in Spanish; or in German have

    [$DE]{6}
    

    and have it match words like tschüß or its uppercase undiacriticked equivalent, TSCHUESS.
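
    Since neither Perl nor PCRE accepts [= =] or [. .], one rough way to approximate the idea is to put the digraphs in an alternation ahead of a plain character class. The sketch below does this in Python's re module; ES_LETTER, DE_LETTER and is_word_of are hypothetical names standing in for $ES and $DE, and the approach only imitates the matching side, not locale-aware equivalence or ordering:

```python
import re
import unicodedata

# Digraphs come first so "ch", "ll", "ue", "ss" ... are tried as one letter
# before the single-character class is tried.
ES_LETTER = r"(?:ch|ll|[a-zñáéíóúü])"
DE_LETTER = r"(?:ae|oe|ue|ss|[a-zäöüß])"

def is_word_of(letter, count, word):
    """Return True if word is exactly `count` letters of the given alphabet."""
    text = unicodedata.normalize("NFC", word)  # use precomposed accents
    return re.fullmatch(f"{letter}{{{count}}}", text, re.IGNORECASE) is not None

spanish = [w for w in ["guía", "niño", "llave", "choco", "llaves"]
           if is_word_of(ES_LETTER, 4, w)]
german = [w for w in ["tschüß", "TSCHUESS", "tschüss!"]
          if is_word_of(DE_LETTER, 6, w)]

print(spanish)  # ['guía', 'niño', 'llave', 'choco']
print(german)   # ['tschüß', 'TSCHUESS']
```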

    The Unicode Way

    This is awkward for various reasons, and not just those that are obvious from the two alphabets listed above. It does not admit the notion of combining characters, so you have to add those explicitly for non-normalized text, as in [=e\xE9[.e\x{301}.]=].
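
    The combining-character problem can be seen without any POSIX syntax at all. In this small Python sketch (an illustration, not part of the original setup), a class that lists only the precomposed é fails on the decomposed spelling until the text is normalized to NFC:

```python
import re
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

pattern = re.compile("[e\u00e9]")  # lists only the precomposed form

raw_match = pattern.fullmatch(decomposed) is not None
normalized_match = pattern.fullmatch(
    unicodedata.normalize("NFC", decomposed)) is not None

print(raw_match)         # False: two code points, the class matches one
print(normalized_match)  # True: NFC turns e + U+0301 into U+00E9
```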

    Unicode has taken another path in how to implement linguistic elements like this. Fortunately, Unicode regular expressions per UTS#18 do not need to support language features tailored for specific languages or locales until Level 3. This is something no one has yet implemented.

    Note that having SS and ß have the same casefold is not considered a locale tailoring. It is the full casefold for that code point no matter the linguistic context. So those are the same when case is ignored. Strange but true. Given that ß is code point U+00DF, we see that these are the same no matter the locale:

    $ perl5.14.0 -E 'say "SS" =~ /^\xDF$/i ? "Pass" : "Fail"'
    Pass
    $ perl5.14.0 -E 'say "\xDF" =~ /^SS$/i ? "Pass" : "Fail"'
    Pass
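
    For comparison, here is the same full casefold seen from Python (used only as an illustration; it is not what the Perl one-liners run). str.casefold() applies the full fold, so SS and ß compare equal, whereas Python's re.IGNORECASE only does one-to-one case mapping and therefore behaves differently from the Perl result above:

```python
import re

# Full case folding: both sides fold to "ss".
folded_equal = "SS".casefold() == "\u00df".casefold()

# Python's re does not do multi-character folding under IGNORECASE.
regex_equal = re.fullmatch("\u00df", "SS", re.IGNORECASE) is not None

print(folded_equal)  # True
print(regex_equal)   # False
```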
    

    Although locale tailoring for patterns is still beyond us, collation has been implemented, including with locale support, and you can access it from Perl just fine.

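    However, PHP's regex functions give you no access to it; in PHP, Unicode collation is only available separately, through the intl extension's Collator class.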


    References for Unicode collation include:

    1. ICU’s Collation Concepts document
    2. UTS#10: Unicode Collation Algorithm
    3. Perl’s Unicode::Collate module
    4. Perl’s Unicode::Collate::Locale module