Perl Unicode internals - mess with utf8

忘了有多久 2021-02-04 12:53

Before anyone tells me to RTFM, I must say that I have dug through:

  • Why does modern Perl avoid UTF-8 by default?
  • Checklist for going the Unicode way
2 answers
  •  执念已碎
    2021-02-04 13:34

    How can I ensure (test) that any $other_data contains a valid Unicode string?

    You cannot determine ex post facto whether a string has character semantics or byte semantics. Perl does not track this for you. You have to track it by careful programming: encode and decode at the boundaries; :raw layer for byte semantics, :encoding(foo) for character semantics. Employ naming conventions for your variables and functions to clearly differentiate between the semantics and make wrong code look wrong.
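A minimal sketch of "decode and encode at the boundaries", assuming the input arrives as UTF-8 octets. The helper name `decode_input` and the `_bytes` / `_chars` suffixes are hypothetical, chosen only to illustrate a naming convention. Strict decoding with `Encode::FB_CROAK` also gives a way to reject malformed input at the moment it enters the program, which is the only point where such a test is reliable.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical convention: *_bytes holds octets, *_chars holds decoded text.

# Decode at the input boundary. Returns undef if the octets are not valid
# UTF-8, because FB_CROAK makes decode() die and eval traps that.
# LEAVE_SRC stops decode() from modifying its argument.
sub decode_input {
    my ($bytes) = @_;
    return eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
}

my $name_bytes = "\xC3\xA9t\xC3\xA9";          # UTF-8 octets for "été"
my $name_chars = decode_input($name_bytes);    # character semantics from here on

# Encode again at the output boundary.
my $out_bytes = encode('UTF-8', $name_chars);

printf "%d octets, %d characters\n", length $name_bytes, length $name_chars;
# prints "5 octets, 3 characters"

print "malformed input rejected\n" unless defined decode_input("\xFF");
```

Everything between `decode_input` and the final `encode` works on characters. The suffixes are what make a stray `length $name_bytes` stand out in review.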

    for what purpose is the utf8::is_utf8($data)?

    It tells you whether the SvUTF8 flag is set, nothing more. This is almost entirely useless for most developers, because it is an internals thing. The presence of the flag does not mean that a string has character semantics, and its absence does not mean that a string has byte semantics.
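To illustrate why the flag says nothing about semantics, here is a small sketch in which two strings hold the same four characters but differ in the flag. The variable names are illustrative only. `utf8::upgrade` changes the internal storage without changing the string's value.

```perl
use strict;
use warnings;

my $plain    = "caf\xE9";   # 4 characters, stored internally as one octet each
my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);   # same 4 characters, internal storage is now UTF-8

print "strings are equal\n" if $plain eq $upgraded;

printf "flag: plain=%d upgraded=%d\n",
    (utf8::is_utf8($plain)    ? 1 : 0),
    (utf8::is_utf8($upgraded) ? 1 : 0);
# prints "flag: plain=0 upgraded=1"

printf "length: %d and %d\n", length $plain, length $upgraded;
# prints "length: 4 and 4"
```

Both strings compare equal and have the same length, so code that branches on `utf8::is_utf8` is reacting to a storage detail rather than to the data.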

    The whole utf8 pragma is a mystery for me.

    Probably because it is overdocumented, and therefore confusing. Most developers can stop reading after the part where it says that its purpose is to enable Unicode literals in the source code.
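The effect of the pragma on literals can be seen with a short sketch. It assumes the script file itself is saved as UTF-8 and that the "é" below is the single precomposed code point U+00E9. The pragma is lexically scoped, so both cases fit in one file.

```perl
use strict;
use warnings;

my ($len_with, $len_without);

{
    use utf8;                      # the literal is parsed as characters
    $len_with = length "é";
}
{
    no utf8;                       # the literal is parsed as raw octets
    $len_without = length "é";
}

print "with use utf8: $len_with, without: $len_without\n";
# prints "with use utf8: 1, without: 2"
```

The pragma affects only how the source text is read. It does nothing to data that comes from files, sockets, or STDIN.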

    In the above example utf8::is_utf8($data) will print OK, but I don't understand WHY.

    Because of uni::perl, which enables use open qw(:utf8 :std);. Any input read from STDIN with <> is therefore decoded. The normalisation step afterwards does not change that.
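The difference a decoding layer makes can be sketched without touching STDIN. This example is an approximation, not what uni::perl does internally: it reads the same octets through two in-memory filehandles, one with `:raw` and one with `:encoding(UTF-8)`, where the latter stands in for the `:utf8` layer that `use open qw(:utf8 :std)` applies to the standard handles.

```perl
use strict;
use warnings;

# UTF-8 octets for "é" plus a newline, standing in for a line typed on STDIN.
my $octets = "\xC3\xA9\n";

# Byte semantics: the line comes back exactly as stored.
open my $raw_fh, '<:raw', \$octets or die "open failed: $!";
my $raw_line = <$raw_fh>;
close $raw_fh;

# Character semantics: the layer decodes while reading.
open my $utf8_fh, '<:encoding(UTF-8)', \$octets or die "open failed: $!";
my $decoded_line = <$utf8_fh>;
close $utf8_fh;

chomp($raw_line, $decoded_line);
printf "raw: %d, decoded: %d\n", length $raw_line, length $decoded_line;
# prints "raw: 2, decoded: 1"
```

With the layer in place the program receives characters straight from `<>`, so no explicit `decode` call is needed for that handle.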
