Before anyone tells me to RTFM, I must say - I have dug through:
How can I ensure (test) that any $other_data contains a valid Unicode string?
You cannot determine ex post facto whether a string has character semantics or byte semantics. Perl does not track this for you. You have to track it by careful programming: encode and decode at the boundaries; use the :raw layer for byte semantics and the :encoding(foo) layer for character semantics. Employ naming conventions for your variables and functions to clearly differentiate between the semantics and make wrong code look wrong.
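As a minimal sketch of "decode at the input boundary, encode at the output boundary", assuming the data arrives as UTF-8 bytes: the variable names and the _bytes/_chars suffix convention here are only an illustration, not something Perl enforces.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for U+00E9, as they might arrive
# from a file or socket opened with the :raw layer.
my $name_bytes = "\xC3\xA9";

# Input boundary: bytes -> characters. FB_CROAK dies on malformed input;
# LEAVE_SRC keeps decode() from modifying $name_bytes in place.
my $name_chars = decode('UTF-8', $name_bytes,
                        Encode::FB_CROAK | Encode::LEAVE_SRC);

# Inside the program, only the *_chars variable is treated as text.
my $char_count = length $name_chars;   # counts characters
my $byte_count = length $name_bytes;   # counts bytes

# Output boundary: characters -> bytes.
my $out_bytes = encode('UTF-8', $name_chars);
```

With the suffixes in place, passing $name_bytes to something that expects text, or printing $name_chars to a :raw handle, looks wrong at a glance.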
What is the purpose of utf8::is_utf8($data)?
It tells you whether the SvUTF8 flag is set, nothing more. This is almost entirely useless for most developers, because it is an internals thing. The flag does not mean that a string has character semantics, and its absence does not mean that a string has byte semantics.
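A small sketch of why the flag says nothing about semantics: the same four-character string can be stored with or without the flag, and Perl treats both copies as equal. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Four characters; the last one is U+00E9. Stored without the SvUTF8 flag.
my $plain = "caf\xE9";

# Same characters, but force the internal UTF-8 representation.
my $upgraded = $plain;
utf8::upgrade($upgraded);

my $flag_plain    = utf8::is_utf8($plain)    ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# The flag differs, yet the strings compare equal and have the same length.
my $same_string = $plain eq $upgraded ? 1 : 0;
my $same_length = length($plain) == length($upgraded) ? 1 : 0;

print "flag: $flag_plain vs $flag_upgraded, equal: $same_string\n";
```

Both variables hold the same text; only the internal storage differs, which is all that utf8::is_utf8 reports.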
The whole utf8 pragma is a mystery for me.
Probably because it is overdocumented, and therefore confusing. Most developers can stop reading after the part where it says that its purpose is to enable Unicode literals in the source code.
In the above example utf8::is_utf8($data) will print OK - but I don't understand why.
Because of uni::perl, which enables use open qw(:utf8 :std); for you. Any input read from STDIN with <> will be decoded. The normalisation step afterwards does not change that.
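The effect can be sketched without uni::perl or a real STDIN. This example assumes an in-memory filehandle with an explicit :encoding(UTF-8) layer as a stand-in for what the open pragma sets up on the standard handles, and uses NFC from Unicode::Normalize as the normalisation step; the sample input is invented.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical raw input: "e" followed by U+0301 (combining acute accent),
# encoded as UTF-8, plus a newline.
my $input_bytes = "e\xCC\x81\n";

# The :encoding(UTF-8) layer decodes on read, much as a decoding layer
# on STDIN would.
open my $in, '<:encoding(UTF-8)', \$input_bytes
    or die "cannot open in-memory handle: $!";
my $line = <$in>;
close $in;
chomp $line;

# $line already holds decoded characters: two of them.
my $decoded_length = length $line;
my $decoded_flag   = utf8::is_utf8($line) ? 1 : 0;

# Normalising composes the pair into the single character U+00E9;
# the result is still a character string.
my $normalized        = NFC($line);
my $normalized_length = length $normalized;
```

The decoding happens in the I/O layer, before any of your own code sees the data, which is why the check in the question succeeds.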