Creating filenames with Unicode characters

别那么骄傲 2021-01-18 23:01

I am looking for some guidelines for how to create filenames with Unicode characters. Consider:

use open qw( :std :utf8 );
use strict;
use utf8;
use warnings;

1 Answer
  • 2021-01-18 23:04

    This is due to a bug known as "The Unicode Bug". When Perl passes a string to the OS as a file name, it effectively does the following:

    use Encode qw( encode_utf8 is_utf8 );
    
    # The bytes that actually reach the OS as the file name $str:
    my $bytes = is_utf8($str) ? encode_utf8($str) : $str;
    

    is_utf8 checks which of two string storage formats is used by the scalar. This is an internal implementation detail you should never have to worry about, except for The Unicode Bug.
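
    To see the two storage formats side by side, here is a small sketch using the core Encode module. It assumes utf8::upgrade is an acceptable stand-in for a non-ASCII literal under use utf8; (both leave the string stored in the format for which is_utf8 returns true), which keeps the source file pure ASCII. The final loop applies the equivalence above: two strings that compare eq still produce different bytes.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 is_utf8 );

# Two copies of the same one-character string, U+00E6 (ae ligature).
# utf8::upgrade stands in for a non-ASCII literal under `use utf8;`:
# the string is stored internally as UTF-8, so is_utf8 returns true.
my $upgraded = "\x{E6}";
utf8::upgrade($upgraded);

# A "\x{E6}" escape below U+0100 is stored as a single byte (is_utf8 false).
my $downgraded = "\x{E6}";

print $upgraded eq $downgraded ? "eq\n" : "ne\n";    # prints "eq"
printf "is_utf8: %d %d\n", is_utf8($upgraded) ? 1 : 0, is_utf8($downgraded) ? 1 : 0;

# The bytes a file-name operation effectively hands to the OS.
for my $str ($upgraded, $downgraded) {
    my $bytes = is_utf8($str) ? encode_utf8($str) : $str;
    printf "%vX\n", $bytes;    # C3.A6 for the first, E6 for the second
}
```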

    Your program works because encode always returns a string for which is_utf8 returns false, and under use utf8; a string literal that contains non-ASCII characters is always stored such that is_utf8 returns true.
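
    A sketch of why encoding explicitly is reliable, again using utf8::upgrade to force one copy into the other internal format: encode('UTF-8', ...) produces the same bytes for both copies, and its result always has is_utf8 false, so it is passed to the OS unchanged.

```perl
use strict;
use warnings;
use Encode qw( encode is_utf8 );

# The same two-character string "\x{E6}2", stored both ways internally.
my $upgraded = "\x{E6}2";
utf8::upgrade($upgraded);
my $downgraded = "\x{E6}2";

# Explicit encoding gives identical bytes regardless of internal format,
# and the result is always a byte string (is_utf8 false).
my @encoded = map { encode('UTF-8', $_) } $upgraded, $downgraded;

print $encoded[0] eq $encoded[1] ? "same bytes\n" : "different bytes\n";
printf "%vX (is_utf8: %d)\n", $_, is_utf8($_) ? 1 : 0 for @encoded;
```

    Both lines of the second printf should show C3.A6.32 with is_utf8 0.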

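    If you don't encode as you should, you will sometimes get the wrong result. For example, if you had used "\x{E6}2" instead of 'æ2', you would have gotten a different file name, even though both strings have the same length and contain the same characters: "\x{E6}2" is stored in the format for which is_utf8 returns false, so the raw byte E6 is passed to the OS instead of its UTF-8 encoding C3 A6.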

    $ dir
    total 0
    
    $ perl -wE'
       use utf8;
       $fu="æ";
       $fd="\x{E6}";
       say sprintf "%vX", $_ for $fu, $fd;
       say $fu eq $fd ? "eq" : "ne";
       system("touch", $_) for "u".$fu, "d".$fd
    '
    E6
    E6
    eq
    
    $ dir
    total 0
    -rw------- 1 ikegami ikegami 0 Jul 12 12:18 uæ
    -rw------- 1 ikegami ikegami 0 Jul 12 12:18 d?
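
    The same experiment done from Perl, with the name encoded explicitly before it is used, might look like the sketch below. It assumes a file system that expects UTF-8 encoded names (typical on Linux) and works in a temporary directory created with the core File::Temp module. Because both copies of "\x{E6}2" encode to the same bytes, only one file should be created, and decoding the readdir result gives back the original characters.

```perl
use strict;
use warnings;
use Encode qw( encode decode );
use File::Spec;
use File::Temp qw( tempdir );

# Assumption: the file system stores names as UTF-8 bytes (typical on Linux).
my $dir = tempdir( CLEANUP => 1 );

# The name "\x{E6}2" stored both ways internally, as in the cases above.
my $upgraded = "\x{E6}2";
utf8::upgrade($upgraded);
my $downgraded = "\x{E6}2";

for my $name ($upgraded, $downgraded) {
    # Encode explicitly so the bytes don't depend on the internal format.
    my $path = File::Spec->catfile($dir, encode('UTF-8', $name));
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    close($fh);
}

# readdir returns bytes; decode them back into characters.
opendir(my $dh, $dir) or die "Can't open $dir: $!";
my @names = map { decode('UTF-8', $_) } grep { !/^\.\.?\z/ } readdir($dh);
closedir($dh);

printf "%d file(s): %s\n", scalar(@names), join(', ', map { sprintf('%vX', $_) } @names);
```

    On such a system this should print 1 file(s): E6.32.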
    