I am looking for some guidelines for how to create filenames with Unicode characters. Consider:
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings;
This happens because of a bug known as "The Unicode Bug". In effect, the equivalent of the following is happening:
use Encode qw( encode_utf8 is_utf8 );
my $bytes = is_utf8($str) ? encode_utf8($str) : $str;
is_utf8 checks which of the two string storage formats is used by the scalar. This is an internal implementation detail you should never have to worry about, except for The Unicode Bug.
Your program works because encode always returns a string for which is_utf8 returns false, and a string literal under use utf8; is always a string for which is_utf8 returns true if it contains non-ASCII characters.
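To make the two storage formats visible, here is a minimal sketch. The variable names are mine, and utf8::upgrade is only used to force the second format for the demonstration; it is not something you would normally call.

```perl
use strict;
use warnings;

# Two strings holding the same single character, U+00E6.
my $downgraded = "\x{E6}";     # a literal like this uses the 8-bit storage format
my $upgraded   = "\x{E6}";
utf8::upgrade($upgraded);      # force the UTF-8 storage format

# The strings are equal as far as Perl's string operators are concerned...
print $downgraded eq $upgraded ? "eq" : "ne", "\n";

# ...but the internal flag reported by is_utf8 differs.
printf "%d %d\n",
    utf8::is_utf8($downgraded) ? 1 : 0,
    utf8::is_utf8($upgraded)   ? 1 : 0;
```

Code that looks at the internal buffer instead of the characters will treat these two strings differently, which is exactly what happens with file names.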
If you don't encode as you should, you will sometimes get the wrong result. For example, if you had used "\x{E6}2" instead of 'æ2', you would have gotten a different file name even though the strings have the same length and the same characters.
$ dir
total 0
$ perl -wE'
use utf8;
$fu="æ";
$fd="\x{E6}";
say sprintf "%vX", $_ for $fu, $fd;
say $fu eq $fd ? "eq" : "ne";
system("touch", $_) for "u".$fu, "d".$fd
'
E6
E6
eq
$ dir
total 0
-rw------- 1 ikegami ikegami 0 Jul 12 12:18 uæ
-rw------- 1 ikegami ikegami 0 Jul 12 12:18 d?
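Encoding the file name explicitly makes the result independent of the storage format. The following is only a sketch, assuming a file system that expects UTF-8 file names; File::Temp and the scratch directory are there just to keep the example self-contained, and the variable names are mine.

```perl
use strict;
use warnings;
use utf8;
use Encode qw( encode_utf8 );
use File::Temp qw( tempdir );

# Scratch directory so the example doesn't touch your real files.
my $dir = tempdir( CLEANUP => 1 );

# Same characters, different internal storage formats.
for my $name ("æ2", "\x{E6}2") {
    # Encode the name ourselves instead of relying on the internal buffer.
    my $path = "$dir/" . encode_utf8($name);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    close($fh);
}

# Both iterations should have produced the same file.
opendir(my $dh, $dir) or die "Can't read $dir: $!";
my @files = grep { !/^\.\.?\z/ } readdir($dh);
closedir($dh);
print scalar(@files), "\n";
```

Because both names are encoded to the same bytes before they reach open, only one file is created, unlike the touch example above.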