Different UTF-8 encoding in filenames on OS X

孤城傲影 2020-12-09 18:21

I have a small shell script in .x:

$ cat .x
u="Böhmáí"
touch "$u"
ls > .list
echo "$u" >.text

cat .list .text
diff .list .text
od -bc         


        
1 Answer
  • 2020-12-09 18:51

    (This is mostly stolen from a previous answer of mine...)

    Unicode allows some accented characters to be represented in more than one way: as a single "code point" for the accented character, or as the code point for the unaccented character followed by one or more combining accents. For example, "ä" can be represented either precomposed as U+00E4 (UTF-8 0xc3a4, Latin small letter a with diaeresis) or decomposed as U+0061 U+0308 (UTF-8 0x61cc88, Latin small letter a + combining diaeresis).
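To see the two byte sequences side by side, here is a minimal sketch in plain POSIX shell. The helper name `hex_bytes` and the variable names are made up for illustration; the bytes are written as octal escapes so the example does not depend on how your editor or terminal normalizes the text.

```shell
# Hypothetical helper: print the bytes of a string as contiguous lowercase hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# "ä" precomposed: U+00E4, UTF-8 bytes c3 a4 (octal \303 \244).
precomposed=$(printf '\303\244')
# "ä" decomposed: U+0061 U+0308, UTF-8 bytes 61 cc 88 (octal a \314 \210).
decomposed=$(printf 'a\314\210')

echo "precomposed: $(hex_bytes "$precomposed")"   # prints "precomposed: c3a4"
echo "decomposed:  $(hex_bytes "$decomposed")"    # prints "decomposed:  61cc88"
```

Both strings render as the same glyph, but they are two bytes versus three.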

    OS X's HFS+ filesystem requires that all filenames be stored in fully decomposed form, which the POSIX file APIs (and therefore ls) return encoded as UTF-8. In an HFS+ filename, "ä" therefore MUST come back as 0x61cc88, and "ö" as 0x6fcc88.

    So what's happening here is that your shell script contains "Böhmáí" in precomposed form, so it gets stored that way in the variable u and written that way to the .text file. But when you create a file with that name (with touch), the filesystem converts the name to the decomposed form. When you ls it, you get the form the filesystem has: the decomposed one. That is why the "Böhmáí" line in .list differs byte for byte from the one in .text, even though the two look the same on screen.
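One way to compare the two forms is to normalize the decomposed name back to precomposed (NFC) form first. The sketch below is only an illustration under stated assumptions: `to_nfc` is a hypothetical helper, it assumes `python3` is on your PATH, and the decomposed bytes are hard-coded to mimic what ls would return for these particular characters rather than read from a real HFS+ volume.

```shell
# Hypothetical helper: read UTF-8 on stdin, write its NFC (precomposed) form.
# Assumes python3 is on PATH; the one-liner uses only the standard library.
to_nfc() {
    python3 -c 'import sys, unicodedata; sys.stdout.buffer.write(unicodedata.normalize("NFC", sys.stdin.buffer.read().decode("utf-8")).encode("utf-8"))'
}

# "Böhmáí" as the script writes it to .text (precomposed) ...
precomposed=$(printf 'B\303\266hm\303\241\303\255')
# ... and as ls would report it (decomposed: base letter + combining accent).
decomposed=$(printf 'Bo\314\210hma\314\201i\314\201')

# Raw comparison: the byte strings differ.
if [ "$precomposed" = "$decomposed" ]; then
    echo "raw: same"
else
    echo "raw: different"
fi

# After normalizing the decomposed form to NFC, they compare equal.
if [ "$(printf '%s' "$decomposed" | to_nfc)" = "$precomposed" ]; then
    echo "normalized: same"
else
    echo "normalized: different"
fi
```

With a helper like this, something along the lines of `ls | to_nfc > .list` in the question's script should let diff compare like with like.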
