How do I use filesystem functions in PHP, using UTF-8 strings?

[愿得一人] 2020-11-22 13:26

I can't use mkdir to create folders with UTF-8 characters:


when I

9 Answers
  •  逝去的感伤
    2020-11-22 14:07

    Just urlencode the string desired as a filename. All characters returned from urlencode are valid in filenames (NTFS/HFS/UNIX), then you can just urldecode the filenames back to UTF-8 (or whatever encoding they were in).
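
    A minimal sketch of that round trip, assuming a writable temp directory. The helper names `encodeFilename` and `decodeFilename` are made up for illustration; only `urlencode`, `urldecode`, and the standard filesystem functions are real PHP calls.

```php
<?php
// Hypothetical helpers (names are illustrative, not from the answer above).

// Turn an arbitrary UTF-8 string into an ASCII-only name.
// urlencode() keeps letters, digits, "-", "_" and ".", turns spaces into "+",
// and replaces every other byte with a %XX escape.
function encodeFilename(string $utf8Name): string
{
    return urlencode($utf8Name);
}

// Recover the original string from a name read back from disk.
function decodeFilename(string $storedName): string
{
    return urldecode($storedName);
}

$name = 'ó-файл';
$dir  = sys_get_temp_dir() . DIRECTORY_SEPARATOR . encodeFilename($name);

if (!is_dir($dir)) {
    mkdir($dir);                              // on disk the name is pure ASCII
}
echo decodeFilename(basename($dir)), PHP_EOL; // the original UTF-8 name again
rmdir($dir);                                  // clean up the demo directory
```

    Note that `urlencode` leaves `.` untouched, so a name consisting only of `.` or `..` would still need special handling.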

    Caveats (all apply to the solutions below as well):

    • After url-encoding, the filename must not exceed 255 characters (probably bytes).
    • Unicode has multiple representations for many characters (precomposed forms versus base letters plus combining characters). If you don't normalize your UTF-8 (e.g. to NFC), you may have trouble searching with glob or reopening an individual file.
    • You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames then use a sorting algorithm aware of UTF-8 (and collations).
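
    The three caveats above could be handled roughly as follows. This is only a sketch: it assumes the intl extension is loaded (for `Normalizer` and `Collator`), the function names `storedNameFor` and `listDecodedSorted` are hypothetical, and both the 255-byte limit and the `en_US` locale are assumptions to adjust for your filesystem and audience.

```php
<?php
// Assumes ext-intl is available. Function names are hypothetical.

// Normalize to NFC, url-encode, and enforce a length limit before touching disk.
function storedNameFor(string $utf8Name, int $maxBytes = 255): string
{
    $normalized = Normalizer::normalize($utf8Name, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new InvalidArgumentException('Name is not valid UTF-8');
    }
    $encoded = urlencode($normalized);
    if (strlen($encoded) > $maxBytes) {
        throw new LengthException('Encoded name is longer than ' . $maxBytes . ' bytes');
    }
    return $encoded;
}

// Read a directory, decode the names, and sort them with a locale-aware collator.
function listDecodedSorted(string $dir, string $locale = 'en_US'): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException('Cannot read directory: ' . $dir);
    }
    $names = array_map('urldecode', array_values(array_diff($entries, ['.', '..'])));
    $collator = new Collator($locale);
    $collator->sort($names);   // sorts in place using the locale's collation rules
    return $names;
}

// Both spellings of "é" map to the same stored name after normalization.
var_dump(storedNameFor("e\u{0301}") === storedNameFor("\u{00E9}"));
```

    Normalizing before encoding means a lookup by name yields the same bytes no matter how the user's input composed its accents.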

    Worse Solutions

    The following are less attractive solutions, more complicated and with more caveats.

    On Windows, the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:

    1. Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 char will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as Ã³ in Windows Explorer.

    2. Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.

    Caveats galore!

    • If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
    • Windows may use an encoding other than ISO-8859-1 in non-English locales. I'd guess it will usually be one of ISO-8859-#, but this means you'll need to use mb_convert_encoding instead of utf8_decode.
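
    A rough sketch of option 2 using mb_convert_encoding, assuming the mbstring extension is loaded and its substitute character is left at the default. The function names and the `$fsEncoding` default are assumptions; the correct target encoding depends on the Windows locale. `utf8_decode` and `utf8_encode` are deprecated as of PHP 8.2, which is another reason to prefer this route. As far as I know, PHP 7.1 and later accept UTF-8 paths on Windows directly, so this conversion mainly matters for older versions.

```php
<?php
// Assumes ext-mbstring. Names and the default encoding are illustrative.

// Convert a UTF-8 name to the single-byte encoding the filesystem layer expects.
// Refuses names that cannot be represented instead of silently writing "?".
function toFsName(string $utf8Name, string $fsEncoding = 'ISO-8859-1'): string
{
    $converted = mb_convert_encoding($utf8Name, $fsEncoding, 'UTF-8');
    // Round-trip check: unrepresentable characters are substituted during
    // conversion, so the name no longer converts back to the original.
    if (mb_convert_encoding($converted, 'UTF-8', $fsEncoding) !== $utf8Name) {
        throw new InvalidArgumentException("Name not representable in $fsEncoding");
    }
    return $converted;
}

// Convert a name returned by scandir() and friends back to UTF-8.
function fromFsName(string $fsName, string $fsEncoding = 'ISO-8859-1'): string
{
    return mb_convert_encoding($fsName, 'UTF-8', $fsEncoding);
}

var_dump(bin2hex(toFsName('ó')));        // a single byte instead of two
var_dump(fromFsName(toFsName('ó')) === 'ó');
```

    The round-trip check is one way to catch the first caveat early: a name that would lose characters is rejected before any file is created.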

    This nightmare is why you should probably just transliterate to create filenames.
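
    One possible way to transliterate, assuming the intl extension and its ICU transform rules `Any-Latin; Latin-ASCII` are available. The function name `asciiFilename` and the `unnamed` fallback are made up for this sketch. Unlike url-encoding, the result is lossy: the original name cannot be recovered, and two different names may collide.

```php
<?php
// Assumes ext-intl. Function name and fallback value are hypothetical.

// Reduce a UTF-8 string to a conservative ASCII filename.
function asciiFilename(string $utf8Name): string
{
    // Convert other scripts to Latin, then strip accents and similar marks.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', $utf8Name);
    if ($ascii === false) {
        throw new InvalidArgumentException('Could not transliterate name');
    }
    // Collapse anything outside a safe character set into a single dash.
    $ascii = (string) preg_replace('/[^A-Za-z0-9._-]+/', '-', $ascii);
    // Avoid leading/trailing dashes and dots (also rules out "." and "..").
    $ascii = trim($ascii, '-.');
    return $ascii === '' ? 'unnamed' : $ascii;
}

echo asciiFilename('Füße & Ärger.txt'), PHP_EOL;
```

    If the original name matters, store it separately (for example in a database) next to the transliterated filename.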
