Unicode filenames on FAT-32?

Posted by 北战南征 on 2020-01-01 08:03:27

Question


As far as I understand, NTFS supports Unicode filenames (UTF-16, as Microsoft claims?).

But the official MSDN documentation is very vague about which codepage(s) are used to store filenames (file paths) on FAT-32.

Here it says that OEM code page (CP437 I assume) is used to store filenames: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748.aspx

But here it turns out that there can be different OEM codepages with CP437 being one of them: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317752.aspx

And we all know that utilities like mount support many more codepages for FAT than just the set of OEM codepages.

So what is the actual codepage for FAT-32 filenames? Does it depend on the system codepage at the time the FAT volume was created? Can FAT support true Double Byte Character Set codepages like UTF-16? Or are Multi Byte Character Set codepages like UTF-8 the limit?

And a more specific question: what happens when I use the CreateFileW function (which, as MSDN states, uses UTF-16 as the filename encoding) to create a file on a FAT-32 volume?


Answer 1:


You might have to experiment here. This is a great question, and I'm not 100% confident, but:

So what is the actual codepage for FAT-32 filenames? Does it depend on the system codepage at the time the FAT volume was created?

The "OEM codepage", whatever that is for the system.

Can FAT support true Double Byte Character Set codepages like UTF-16? Or are Multi Byte Character Set codepages like UTF-8 the limit?

No, I don't believe FAT is directly capable of either UTF-16 or UTF-8. That said, Microsoft stores the Unicode filename in an out of band method. A file thus has two filenames. (This is how you can have longer than 8.3 character filenames, as well.)

And a more specific question: what happens when I use the CreateFileW function (which, as MSDN states, uses UTF-16 as the filename encoding) to create a file on a FAT-32 volume?

The Unicode filename, as passed to CreateFileW, is stored directly in the out-of-band (long) filename. It is also re-encoded into the OEM codepage (whatever that happens to be on the system), and that re-encoded form is stored as the regular short filename. If the name cannot be converted into the OEM codepage, or does not fit the 8.3 format, Windows will give the file a short name like FILENA~1.TXT.
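The fallback described above can be approximated with a rough check. This is a minimal sketch, not Windows' actual short-name algorithm: the function name `needs_numeric_tail` is hypothetical, CP437 is assumed as the OEM codepage, and details such as spaces, reserved characters, and best-fit substitutions (like the © case discussed further down) are not modeled.

```python
def needs_numeric_tail(name, oem_codepage="cp437"):
    """Rough guess: True if `name` cannot serve directly as an 8.3 short name.

    Simplified assumption: a name is usable as-is when its upper-cased form
    encodes into the OEM codepage and fits in 8 base + 3 extension bytes.
    """
    try:
        encoded = name.upper().encode(oem_codepage)
    except UnicodeEncodeError:
        # At least one character has no representation in the OEM codepage.
        return True

    base, dot, ext = encoded.rpartition(b".")
    if not dot:
        # No dot at all: the whole name is the base, the extension is empty.
        base, ext = encoded, b""

    # Too long, more than one dot, or an empty base all rule out a plain 8.3 name.
    return len(base) > 8 or len(ext) > 3 or b"." in base or not base


print(needs_numeric_tail("asdf\u03a3.txt"))    # Σ exists in CP437
print(needs_numeric_tail("asdf\u039b.txt"))    # Λ does not
print(needs_numeric_tail("longfilename.txt"))  # base longer than 8 characters
```

Under these assumptions, only the first name can be stored as a plain short name; the other two would get a generated alias.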

Some citations for these answers:

First, this page tells us that the OEM code page != the Windows code page:

Non-Unicode applications that create FAT files sometimes have to use the standard C runtime library conversion functions to translate between the Windows code page character set and the OEM code page character set. With Unicode implementations of the file system functions, it is not necessary to perform such translations.

On a typical American system, the OEM code page is CP437, but the Windows code page is Windows-1252. (The FooA calls, I believe, use the Windows code page, which is typically Windows-1252 on an American machine but depends on the locale.)
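To make the difference between the two code pages concrete, here is a small sketch using Python's built-in `cp437` and `cp1252` codecs. The byte value 0xE4 is just an illustrative choice; the same byte means a different character in each code page.

```python
raw = b"\xe4"

as_oem = raw.decode("cp437")    # interpreted with the OEM code page
as_ansi = raw.decode("cp1252")  # interpreted with the Windows code page

# Print code points rather than glyphs so the output does not depend on the console font.
print(f"CP437:        U+{ord(as_oem):04X}")
print(f"Windows-1252: U+{ord(as_ansi):04X}")
```

In CP437 that byte is Σ (U+03A3); in Windows-1252 it is ä (U+00E4). This is why a short name written by one program can look wrong in another that assumes the other code page.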

If you have a FAT volume available, you can see this in action. The character "Σ" (U+03A3) is not present in Windows-1252, but it is in CP437. You can see both the short and long filenames with dir /X. With a file named asdfΣ.txt, you'll see:

ASDFΣ.TXT    asdfΣ.txt

However, with a file named "asdfΛ.txt" (Λ is not present in either CP437 or Windows-1252), you'll see:

ASDF~1.TXT   asdf?.txt

(You'll likely see ?, because cmd.exe's font cannot display a Λ.)

For information about long filenames, see this Wikipedia article.

Also, interestingly, if you name a file "asdf©.txt", you might get:

ASDFC.TXT    asdfc.txt

… I'm not 100% sure here, but I think Windows cleverly decided to substitute "c" for ©, and did likewise for displaying it. If you change the font to something not raster based, like Consolas, you'll see:

ASDFC.TXT    asdf©.txt

And this is why you should use the FooW functions.
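The three characters used in the examples above can be checked against both code pages without a FAT volume. A minimal sketch, assuming Python's `cp437` and `cp1252` codecs match the code pages Windows uses; the helper name `in_codepage` is made up for this example.

```python
def in_codepage(ch, codepage):
    """Return True if the character can be encoded in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Sigma, Lambda, and the copyright sign from the examples above.
for ch in "\u03a3\u039b\u00a9":
    print(f"U+{ord(ch):04X}",
          "cp437:", in_codepage(ch, "cp437"),
          "cp1252:", in_codepage(ch, "cp1252"))
```

Σ is only in CP437, Λ is in neither, and © is only in Windows-1252, which matches the three different dir /X results.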




Answer 2:


The basic FAT or FAT32 directory entries support only short names (the old DOS 8.3 format) in the current OEM codepage. However, VFAT (FAT with long filename support), which is what Windows uses, can store an additional, so-called long filename for each file, in UTF-16.
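As a rough illustration of how a long name is laid out, the sketch below splits a name into the 13 UTF-16 code units that each VFAT long-name directory entry can hold, terminating with 0x0000 and padding with 0xFFFF. The function name `lfn_chunks` is hypothetical, and the sketch ignores sequence numbers, the short-name checksum, and the on-disk ordering of the entries.

```python
UNITS_PER_ENTRY = 13  # UTF-16 code units carried by one long-name entry


def lfn_chunks(long_name):
    """Split a long filename into 26-byte UTF-16LE pieces, one per entry."""
    data = long_name.encode("utf-16-le")
    units = len(data) // 2

    if units % UNITS_PER_ENTRY != 0:
        # The last entry is not full: add a 0x0000 terminator, then 0xFFFF padding.
        data += b"\x00\x00"
        units += 1
        data += b"\xff\xff" * (-units % UNITS_PER_ENTRY)

    size = UNITS_PER_ENTRY * 2
    return [data[i:i + size] for i in range(0, len(data), size)]


chunks = lfn_chunks("asdf\u039b.txt")
print(len(chunks), chunks[0].hex())
```

A 9-character name such as asdfΛ.txt fits in a single entry, and the Λ is stored as the two bytes 9B 03 regardless of the OEM codepage.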



Source: https://stackoverflow.com/questions/19503697/unicode-filenames-on-fat-32
