LoadFromFile with Unicode data

攒了一身酷 2020-12-31 18:24

My input file (f) contains some Unicode (Swedish) text that isn't being read correctly.

Neither of these approaches works, although they give different results:



        
2 answers
  • 2020-12-31 18:45

    I assume that you mean 'UTF-8' when you say 'Unicode'.

    If you know that the file is UTF-8, then call

    LoadFromFile(f, TEncoding.UTF8);
    

    To save:

    SaveToFile(f, TEncoding.UTF8);
    

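    (The GetOEMCP WinAPI function returns the identifier of the legacy OEM code page, one of the old pre-Unicode character sets of typically 256 characters, so it does not help with UTF-8 data.)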
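
    As a minimal sketch of the two calls above in context, assuming the strings live in a TStringList and using a hypothetical file name (swedish.txt) that must already exist:

    ```pascal
    program LoadUtf8Demo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    var
      Lines: TStringList;
      FileName: string;
    begin
      FileName := 'swedish.txt'; // hypothetical file name, not from the question
      Lines := TStringList.Create;
      try
        // Decode the bytes as UTF-8 instead of the default ANSI code page.
        Lines.LoadFromFile(FileName, TEncoding.UTF8);
        Writeln('Lines read: ', Lines.Count);

        // Write the text back out as UTF-8 as well.
        Lines.SaveToFile(FileName, TEncoding.UTF8);
      finally
        Lines.Free;
      end;
    end.
    ```

    The same two calls should apply to any TStrings descendant, for example Memo1.Lines.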

  • 2020-12-31 19:05

    In order to load a Unicode text file you need to know its encoding. If the file has a Byte Order Mark (BOM), then you can simply call LoadFromFile(FileName) and the RTL will use the BOM to determine the encoding.

    If the file does not have a BOM then you need to explicitly specify the encoding, e.g.

    LoadFromFile(FileName, TEncoding.UTF8);
    LoadFromFile(FileName, TEncoding.Unicode); // UTF-16 LE
    LoadFromFile(FileName, TEncoding.BigEndianUnicode); // UTF-16 BE
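
    If you do not know in advance whether a file has a BOM, one way to combine both cases is sketched below. It assumes a Delphi version in which TStrings has the DefaultEncoding and Encoding properties, and it assumes that BOM-less files are UTF-8. LoadTextFile and input.txt are hypothetical names used only for illustration.

    ```pascal
    program LoadWithFallbackDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    // Hypothetical helper: honour a BOM if there is one, otherwise assume UTF-8.
    procedure LoadTextFile(Lines: TStrings; const FileName: string);
    begin
      // DefaultEncoding is only consulted when no BOM is found in the file.
      Lines.DefaultEncoding := TEncoding.UTF8;
      Lines.LoadFromFile(FileName);
    end;

    var
      Lines: TStringList;
    begin
      Lines := TStringList.Create;
      try
        LoadTextFile(Lines, 'input.txt'); // hypothetical file name
        // After loading, Encoding reports which encoding was actually used.
        Writeln('Decoded as: ', Lines.Encoding.EncodingName);
        Writeln('Lines read: ', Lines.Count);
      finally
        Lines.Free;
      end;
    end.
    ```

    A BOM-less file that is not actually UTF-8 will still be decoded incorrectly, so the fallback has to match how your files are really produced.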
    

    For some reason unknown to me, there is no built-in support for UTF-32, but if you had such a file it would be easy enough to add a TEncoding descendant to handle it.
