How can I determine which encoding the file uses before I read the file?

余生分开走 2021-01-21 09:07

I'm facing a problem.

A file can be written in some encoding such as UTF-8, UTF-16, UTF-32, etc.

When I read a UTF-

3 Answers
  • 2021-01-21 09:38

    There is no good way to do that. The question you're asking is like determining the radix of a number by looking at it. For example, what is the radix of 101?

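    The best approach is to read the data into a byte array. You can then use String(byte[] bytes, Charset charset) to test it against multiple encodings, ordered from most likely to least likely. Note that this constructor does not throw on malformed input; it substitutes the charset's default replacement string (typically U+FFFD), so inspect the result for replacement characters.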

  • 2021-01-21 09:52

    You cannot, at least not without reading the file. Which Unicode transformation format applies is usually indicated by the first two to four bytes of the file (assuming a BOM is present), and you cannot see those from the outside.
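
A BOM check of this kind could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name `BomSniffer` and the methods `detectBom` and `detectBomInFile` are hypothetical, `InputStream.readNBytes` requires Java 11 or later, and a `null` result means only that no BOM was found (many UTF-8 files have none).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class BomSniffer {

    // Returns the byte at index i as an unsigned value, or -1 if the array is too short.
    private static int at(byte[] head, int i) {
        return i < head.length ? head[i] & 0xFF : -1;
    }

    /** Returns the charset name implied by a leading BOM, or null if no BOM is present. */
    static String detectBom(byte[] head) {
        int b0 = at(head, 0), b1 = at(head, 1), b2 = at(head, 2), b3 = at(head, 3);

        // Check UTF-32 first: its little-endian BOM (FF FE 00 00) begins with the
        // UTF-16LE BOM (FF FE). The sequence is inherently ambiguous, because it
        // could also be UTF-16LE text starting with U+0000; this sketch assumes UTF-32.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        return null;
    }

    /** Reads at most the first four bytes of the file and checks them for a BOM. */
    static String detectBomInFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detectBom(in.readNBytes(4)); // Java 11+
        }
    }
}
```

Only the first four bytes are read, so the check is cheap even for large files, but it says nothing about files written without a BOM.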

  • 2021-01-21 09:54

    You can read the first few bytes and try to guess the encoding.

    If all else fails, try reading with different encodings until one works (no exception when decoding and it 'looks' OK).
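
The "try different encodings until one decodes without an exception" idea could look something like the sketch below. The class name `CharsetGuesser` and its methods are hypothetical. It uses a `CharsetDecoder` configured with `CodingErrorAction.REPORT`, so malformed bytes raise a `CharacterCodingException` instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of any library.
final class CharsetGuesser {

    /** Returns true if the bytes decode under the charset with no malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the first candidate that decodes the bytes cleanly; order the list most likely first. */
    static Optional<Charset> firstMatch(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            if (decodesCleanly(data, cs)) {
                return Optional.of(cs);
            }
        }
        return Optional.empty();
    }
}
```

A clean decode is a necessary condition, not proof: single-byte charsets such as ISO-8859-1 accept every byte sequence, so they belong at the end of the candidate list, and the decoded text still has to "look" right.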
