Java : How to determine the correct charset encoding of a stream

花落未央 2020-11-22 02:06

With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly

What is the best way to programmatically determine the correct cha

15 answers

借酒劲吻你 (original poster)

2020-11-22 02:24
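For ISO8859_1 (ISO-8859-1) files, there is no easy way to distinguish them from ASCII: the two encodings agree on bytes 0x00-0x7F, and any sequence of bytes is valid ISO-8859-1, so decoding never fails. For Unicode files, however, the encoding can generally be detected from the first few bytes of the file.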
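To illustrate why ISO-8859-1 is hard to tell apart from ASCII, here is a minimal sketch of the only byte-level check available: whether any byte falls outside the 7-bit range. The class name `AsciiCheck` and the method `isPureAscii` are made up for this example. A `false` result only says the data is not pure ASCII; it does not prove the data is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class AsciiCheck {

    /** Returns true if every byte is in the 7-bit ASCII range (0x00-0x7F). */
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            // Java bytes are signed, so 0x80-0xFF show up as negative values.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ascii = "plain text".getBytes(StandardCharsets.US_ASCII);
        // "caf\u00e9" contains U+00E9, which ISO-8859-1 encodes as the single byte 0xE9.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(isPureAscii(ascii));  // true
        System.out.println(isPureAscii(latin1)); // false
    }
}
```

A file that passes this check reads identically as ASCII, ISO-8859-1, or UTF-8, so the choice does not matter for it. A file that fails could be ISO-8859-1, UTF-8, or another 8-bit encoding, and bytes alone cannot settle which.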

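UTF-8 and UTF-16 files may include a Byte Order Mark (BOM) at the very beginning of the file; it is common in UTF-16 and optional in UTF-8, so its absence proves nothing. The BOM is the code point U+FEFF (zero-width no-break space), which is encoded as the bytes EF BB BF in UTF-8, FE FF in big-endian UTF-16, and FF FE in little-endian UTF-16.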

Unfortunately, for historical reasons, Java does not pick the encoding from the BOM for you, and its UTF-8 decoder does not strip a leading BOM. Programs like Notepad check the BOM and use the appropriate encoding. On Unix or Cygwin, you can check the BOM with the file command. For example:
```
$ file sample2.sql 
sample2.sql: Unicode text, UTF-16, big-endian
```
For Java, I suggest you check out this code, which detects the common file formats and selects the matching encoding: "How to read a file and automatically specify the correct encoding".
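As a rough idea of what a BOM check can look like in Java, here is a minimal sketch. It is not the code from the linked post; the class `BomSniffer` and its methods are hypothetical names chosen for this example, and it only covers UTF-8 and UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the linked code and not a library API.
final class BomSniffer {

    /**
     * Returns the charset implied by a leading BOM, or null if no known BOM is present.
     * UTF-32 is not handled: its little-endian BOM (FF FE 00 00) would be
     * reported here as UTF-16LE.
     */
    static Charset detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /** Compares the first bytes of data, as unsigned values, against prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h', (byte) 'i' };
        byte[] noBom = { (byte) 'h', (byte) 'i' };

        System.out.println(detect(utf8WithBom)); // UTF-8
        System.out.println(detect(noBom));       // null
    }
}
```

In practice you would read only the first few bytes of the stream, call `detect`, and fall back to a default charset when it returns null. Remember to skip the BOM bytes before decoding, since a decoder opened with an explicit UTF-8 or UTF-16BE/LE charset will otherwise hand U+FEFF back as the first character.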