How can I detect the encoding/codepage of a text file?


梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20 answers

长情又很酷 (original poster)

2020-11-21 23:30

The tool "uchardet" does this well, using character frequency distribution models for each charset. Its confidence is higher for larger files and for files whose content is more "typical" of the language.
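To give a feel for what a frequency-based approach means, here is a toy sketch in Python. It is not uchardet's algorithm: the candidate list, the `COMMON_RUSSIAN` set, and the function names are all made up for illustration, and a real detector uses trained models for many languages and charsets. The idea is only that a wrong decoding tends to produce characters that are rare in real text, so the decoding with the most frequent letters wins.

```python
# Toy illustration only; uchardet's real models are trained and far richer.
# COMMON_RUSSIAN, CANDIDATES, score and guess_encoding are hypothetical names.

# The ten most frequent lowercase Russian letters, plus the space character.
COMMON_RUSSIAN = set("оеаинтсрвл ")

# A short, hand-picked list of encodings to try (an assumption for this demo).
CANDIDATES = ["utf-8", "cp1251", "koi8-r"]


def score(text):
    """Return the fraction of characters that are 'typical' for Russian text."""
    if not text:
        return 0.0
    return sum(ch in COMMON_RUSSIAN for ch in text) / len(text)


def guess_encoding(data):
    """Decode `data` with each candidate and keep the best-scoring one.

    Returns (encoding_name, score); encoding_name is None if nothing decodes.
    """
    best_name, best_score = None, -1.0
    for name in CANDIDATES:
        try:
            text = data.decode(name)  # strict decoding rejects invalid bytes
        except UnicodeDecodeError:
            continue
        s = score(text)
        if s > best_score:
            best_name, best_score = name, s
    return best_name, best_score


if __name__ == "__main__":
    sample = "привет мир".encode("cp1251")
    print(guess_encoding(sample))
```

As with the real tool, a very short sample gives a weak signal; the longer and more typical the text, the clearer the gap between the right decoding and the wrong ones.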

On Ubuntu, you can install it with apt-get install uchardet.

On other systems, get the source, usage & docs here: https://github.com/BYVoid/uchardet
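If you want to call it from an application rather than the shell, one option is to run the command-line tool and read its output. The sketch below assumes the `uchardet` binary is on the PATH and that it prints the detected charset name when given a file path; the helper name `detect_with_uchardet` is hypothetical, and returning `None` on failure is just one possible design choice.

```python
import shutil
import subprocess


def detect_with_uchardet(path):
    """Return the charset name reported by the uchardet CLI, or None.

    None is returned when the uchardet binary is not installed, when the
    command exits with an error, or when it prints nothing.
    """
    exe = shutil.which("uchardet")
    if exe is None:
        return None  # tool not installed on this machine

    result = subprocess.run([exe, path], capture_output=True, text=True)
    if result.returncode != 0:
        return None

    name = result.stdout.strip()
    return name or None


if __name__ == "__main__":
    import sys

    for filename in sys.argv[1:]:
        print(filename, detect_with_uchardet(filename))
```

The returned name can then be passed to your file-reading code, ideally with a fallback for the case where detection fails or the name is not one your runtime recognizes.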
