How can I detect the encoding/codepage of a text file

梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20 answers
  •  长情又很酷
    2020-11-21 23:30

    The tool "uchardet" does this well, using character frequency distribution models for each charset. Larger files and more "typical" files are detected with higher confidence (obviously).

    On Ubuntu, you can just run: apt-get install uchardet

    On other systems, get the source, usage & docs here: https://github.com/BYVoid/uchardet
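
If you want to call it from code rather than from the shell, something like the following might work. It is only a rough sketch: it assumes the uchardet binary is on your PATH and that it prints a single charset name for the file it is given. The helper names detect_charset and read_text, and the UTF-8 fallback, are made up for this example and are not part of uchardet.

```python
import codecs
import shutil
import subprocess
from typing import Optional


def detect_charset(path: str, tool: str = "uchardet") -> Optional[str]:
    """Return the charset name printed by the uchardet CLI, or None.

    Hypothetical helper: returns None when the tool is not installed
    or exits with an error, so callers can fall back to a default.
    """
    if shutil.which(tool) is None:
        return None  # tool not found on PATH
    result = subprocess.run([tool, path], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    name = result.stdout.strip()
    return name or None


def read_text(path: str, fallback: str = "utf-8") -> str:
    """Read a text file using the detected charset, else the fallback."""
    charset = detect_charset(path)
    encoding = charset or fallback
    try:
        # The reported name may not be one Python knows
        # (an assumption worth guarding against), so check it first.
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # errors="replace" keeps reading even if the guess was wrong.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()
```

Then read_text("incoming.csv") returns the decoded text, where incoming.csv is a placeholder file name. Since detection is a statistical guess, it may still be worth letting the user override the encoding for short or unusual files.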
