How can I detect the encoding/codepage of a text file?

梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When we read them, these files sometimes contain garbage, because the

20 answers
  •  感动是毒
    2020-11-21 23:05

    "You can't detect the codepage."

    This is clearly false. Major web browsers ship some kind of universal charset detector to deal with pages that give no indication whatsoever of their encoding. Firefox has one; you can download the code and see how it does it. See some documentation here. Basically, it is a heuristic, but one that works really well.
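The heuristic idea can be illustrated with a much simpler Python sketch. This is not Firefox's detector: it only checks for a byte-order mark, then tests whether the bytes are valid UTF-8, and otherwise falls back to a list of legacy single-byte codepages, preferring the one that decodes with the fewest control characters. The function name `guess_encoding` and the default fallback list (`cp1252`, `latin-1`) are hypothetical choices for illustration. Real detectors such as Mozilla's also apply statistical character-distribution models to choose between codepages (for example, telling cp1251 Cyrillic from cp1252 Western text), which this sketch does not attempt.

```python
import codecs
import unicodedata

# Byte-order marks paired with a codec that strips the BOM on decode.
# The UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE),
# so the UTF-32 entries must be checked first.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(data: bytes, fallbacks=("cp1252", "latin-1")) -> str:
    """Return a best-guess codec name for `data`; a heuristic, not a guarantee.

    Hypothetical helper: the fallback list is an illustrative assumption.
    """
    # 1. A byte-order mark is the strongest signal a plain text file can carry.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Strict UTF-8 validation: legacy-codepage text rarely forms
    #    valid multi-byte UTF-8 sequences by accident.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        return "ascii" if data.isascii() else "utf-8"

    # 3. Legacy single-byte codepages: skip any that cannot decode the bytes
    #    (e.g. 0x81 is undefined in cp1252) and prefer the candidate that
    #    produces the fewest control characters (tab/CR/LF excepted).
    best_name, best_score = "latin-1", None  # latin-1 is the last resort
    for name in fallbacks:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue
        score = sum(
            1 for ch in text
            if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r"
        )
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    sample = "Grüße".encode("cp1252")
    enc = guess_encoding(sample)
    print(enc, sample.decode(enc))  # prints: cp1252 Grüße
```

To use it on a file, read it in binary mode (`data = open(path, "rb").read()`) and decode with `data.decode(guess_encoding(data))`. UTF-8 is tried before any legacy codepage because non-ASCII bytes from a single-byte codepage rarely happen to form valid UTF-8, whereas `latin-1` maps every byte to some character and never fails to decode, which is why it sits last.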

    Given a reasonable amount of text, it is even possible to detect the language.

    Here's another one I just found using Google:
