How can I detect the encoding/codepage of a text file


梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20 answers

感动是毒 (original poster)

2020-11-21 23:05

"You can't detect the codepage"

This is clearly false. Every web browser has some kind of universal charset detector to deal with pages which have no indication whatsoever of an encoding. Firefox has one. You can download the code and see how it does it. See some documentation here. Basically, it is a heuristic, but one that works really well.
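To give a feel for what such a heuristic looks like, here is a minimal sketch in Python. It is not Firefox's algorithm, which relies on statistical models of byte sequences. It only checks for a byte order mark, then tests whether the bytes are valid UTF-8, then tries a short list of fallback code pages. The function name `guess_encoding` and the fallback list are my own assumptions, chosen for illustration.

```python
import codecs

# Byte order marks, checked longest-first so that UTF-32 LE
# (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallbacks=("cp1252",)) -> str:
    """Return a best-guess codec name for data (hypothetical helper).

    This is a guess, not a proof: many byte strings are valid in
    several single-byte code pages at once.
    """
    # 1. A BOM is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Text in a legacy code page is rarely valid UTF-8 by accident,
    #    so a clean strict decode is good evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Try the assumed fallback code pages in order.
    for name in fallbacks:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue

    # 4. latin-1 maps every byte to a character, so it never fails.
    return "latin-1"


if __name__ == "__main__":
    sample = "naïve café".encode("cp1252")
    print(guess_encoding(sample))
```

Step 3 is the weak point: single-byte code pages cannot be told apart by validity alone, which is why real detectors look at character frequencies instead. If you work in Python, the `chardet` package is a port of Mozilla's detector and is likely a better choice than this sketch for real input.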

Given a reasonable amount of text, it is even possible to detect the language.

Here's another one I just found using Google:
