It won't be "fixed" in Python 3, because it isn't a fixable problem: many byte sequences are valid in several encodings, so the only way to determine the right encoding is to know something about the document. Fortunately, in most cases we do know something. For instance, characters tend to cluster in distinct Unicode blocks: a document in English will consist mostly of characters from the first 128 code points, while a document in Russian will consist mostly of Cyrillic code points. Most documents contain spaces and newlines. Clues like these let you make educated guesses about which encoding is in use. Better yet, use a library written by someone who has already done the work (like chardet, mentioned in Desintegr's answer).
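To make the "educated guess" idea concrete, here is a toy sketch of the heuristic described above, using only the standard library. It tries a few candidate encodings and scores each successful decoding by how strongly the characters cluster in one Unicode block and whether whitespace is present. This is deliberately simplistic; chardet's actual detection is far more sophisticated, and the candidate list and scoring here are my own illustrative choices:

```python
def guess_encoding(data: bytes) -> str:
    """Toy encoding guesser illustrating the 'clues' heuristic.

    Tries a few candidate encodings and picks the one whose decoded
    text looks most plausible. Nothing like production quality; use
    chardet for real work.
    """
    candidates = ["ascii", "utf-8", "cp1251", "latin-1"]
    best, best_score = "latin-1", -1.0
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # not even a valid byte sequence in this encoding
        if not text:
            return enc
        # Clue 1: most real documents contain spaces or newlines.
        has_whitespace = any(ch.isspace() for ch in text)
        # Clue 2: characters tend to cluster in one Unicode block,
        # e.g. ASCII for English, U+0400-U+04FF (Cyrillic) for Russian.
        ascii_frac = sum(ch <= "\x7f" for ch in text) / len(text)
        cyrillic_frac = sum("\u0400" <= ch <= "\u04ff" for ch in text) / len(text)
        score = max(ascii_frac, cyrillic_frac) + (0.5 if has_whitespace else 0.0)
        if score > best_score:
            best, best_score = enc, score
    return best

print(guess_encoding(b"hello world"))                    # ascii
print(guess_encoding("привет мир".encode("cp1251")))     # cp1251
```

Note how the cp1251 bytes for Russian text also decode "successfully" as latin-1 (producing gibberish like `ïðèâåò ìèð`), which is exactly why validity alone can't settle the question and statistical clues are needed.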