How to determine the encoding of text?

前端未结

关注

 10  1483

一向 2020-11-21 07:47

I received some text that is encoded, but I don\'t know what charset was used. Is there a way to determine the encoding of a text file using Python? How can I detect the enc

10条回答

情歌与酒 (楼主)

2020-11-21 08:01
Another option for working out the encoding is to use libmagic (which is the code behind the file command). There are a profusion of python bindings available.

The python bindings that live in the file source tree are available as the python-magic (or python3-magic) debian package. It can determine the encoding of a file by doing:
```
import magic

blob = open('unknown-file', 'rb').read()
m = magic.open(magic.MAGIC_MIME_ENCODING)
m.load()
encoding = m.buffer(blob)  # "utf-8" "us-ascii" etc
```
There is an identically named, but incompatible, python-magic pip package on pypi that also uses libmagic. It can also get the encoding, by doing:
```
import magic

blob = open('unknown-file', 'rb').read()
m = magic.Magic(mime_encoding=True)
encoding = m.from_buffer(blob)
```
0 讨论(0)

查看其它10个回答
发布评论:

提交评论
- 加载中...