How to determine the encoding of text?


一向 2020-11-21 07:47

I received some text that is encoded, but I don't know what charset was used. Is there a way to determine the encoding of a text file using Python? How can I detect the enc…

10 answers

遇见更好的自我 (OP)

2020-11-21 08:21
Here is an example that reads only the first `n_lines` lines of the file (so a large file is not read in full) and takes chardet's encoding prediction at face value.

chardet also reports a confidence score between 0 and 1 for its prediction (I haven't looked into how it is computed). It is returned alongside the encoding, under the `'confidence'` key of the dict from `chardet.detect()`, so you could factor it in if you like.
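```python
import chardet  # third-party package: pip install chardet


def predict_encoding(file_path, n_lines=20):
    '''Predict a file's encoding using chardet.'''
    # Open the file as binary data so nothing is decoded yet
    with open(file_path, 'rb') as f:
        # Join the first n_lines lines; readline() returns b'' once the file is exhausted
        rawdata = b''.join([f.readline() for _ in range(n_lines)])

    # chardet.detect() returns a dict; 'encoding' is None if no guess could be made
    return chardet.detect(rawdata)['encoding']
```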
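Building on the confidence score mentioned above, here is a minimal sketch of one way to work it in. It is not part of chardet itself: the helper names `choose_encoding`, `decodes_cleanly` and `predict_encoding_checked`, the 0.5 threshold and the UTF-8 fallback are all my own arbitrary choices. The guess is only trusted above the threshold, and the chosen encoding is then sanity-checked by strictly decoding the sampled bytes with an incremental decoder from Python's `codecs` module (`final=False`, so a multi-byte character cut off at the end of the sample is not counted as an error).

```python
import codecs
import itertools


def choose_encoding(result, min_confidence=0.5, fallback='utf-8'):
    '''Pick an encoding from a chardet.detect()-style result dict.

    Falls back to `fallback` when chardet made no guess (encoding is None)
    or its confidence is below `min_confidence`. The threshold and fallback
    are arbitrary choices for this sketch, not chardet defaults.
    '''
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


def decodes_cleanly(sample, encoding):
    '''Return True if `sample` decodes under `encoding` with no errors.

    An incremental decoder with final=False buffers a multi-byte character
    cut off at the end of the sample instead of raising on it.
    '''
    try:
        decoder = codecs.getincrementaldecoder(encoding)(errors='strict')
        decoder.decode(sample, final=False)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that are invalid in this encoding
        return False
    return True


def predict_encoding_checked(file_path, n_lines=20, min_confidence=0.5, fallback='utf-8'):
    '''Hypothetical variant of predict_encoding() that uses the confidence score.

    Returns an encoding name, or None if the chosen encoding cannot decode the sample.
    '''
    import chardet  # third-party package: pip install chardet

    with open(file_path, 'rb') as f:
        # Iterating a binary file yields lines; islice stops after n_lines of them
        rawdata = b''.join(itertools.islice(f, n_lines))

    encoding = choose_encoding(chardet.detect(rawdata), min_confidence, fallback)
    return encoding if decodes_cleanly(rawdata, encoding) else None
```

Keep in mind that a clean decode does not prove the guess is right (any byte sequence decodes as Latin-1, for instance); it only rules out encodings that cannot apply to the sample.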