I received some text that is encoded, but I don\'t know what charset was used. Is there a way to determine the encoding of a text file using Python? How can I detect the enc
Here is an example of reading and taking at face value a chardet encoding prediction, reading n_lines from the file in the event it is large.
chardet also gives you a probability (i.e. confidence) of it's encoding prediction (haven't looked how they come up with that), which is returned with its prediction from chardet.predict(), so you could work that in somehow if you like.
def predict_encoding(file_path, n_lines=20):
'''Predict a file's encoding using chardet'''
import chardet
# Open the file as binary data
with open(file_path, 'rb') as f:
# Join binary lines for specified number of lines
rawdata = b''.join([f.readline() for _ in range(n_lines)])
return chardet.detect(rawdata)['encoding']