How to determine the encoding of text?

一向 2020-11-21 07:47

I received some text that is encoded, but I don't know what charset was used. Is there a way to determine the encoding of a text file using Python? How can I detect the enc

10 answers
  •  遇见更好的自我
    2020-11-21 08:21

    Here is an example that reads the first n_lines of a file (in case the file is large) and takes chardet's encoding prediction at face value.

    chardet also gives you a confidence value (i.e. a probability) for its encoding prediction. I haven't looked into how it is computed, but it is returned alongside the prediction in the dict from chardet.detect(), so you could work that in somehow if you like.

    def predict_encoding(file_path, n_lines=20):
        '''Predict a file's encoding using chardet'''
        import chardet
    
        # Open the file as binary data
        with open(file_path, 'rb') as f:
            # Join binary lines for specified number of lines
            rawdata = b''.join([f.readline() for _ in range(n_lines)])
    
        return chardet.detect(rawdata)['encoding']
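
    One way to work the confidence in, as a minimal sketch: reject the guess when chardet's reported confidence falls below a cutoff. The function name predict_encoding_checked and the min_confidence parameter (with its 0.5 default) are hypothetical, chosen only for illustration; the 'encoding' and 'confidence' keys are the ones chardet.detect() returns.

```python
def predict_encoding_checked(file_path, n_lines=20, min_confidence=0.5):
    '''Return chardet's guessed encoding, or None if the guess is too weak.

    min_confidence is a hypothetical cutoff for illustration, not a
    value recommended by chardet.
    '''
    import chardet

    # Read the first n_lines as raw bytes, as in predict_encoding above
    with open(file_path, 'rb') as f:
        rawdata = b''.join(f.readline() for _ in range(n_lines))

    # chardet.detect returns a dict with 'encoding' and 'confidence' keys
    result = chardet.detect(rawdata)

    # 'encoding' can be None when chardet cannot make any guess
    if result['encoding'] is None or result['confidence'] < min_confidence:
        return None
    return result['encoding']
```

    Returning None leaves it to the caller to decide what to do with a weak guess, for example falling back to a default encoding or reading more lines.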
    
