Weird leading characters utf-8/utf-16 encoding in Python

终归单人心 2021-01-25 16:33

I have written a simplified version to demonstrate the problem: I am encoding special characters as UTF-8 and UTF-16.

With utf-8 encoding there is no problem, whe

1 Answer
  • 2021-01-25 17:17

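    The answer to the problem was given by @tripleee.

    Specifying utf-16le or utf-16be instead of utf-16 resolved the problem. The plain utf-16 codec prepends a byte order mark (BOM) to the encoded output, which shows up as the weird leading characters; the endianness-specific codecs do not write one.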

    Sample of solution:

    #!/usr/bin/env python2
    # -*- coding: utf-8 -*-
    
    import chardet
    
    
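    def myEncode(s, pattern):
        """Decode the byte string s with codec `pattern`, then re-encode it."""
        try:
            s = s.strip()  # strip() returns a new string, so keep the result
            u = unicode(s, pattern)
            encoded = u.encode(pattern, 'strict')
            # Report what chardet guesses for the re-encoded bytes.
            print chardet.detect(encoded)
            return encoded
        except UnicodeDecodeError as err:
            return "UnicodeDecodeError: %s" % err
        except Exception as err:
            return "ExceptionError: %s" % err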
    
    print myEncode(r"""Test !"#$%&'()*+-,./:;<=>?@[\]?_{@}~& € ÄÖÜ äöüß £¥§""",
                   'utf-8')
    print myEncode(r"""Test !"#$%&'()*+-,./:;<=>?@[\]?_{@}~& € ÄÖÜ äöüß £¥§""",
                   'utf-16be')
    

    Sample of output:

    {'confidence': 0.99, 'language': '', 'encoding': 'utf-8'}
    Test !"#$%&'()*+-,./:;<=>?@[\]?_{@}~& € ÄÖÜ äöüß £¥§
    {'confidence': 0.99, 'language': '', 'encoding': 'utf-8'}
    Test !"#$%&'()*+-,./:;<=>?@[\]?_{@}~& € ÄÖÜ äöüß £¥§
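
To make the difference between the codecs visible, here is a minimal sketch. It is written for Python 3 rather than the Python 2 used in the answer, and the sample string and variable names are chosen only for illustration. It shows that the plain utf-16 codec writes a two-byte BOM in front of the text, while utf-16be and utf-16le do not, and that decoding BOM-prefixed bytes with an endianness-specific codec leaves U+FEFF as a stray leading character.

```python
import codecs

# Illustrative sample text (all characters are in the Basic Multilingual Plane).
text = "Test € ÄÖÜ äöüß £¥§"

with_bom = text.encode("utf-16")         # BOM + platform byte order
big_endian = text.encode("utf-16be")     # explicit byte order, no BOM
little_endian = text.encode("utf-16le")  # explicit byte order, no BOM

# The plain codec starts with one of the two possible BOMs.
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print("utf-16 starts with a BOM:", starts_with_bom)
print("extra bytes in utf-16 output:", len(with_bom) - len(big_endian))

# The utf-16 decoder consumes the BOM, so the round trip is clean.
round_trip = with_bom.decode("utf-16")

# An endianness-specific decoder does not strip a BOM: it stays in the
# result as the character U+FEFF in front of the text.
stray = (codecs.BOM_UTF16_BE + big_endian).decode("utf-16be")
print(repr(stray[0]))
```

If the leading bytes are unwanted, encoding with utf-16le or utf-16be (as the answer does) avoids them; if they are present in incoming data, decoding with plain utf-16 removes them.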
    