Hash a Unicode string in Python

难免孤独 2021-02-06 21:07

I'm trying to hash some Unicode strings:

hashlib.sha1(s).hexdigest()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-81: ordinal not in range(128)


        
3 Answers
  • 2021-02-06 21:20

    hashlib.sha1 expects a sequence of bytes (a str object in Python 2), not a unicode object. Encoding your unicode string to bytes first (using, say, UTF-8) should fix it:

    >>> import hashlib
    >>> s = u'é'
    >>> hashlib.sha1(s.encode('utf-8'))
    <sha1 HASH object @ 029576A0>
    

    The error occurs because Python 2 tries to convert the unicode object to a str automatically, using the default ASCII codec, which can't represent the non-ASCII characters in your string.
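
    For what it's worth, here is a rough sketch of the same idea on Python 3 (an assumption, since the question looks like Python 2). There, hashlib refuses text outright with a TypeError instead of guessing an encoding. The helper name `sha1_hex` is made up for illustration and is not part of hashlib.

```python
import hashlib


def sha1_hex(text, encoding="utf-8"):
    """Hypothetical helper: encode the text explicitly, then hash the bytes."""
    return hashlib.sha1(text.encode(encoding)).hexdigest()


s = "é"

# Python 3 raises TypeError when asked to hash text that was not encoded.
try:
    hashlib.sha1(s)
    raised = False
except TypeError:
    raised = True

# Encoding first always works, whatever characters the string contains.
digest = sha1_hex(s)
print(raised, len(digest))
```

    Either way the fix is the same: pick an encoding yourself rather than letting Python pick one for you.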

    A good starting point for learning more about Unicode and encodings is the Python docs, and this article by Joel Spolsky.

  • 2021-02-06 21:33

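    Encode the string as UTF-8 before hashing. For example:

    >>> import hashlib
    >>> import random
    >>> # hashes the decimal text of a random 256-bit number, so the digest differs on every run
    >>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
    'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'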
    
  • 2021-02-06 21:33

    You hash bytes, not strings. So you need to decide which bytes you actually want to hash: for example, the UTF-8 encoding of the string, or its UTF-16 encoding. Different encodings produce different bytes, and therefore different hashes.
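
    To make that concrete, here is a small sketch (Python 3 assumed; the variable names are just for illustration) that hashes the same one-character string under two encodings. It uses 'utf-16-le' rather than plain 'utf-16' so that no byte-order mark is added.

```python
import hashlib

text = "é"  # U+00E9

utf8_bytes = text.encode("utf-8")       # two bytes: C3 A9
utf16_bytes = text.encode("utf-16-le")  # two bytes: E9 00

digest_utf8 = hashlib.sha1(utf8_bytes).hexdigest()
digest_utf16 = hashlib.sha1(utf16_bytes).hexdigest()

# Same text, different bytes, so the digests do not match.
print(utf8_bytes, utf16_bytes)
print(digest_utf8 == digest_utf16)
```

    So whoever verifies the hash has to use the same encoding you did.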
