Remove all characters from a string whose ordinals are out of range

纵饮孤独 submitted on 2019-12-01 08:14:37
new_safe_str = some_string.encode('ascii','ignore') 

I think this would work.
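In Python 3, `str.encode` returns a `bytes` object, so if you need a `str` back you have to decode the result. A minimal sketch, assuming Python 3; the sample string is illustrative:

```python
# Illustrative sample text containing one non-ASCII character (U+00A3, the pound sign).
some_string = "You owe me £100"

# The 'ignore' error handler silently drops every character ASCII cannot encode.
ascii_bytes = some_string.encode("ascii", "ignore")
print(ascii_bytes)  # b'You owe me 100'

# Decode to get a str again; every remaining byte is valid ASCII.
new_safe_str = ascii_bytes.decode("ascii")
print(new_safe_str)  # You owe me 100
```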

Or you could use a list comprehension:

"".join([ch for ch in orig_string if ord(ch) < 128])
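ASCII covers code points 0 to 127, so the comparison has to be `< 128`; with `<= 128`, U+0080 would slip through. A sketch wrapping the filter in a function; `strip_non_ascii` is a hypothetical name, and the `str.isascii()` variant assumes Python 3.7 or later:

```python
def strip_non_ascii(text: str) -> str:
    """Keep only characters whose code point is in the ASCII range 0-127."""
    return "".join([ch for ch in text if ord(ch) < 128])


def strip_non_ascii_isascii(text: str) -> str:
    """Same filter written with str.isascii() (Python 3.7+)."""
    return "".join([ch for ch in text if ch.isascii()])


# Illustrative input: U+00EF is dropped, and so is U+0080 (code point 128).
print(strip_non_ascii("na\u00efve\u0080!"))  # nave!
```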

[edit] However, as others have said, it may be better to figure out how to handle Unicode in general, unless you really need the string encoded as ASCII for some reason.

Instead of removing those characters, it would be better to use an encoding that hashlib won't choke on, such as UTF-8:

>>> import hashlib
>>> data = u'\u200e'
>>> hashlib.sha256(data.encode('utf-8')).hexdigest()
'e76d0bc0e98b2ad56c38eebda51da277a591043c9bc3f5c5e42cd167abc7393e'
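One reason to prefer encoding over stripping: once the non-ASCII characters are removed, different inputs can produce the same digest. A minimal comparison, assuming Python 3; the two strings are illustrative:

```python
import hashlib

# Illustrative inputs that differ only by a non-ASCII character.
a = "You owe me £100"
b = "You owe me 100"

# Stripping non-ASCII characters makes the two inputs identical before hashing.
stripped_a = a.encode("ascii", "ignore")
stripped_b = b.encode("ascii", "ignore")
print(hashlib.sha256(stripped_a).hexdigest() == hashlib.sha256(stripped_b).hexdigest())  # True

# UTF-8 keeps the pound sign (as the bytes C2 A3), so the digests differ.
utf8_a = a.encode("utf-8")
utf8_b = b.encode("utf-8")
print(hashlib.sha256(utf8_a).hexdigest() == hashlib.sha256(utf8_b).hexdigest())  # False
```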

This is an example of where the changes in Python 3 are an improvement, or at least produce a clearer error message:

Python 2:

>>> import hashlib
>>> funky_string=u"You owe me £100"
>>> hashlib.sha256(funky_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 11: ordinal not in range(128)
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest()
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e'
>>> 

Python 3:

>>> import hashlib
>>> funky_string="You owe me £100"
>>> hashlib.sha256(funky_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest()
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e'
>>> 

The real problem is that sha256 takes a sequence of bytes, which Python 2 does not clearly separate from text strings. I'd suggest using .encode("utf-8").
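Since sha256 only accepts bytes, a small wrapper can encode text as UTF-8 and pass bytes through unchanged. This is a Python 3 sketch; `sha256_hex` is a hypothetical helper name, not part of hashlib:

```python
import hashlib
from typing import Union


def sha256_hex(data: Union[str, bytes]) -> str:
    """Return the SHA-256 hex digest of text or bytes.

    Text (str) is encoded as UTF-8 first; bytes are hashed as-is.
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# The same text gives the same digest whether passed as str or as pre-encoded bytes.
print(sha256_hex("abc") == sha256_hex(b"abc"))  # True
print(sha256_hex("You owe me £100"))
```

Choosing UTF-8 inside the helper means callers never hit the implicit ASCII encoding (Python 2) or the TypeError (Python 3) shown above.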
