Remove all characters from a string whose ordinals are out of range


眼角桃花

What is a good way to remove all characters whose ordinals are outside range(128) from a string in Python?

I'm using hashlib.sha256 in Python 2.7.

3 Answers

滥情空心

2021-01-14 08:17
Instead of removing those characters, it would be better to use an encoding that hashlib won't choke on, utf-8 for example:
```
>>> import hashlib
>>> data = u'\u200e'
>>> hashlib.sha256(data.encode('utf-8')).hexdigest()
'e76d0bc0e98b2ad56c38eebda51da277a591043c9bc3f5c5e42cd167abc7393e'
```

栀梦

2021-01-14 08:30
```
new_safe_str = some_string.encode('ascii', 'ignore')
```
I think this would work.

Or you could use a list comprehension:
```
"".join([ch for ch in orig_string if ord(ch) < 128])
```
[edit] However, as others have said, it may be better to work out how to handle Unicode in general, unless you really need the result encoded as ASCII for some reason.
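To make the trade-off concrete, here is a minimal Python 3 sketch comparing the two stripping approaches. The helper name `strip_non_ascii` and the sample strings are only illustrative, not part of the original answer. It also shows why stripping is lossy: two different inputs can collapse to the same ASCII text.

```python
def strip_non_ascii(text):
    """Drop every character whose code point is 128 or higher."""
    return "".join(ch for ch in text if ord(ch) < 128)


original = "You owe me £100"

# Approach 1: let the codec discard anything it cannot encode.
via_encode = original.encode("ascii", "ignore").decode("ascii")

# Approach 2: filter on the ordinal of each character.
via_filter = strip_non_ascii(original)

print(via_encode)                # You owe me 100
print(via_encode == via_filter)  # True

# Lossy: a string with a different currency symbol strips to the same text.
print(strip_non_ascii("You owe me €100") == via_filter)  # True
```

If the stripped text is then hashed, those two different inputs would get the same digest, which is usually not what you want.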


春和景丽

2021-01-14 08:30

This is an example of where the changes in Python 3 are an improvement, or at least generate a clearer error message.

Python 2:

```
>>> import hashlib
>>> funky_string=u"You owe me £100"
>>> hashlib.sha256(funky_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 11: ordinal not in range(128)
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest()
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e'
```

Python 3:

```
>>> import hashlib
>>> funky_string="You owe me £100"
>>> hashlib.sha256(funky_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest()
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e'
```

The real problem is that sha256 takes a sequence of bytes, which Python 2 doesn't have a clear concept of. Using .encode("utf-8") is what I'd suggest.
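As a rough sketch of that suggestion in Python 3, a small wrapper can do the encoding in one place. The name `sha256_hex` is hypothetical, and defaulting to UTF-8 is an assumption you should match to whatever the other side of your system uses.

```python
import hashlib


def sha256_hex(data, encoding="utf-8"):
    """Return the SHA-256 hex digest of text or bytes.

    Text is encoded first (UTF-8 by default); bytes are hashed as-is.
    """
    if isinstance(data, str):
        data = data.encode(encoding)
    return hashlib.sha256(data).hexdigest()


pounds = sha256_hex("You owe me £100")
euros = sha256_hex("You owe me €100")

# Encoding keeps the non-ASCII characters, so the digests differ.
print(pounds != euros)  # True

# Passing already-encoded bytes gives the same digest as passing the text.
print(pounds == sha256_hex("You owe me £100".encode("utf-8")))  # True
```

Because nothing is thrown away before hashing, strings that differ only in their non-ASCII characters still produce different digests.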
