I get some data from a webpage and read it like this in Python:
original_doc = urllib2.urlopen(url).read()
Sometimes the document at this URL contains characters such as é or ä that are not ASCII, and I would like to remove them. How can I strip out everything that is not ASCII?
This should work. It will strip out all characters that are not ASCII:
original_doc = (original_doc.decode('unicode_escape').encode('ascii','ignore'))
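For context, here is a minimal self-contained sketch of the same decode-then-encode round trip, written for Python 3 (where urllib.request.urlopen(url).read() returns bytes) and assuming the response is UTF-8. The sample string is hypothetical and stands in for the real download; note that it decodes with 'utf-8' rather than the 'unicode_escape' codec used above, which only makes sense if the payload contains literal backslash escapes.

# Hypothetical sample standing in for urllib.request.urlopen(url).read()
original_doc = 'résumé and café'.encode('utf-8')  # raw bytes from the server

# Decode the bytes to text, then re-encode as ASCII, telling the codec
# to silently drop every character it cannot represent.
ascii_only = original_doc.decode('utf-8', 'ignore').encode('ascii', 'ignore')

print(ascii_only)  # b'rsum and caf'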
Using re, you can sub out all characters that fall in a certain hexadecimal range:
>>> import re
>>> re.sub('[\x80-\xFF]','','é and ä and ect')
' and  and ect'
You can also do the inverse and sub out anything that is NOT among the basic 128 ASCII characters:
>>> re.sub('[^\x00-\x7F]','','é and ä and ect')
' and  and ect'
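The snippets above are Python 2, where the string being filtered is a byte string. A rough sketch of the same inverse-class substitution in Python 3, where it runs on a str and the class matches any code point above U+007F:

import re

text = 'é and ä and ect'

# Keep only characters in the basic 128-character ASCII range;
# anything outside U+0000-U+007F is replaced with the empty string.
ascii_only = re.sub(r'[^\x00-\x7F]', '', text)

print(ascii_only)  # ' and  and ect'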