How to match an emoticon in a sentence with regular expressions

a 夏天 submitted on 2019-12-09 07:42:28

Try

string1_gbk = string1.decode('utf-8').encode('gb2312', 'replace')

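This should output ? instead of those emoticons, because the 'replace' error handler substitutes ? for every character that gb2312 cannot encode.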
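The line above is Python 2 syntax, where a byte string has a decode method. A rough Python 3 equivalent might look like the sketch below; the helper name to_gb2312_lossy and the sample input are made up for illustration, and the input is assumed to be UTF-8 bytes.

```python
def to_gb2312_lossy(raw_utf8):
    """Decode UTF-8 bytes, then re-encode as GB2312.

    The 'replace' error handler emits b'?' for every character that
    GB2312 cannot represent, such as private-use emoticon code points.
    """
    return raw_utf8.decode('utf-8').encode('gb2312', 'replace')


# Hypothetical sample: U+E317 is a private-use code point with no GB2312 mapping.
sample = 'hi \ue317 there'.encode('utf-8')
result = to_gb2312_lossy(sample)
print(result)  # b'hi ? there'
```

Note that this is lossy: every unencodable character becomes ?, not only emoticons.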

Python Docs - Python Wiki

Aleksei astynax Pirogov

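'\ue317' is not a substring of u"asdasd \ue317 asad": inside the unicode literal, \ue317 is only the human-readable escape for a single Unicode character, so a regexp that looks for the literal text \ue317 cannot match it. Such a regexp works against repr(u'\ue317'), where the escape sequence appears as actual characters.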
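The distinction can be checked directly. The sketch below assumes Python 3, where every str is Unicode, and assumes the emoticons live in the Basic Multilingual Plane private-use range U+E000 to U+F8FF (true for the \ue317 sample, but not necessarily for every emoticon). The names text, escaped_text and PUA_RE are illustrative.

```python
import re

text = "asdasd \ue317 asad"            # holds the single code point U+E317
escaped_text = r"asdasd \ue317 asad"   # holds the six characters \ u e 3 1 7

# The escape is one character, and its spelled-out form is not in the text.
assert len("\ue317") == 1
assert r"\ue317" not in text

# Assumption: emoticons are private-use code points U+E000..U+F8FF.
# The character class is built from the actual characters, not escape text.
PUA_RE = re.compile("[\ue000-\uf8ff]")

found = PUA_RE.findall(text)
found_in_escaped = PUA_RE.findall(escaped_text)
print(found == ["\ue317"], found_in_escaped)  # True []
```

Matching the real character this way avoids going through repr() at all.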

It may be because the backslash is a special escape character in regexp syntax. The following worked for me:

>>> import re
>>> # Raw string: \ue317 stays as six literal characters on both Python 2 and 3
>>> test_str = r'blah blah blah \ue317 blah blah \ueaa2 blah ue317'
>>> re.findall(r'\\ue[0-9A-Za-z]{3}', test_str)
['\\ue317', '\\ueaa2']

Notice it doesn't erroneously match the ue317 at the end, which has no preceding backslash. Obviously, use re.sub() if you wish to replace those character strings.
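The replacement step could be sketched as follows, reusing the same pattern. The helper name strip_escapes and the choice of ? as the placeholder are assumptions for illustration; the test string is written as a raw string so that the backslashes stay literal.

```python
import re

# Same pattern as above: a literal backslash, "ue", then three alphanumerics.
ESCAPE_RE = re.compile(r'\\ue[0-9A-Za-z]{3}')


def strip_escapes(s, replacement='?'):
    """Replace every literal \\ueXXX sequence in s with the replacement."""
    return ESCAPE_RE.sub(replacement, s)


test_str = r'blah blah blah \ue317 blah blah \ueaa2 blah ue317'
cleaned = strip_escapes(test_str)
print(cleaned)  # blah blah blah ? blah blah ? blah ue317
```

Passing replacement='' drops the sequences instead, leaving the surrounding spaces in place.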
