decode-encode UTF-8 doesn't lead to the original unicode

痴心易碎 submitted on 2019-12-06 06:32:32

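Your version of Python is a "narrow" build that stores strings as UCS-2 (16 bits per code unit). These particular characters are above U+FFFF and don't fit in 16 bits (a UCS-4 build stores them in 32), so each one is stored as a surrogate pair, and each element of u represents "half" of a character. u.encode('utf-8') still works properly because the encoder understands surrogate pairs and recombines each pair into the full code point before encoding it.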
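To make "half of a character" concrete, here is a minimal Python 3 sketch. It is an illustration, not code from the original question. Python 3.3 and later always store whole code points, so the narrow-build view is emulated by hand: the helpers utf16_code_units and join_surrogates are hypothetical names, and the only built-ins used are ord, chr and str.encode.

```python
# Emulates how a narrow (UCS-2/UTF-16) build stores characters above U+FFFF.
# Python 3.3+ keeps whole code points, so the split is done by hand here.

def utf16_code_units(text):
    """Return the 16-bit code units a narrow build would store for text."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            units.append(cp)                     # BMP: a single 16-bit unit
        else:
            v = cp - 0x10000                     # 20-bit offset above the BMP
            units.append(0xD800 + (v >> 10))     # high (lead) surrogate
            units.append(0xDC00 + (v & 0x3FF))   # low (trail) surrogate
    return units


def join_surrogates(units):
    """Recombine surrogate pairs into code points, as the UTF-8 encoder must."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))                   # BMP char (or a lone surrogate)
            i += 1
    return "".join(out)


s = "\U0001F4F1\U0001F6AC"                       # MOBILE PHONE, SMOKING SYMBOL
units = utf16_code_units(s)
print(len(s), len(units))                        # 2 4
print([hex(u) for u in units])                   # ['0xd83d', '0xdcf1', '0xd83d', '0xdeac']

# One "half" on its own is not a character, so the UTF-8 codec refuses it ...
try:
    chr(units[0]).encode("utf-8")
except UnicodeEncodeError as exc:
    print("lone surrogate:", exc.reason)

# ... but once the pair is recombined it encodes to the normal 4-byte sequence.
print(join_surrogates(units).encode("utf-8") == s.encode("utf-8"))  # True
```

The manual split matches what Python's own "utf-16-be" codec produces, which is an easy way to double-check the arithmetic.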

Your UTF-8 string encodes these two characters:

U+1F4F1 MOBILE PHONE character (📱)

U+1F6AC SMOKING SYMBOL character (🚬)

(via this decoder: http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder)
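The same decoding can be checked locally with the standard unicodedata module. This is a minimal Python 3 sketch that assumes the input is exactly the UTF-8 byte sequence of these two characters; the question's original string isn't reproduced here, so data below is reconstructed from the code points.

```python
import unicodedata

# Assumed input: the UTF-8 bytes of U+1F4F1 followed by U+1F6AC.
data = b"\xf0\x9f\x93\xb1\xf0\x9f\x9a\xac"
text = data.decode("utf-8")

for ch in text:
    # Print each code point and its Unicode character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+1F4F1 MOBILE PHONE
# U+1F6AC SMOKING SYMBOL

# Python 3 stores whole code points, so the round trip is lossless.
print(text.encode("utf-8") == data)  # True
```

On a narrow Python 2 build, len(text) for the same bytes would be 4 rather than 2, because each character is held as a surrogate pair.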
