python string including double quote character

断了今生、忘了曾经 提交于 2019-12-05 08:41:23

\xe2\x80\x9d

Is a unicode value for "special" double quotes. You could decode from UTF-8 into Unicode to convert this into a "single" Unicode character.

>>> print "\xe2\x80\x9d".decode("utf-8")
”
>>> len("\xe2\x80\x9d".decode("utf-8"))
1

If you are using Python 3:

>>> print(b"\xe2\x80\x9d".decode('utf8'))
”
>>> len(b"\xe2\x80\x9d".decode("utf-8"))
1

So for your file that you are counting (in Python 2):

from collections import defaultdict
with open("filename", 'r') as f:
    for text in f:
        decoded = text.decode("utf-8")
        count = defaultdict(int)
        for i in decoded:
            count[i] += 1
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!