python string including double quote character

三世轮回 submitted on 2020-01-02 04:38:13

Question


I have input strings made up of assorted characters, including double and single quotes (" and '):

B@SS$*JU(PQ
AD&^%$^@!$
%()%@@DDSFD"*")(#
ABD*E@(%J^&@

However, when I read the above input from a text file and simply print it, the double quotes in the third line are printed as \xe2\x80\x9d

I am aiming to do a simple character count:

B 2
@ 3
S 2
$ 3
etc.

so I want to be able to output

" 3

in the above list. Should I replace the double quotes with something so I can count them and print out the count?

Thanks a lot.


Answer 1:


\xe2\x80\x9d

This is the UTF-8 byte sequence for U+201D, the right ("curly") double quotation mark, which is a different character from the plain ASCII double quote. Decoding those three bytes from UTF-8 gives you a single Unicode character.

>>> print "\xe2\x80\x9d".decode("utf-8")
”
>>> len("\xe2\x80\x9d".decode("utf-8"))
1

If you are using Python 3:

>>> print(b"\xe2\x80\x9d".decode('utf8'))
”
>>> len(b"\xe2\x80\x9d".decode("utf-8"))
1
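To see exactly which character those bytes stand for, a small Python 3 check along these lines may help. It is only a sketch; the variable names are made up for illustration.

```python
import unicodedata

raw = b"\xe2\x80\x9d"          # the three bytes seen in the printed output
ch = raw.decode("utf-8")       # decodes to a single Unicode character

code_point = "U+%04X" % ord(ch)
name = unicodedata.name(ch)
is_ascii_quote = (ch == '"')   # compare against the plain ASCII double quote

print(code_point, name, is_ascii_quote)
# prints: U+201D RIGHT DOUBLE QUOTATION MARK False
```

Since the curly quote is not equal to the ASCII one, a counter will list the two as separate characters unless you map one onto the other.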

So for your file that you are counting (in Python 2):

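from collections import defaultdict

# One counter for the whole file; creating it inside the loop would reset it on every line.
count = defaultdict(int)
with open("filename", 'r') as f:
    for text in f:
        # Python 2: decode the raw bytes so each character is counted once.
        decoded = text.decode("utf-8")
        for i in decoded:
            count[i] += 1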
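For Python 3, roughly the same count could be written as below. This is a sketch under a few assumptions: the file is UTF-8 encoded, line endings should not be counted, and you want curly double quotes reported as a plain " (which is what the question asks for). The names `count_chars` and `QUOTE_MAP` are hypothetical, not part of any library.

```python
from collections import Counter

# Hypothetical mapping: fold the curly double quotes (U+201C, U+201D)
# into the plain ASCII double quote so they are counted together.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def count_chars(path, encoding="utf-8"):
    """Count every character in a text file, ignoring line endings."""
    counts = Counter()
    # Passing encoding= makes open() decode the bytes for us in Python 3.
    with open(path, encoding=encoding) as f:
        for line in f:
            counts.update(line.rstrip("\n").translate(QUOTE_MAP))
    return counts


if __name__ == "__main__":
    # "filename" is the same placeholder path used in the answer above.
    for char, n in count_chars("filename").most_common():
        print(char, n)
```

If you would rather keep curly and straight quotes as separate entries, drop the `.translate(QUOTE_MAP)` call; the curly quote then appears in the output under its own key.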


Source: https://stackoverflow.com/questions/24235797/python-string-including-double-quote-character
