I have input strings made up of arbitrary characters, including double and single quotes (" and '):
B@SS$*JU(PQ
AD&^%$^@!$
%()%@@DDSFD"*")(#
ABD*E@(%J^&@
However, when I open the above input from a text file and just print it, the double quotes in the third line are printed as \xe2\x80\x9d
I am aiming to do a simple character count:
B 2
@ 3
S 2
$ 3
etc.
So I want to be able to output
" 3
in the above list. Should I replace the double quotes with something so I can count them and print out the count?
Thanks a lot.
\xe2\x80\x9d
is the UTF-8 encoding of U+201D (RIGHT DOUBLE QUOTATION MARK), a "curly" closing double quote rather than the plain ASCII ". Decoding the bytes from UTF-8 turns the three-byte sequence into a single Unicode character. In Python 2:
>>> print "\xe2\x80\x9d".decode("utf-8")
”
>>> len("\xe2\x80\x9d".decode("utf-8"))
1
If you are using Python 3:
>>> print(b"\xe2\x80\x9d".decode('utf8'))
”
>>> len(b"\xe2\x80\x9d".decode("utf-8"))
1
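To see which character those bytes actually are, the standard unicodedata module can report its code point and name. This is a small Python 3 sketch; the helper name describe is made up for illustration and is not part of the original answer.

```python
import unicodedata

raw = b"\xe2\x80\x9d"        # the three bytes seen in the printed output
char = raw.decode("utf-8")   # one Unicode character after decoding


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(char))  # U+201D RIGHT DOUBLE QUOTATION MARK
print(describe('"'))   # U+0022 QUOTATION MARK
```

The two quotes are different characters, so a count keyed on the ASCII " will not include the curly one unless it is replaced first.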
So for your file that you are counting (in Python 2):
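from collections import defaultdict

# Create the counter once, outside the loop, so totals accumulate across lines
count = defaultdict(int)
with open("filename", 'r') as f:
    for text in f:
        # Decode the raw bytes so each curly quote is one character, not three
        decoded = text.decode("utf-8")
        for i in decoded:
            count[i] += 1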
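In Python 3 the same count can be sketched with collections.Counter. The function name count_chars, the fold_quotes flag and the QUOTE_MAP table are illustrative assumptions, not part of the original answer; folding is one possible way to report curly quotes under the plain " as the question asks. The sample lines assume the file contains curly closing quotes (U+201D), as the printed bytes suggest.

```python
from collections import Counter

# Assumed mapping: fold curly double quotes into the ASCII double quote.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def count_chars(lines, fold_quotes=False):
    """Count characters across an iterable of text lines, ignoring line endings."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if fold_quotes:
            line = line.translate(QUOTE_MAP)
        counts.update(line)  # a Counter counts each character of the string
    return counts


# In-memory stand-in for the file contents from the question.
sample = [
    "B@SS$*JU(PQ",
    "AD&^%$^@!$",
    "%()%@@DDSFD\u201d*\u201d)(#",
    "ABD*E@(%J^&@",
]

for char, n in count_chars(sample, fold_quotes=True).most_common():
    print(char, n)
```

An open file works in place of the list, for example count_chars(open("filename", encoding="utf-8")), because opening the file in text mode with an explicit encoding does the UTF-8 decoding while reading.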
Source: https://stackoverflow.com/questions/24235797/python-string-including-double-quote-character