Question
I have a question.
import urllib2
st = "b%C3%BCrokommunikation"
urllib2.unquote(st)
Output: 'b\xc3\xbcrokommunikation'
But if I print it:
print urllib2.unquote(st)
Output: bürokommunikation
Why is there a difference? I need to write bürokommunikation, not 'b\xc3\xbcrokommunikation', into a file.
My problem: I have a lot of data with values like this extracted from URLs, and I have to store them in a text file as, e.g., bürokommunikation.
Answer 1:
When you print the string, Python sends its raw bytes to the terminal. Your terminal emulator decodes the two-byte sequence \xc3\xbc as UTF-8, recognizes it as the character ü, and displays it correctly.
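A minimal sketch of what the terminal is doing, using only the standard library (it runs under Python 2.7 and 3.3+): the two bytes behind %C3%BC are the UTF-8 encoding of a single character, U+00FC (ü). The variable names are illustrative only.

```python
# The escapes %C3%BC in the URL stand for two raw bytes.
utf8_bytes = b"\xc3\xbc"

# Decoding those bytes as UTF-8 yields one character, U+00FC (ü).
char = utf8_bytes.decode("utf-8")

byte_count = len(utf8_bytes)  # 2 bytes before decoding
char_count = len(char)        # 1 character after decoding
code_point = ord(char)        # 0xFC
```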
However, as @MarkDickinson says in the comments, ü doesn't exist in ASCII, so you need to decode the bytes into a unicode string and tell Python which encoding to use when writing that string to a file, for instance UTF-8.
This is easy with the codecs module:
import codecs
import urllib2

# Percent-decode to UTF-8 bytes, then decode those bytes into a unicode string
st = "b%C3%BCrokommunikation"
decoded_string = urllib2.unquote(st).decode('utf-8')

# Write it to the file, encoding it as UTF-8 on the way out
with codecs.open('my_file.txt', 'w', 'utf-8') as f:
    f.write(decoded_string)
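A variant of the same approach, sketched under the assumption that you may also need to run it under Python 3, where urllib2 no longer exists and the unquoting functions live in urllib.parse. It uses io.open, which exists in both versions, instead of codecs.open, and writes one decoded value per line. The helper names url_component_to_text and write_components are hypothetical.

```python
import io

try:  # Python 3: returns the percent-decoded data as bytes
    from urllib.parse import unquote_to_bytes
except ImportError:  # Python 2: urllib.unquote on a str also returns raw bytes
    from urllib import unquote as unquote_to_bytes


def url_component_to_text(component):
    """Percent-decode a URL component and interpret the bytes as UTF-8 (hypothetical helper)."""
    return unquote_to_bytes(component).decode("utf-8")


def write_components(components, path):
    """Write one decoded component per line to a UTF-8 text file (hypothetical helper)."""
    with io.open(path, "w", encoding="utf-8") as f:
        for component in components:
            f.write(url_component_to_text(component) + u"\n")
```

For example, write_components(["b%C3%BCrokommunikation"], "my_file.txt") should produce a file that a UTF-8-aware editor shows as bürokommunikation.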
Answer 2:
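You are looking at the same data. When you evaluate the expression at the interactive prompt without print, the interpreter shows its __repr__() result, which escapes each non-ASCII byte as \xNN. When you use print, the bytes go to the terminal as-is, so you see the ü character instead of the \x escapes.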
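A small sketch of this point, assuming the value is the byte string that urllib2.unquote returns in Python 2 (written as a bytes literal so it also runs under Python 3): repr() escapes the non-ASCII bytes, but the data itself is already the UTF-8 encoding of bürokommunikation.

```python
# The byte string urllib2.unquote(st) returns in Python 2
raw = b"b\xc3\xbcrokommunikation"

# What the interactive prompt echoes: repr() escapes each non-ASCII byte as \xNN
echoed = repr(raw)

# The same data viewed as text: 17 characters, one of them u"\u00fc" (ü)
text = raw.decode("utf-8")

# Encoding the text back to UTF-8 gives exactly the original 18 bytes, so writing
# raw in binary mode, or text with encoding="utf-8", stores the same file contents.
round_trip_ok = text.encode("utf-8") == raw
```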
Source: https://stackoverflow.com/questions/34379432/url-component-and-x