I\'m having some brain failure in understanding reading and writing text to a file (Python 2.4).
# The string, which has an a-acute in it.
ss = u\'Capit\\xe1
Well, your favorite text editor does not realize that \xc3\xa1
are supposed to be character literals, but it interprets them as text. That's why you get the double backslashes in the last line -- it's now a real backslash + xc3
, etc. in your file.
If you want to read and write encoded files in Python, best use the codecs module.
Pasting text between the terminal and applications is difficult, because you don't know which program will interpret your text using which encoding. You could try the following:
>>> s = file("f1").read()
>>> print unicode(s, "Latin-1")
Capitán
Then paste this string into your editor and make sure that it stores it using Latin-1. Under the assumption that the clipboard does not garble the string, the round trip should work.
So, I've found a solution for what I'm looking for, which is:
print open('f2').read().decode('string-escape').decode("utf-8")
There are some unusual codecs that are useful here. This particular reading allows one to take UTF-8 representations from within Python, copy them into an ASCII file, and have them be read in to Unicode. Under the "string-escape" decode, the slashes won't be doubled.
This allows for the sort of round trip that I was imagining.