Unicode (UTF-8) reading and writing to files in Python

谎友^ 2020-11-22 17:10

I'm having some brain failure in understanding reading and writing text to a file (Python 2.4).

# The string, which has an a-acute in it.
ss = u'Capit\xe1


        
14 answers
  • 2020-11-22 17:35

    Well, your favorite text editor does not realize that \xc3\xa1 is supposed to be an escape sequence for two bytes; it stores those characters literally as text. That's why you get the doubled backslashes in the last line: the file now contains a real backslash followed by xc3, and so on.

    If you want to read and write encoded files in Python, best use the codecs module.
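    As a minimal sketch of that advice, the snippet below writes the a-acute string from the question through codecs.open and reads it back. The scratch file path is hypothetical and only there to keep the example self-contained. It is written to run on Python 3; on Python 2.4 itself you would call close() explicitly instead of using with.

```python
import codecs
import os
import tempfile

text = u"Capit\xe1n"  # a string with an a-acute in it

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "f1.txt")

# codecs.open encodes the text to UTF-8 on write ...
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# ... and decodes it on read, so we get text back rather than raw bytes.
with codecs.open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

# Reading in binary mode shows what is actually on disk:
# the a-acute is stored as the two bytes C3 A1.
with open(path, "rb") as f:
    raw = f.read()
```

    Here round_tripped compares equal to text, while raw holds the encoded bytes. No backslashes appear in the file, because the encoding step happens inside the file object rather than by pasting an escaped representation.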

    Pasting text between the terminal and applications is difficult, because you don't know which program will interpret your text using which encoding. You could try the following:

    >>> s = file("f1").read()
    >>> print unicode(s, "Latin-1")
    Capitán
    

    Then paste this string into your editor and make sure that it stores it using Latin-1. Under the assumption that the clipboard does not garble the string, the round trip should work.

  • 2020-11-22 17:36

    So, I've found a solution for what I'm looking for, which is:

    print open('f2').read().decode('string-escape').decode("utf-8")
    

    There are some unusual codecs that are useful here. This particular chain lets you take a UTF-8 byte representation as Python displays it (for example \xc3\xa1), copy it into an ASCII file, and have it read back in as Unicode. The "string-escape" decode turns each escape sequence back into a single byte, so the backslashes are not doubled.

    This allows for the sort of round trip that I was imagining.
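    The "string-escape" codec only exists in Python 2. As a rough sketch of the same round trip with codecs that are also available in Python 3, the snippet below swaps in "unicode_escape" followed by a Latin-1 re-encode. This is a substitute technique, not the call used above, and the scratch file is a hypothetical stand-in for f2. It assumes the file contains only ASCII text with \xNN escapes.

```python
import os
import tempfile

# Hypothetical scratch file standing in for the "f2" used above.
path = os.path.join(tempfile.mkdtemp(), "f2")

# The file holds pure ASCII: a literal backslash, then "xc3", and so on.
with open(path, "wb") as f:
    f.write(b"Capit\\xc3\\xa1n")

with open(path, "rb") as f:
    escaped = f.read()

# Step 1: interpret the backslash escapes; each \xNN becomes code point U+00NN.
# Step 2: Latin-1 maps U+0000..U+00FF back to the bytes 0x00..0xFF unchanged.
# Step 3: decode those bytes as UTF-8 to get the real text.
text = escaped.decode("unicode_escape").encode("latin-1").decode("utf-8")
```

    After this, text is the seven-character string with a real a-acute. The Latin-1 step would fail if the file contained escapes for code points above U+00FF, so this only fits the byte-escape case described here.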
