Python list() function changing characters into (I think) UTF-8

时光取名叫无心 2021-01-28 21:31

As part of my Python project, I need to read a text file encoded in UTF-8 and split it into a list. But when I use a letter followed by an apostrophe, list() seems to output utf

1 answer
  •  后悔当初
    2021-01-28 21:40

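    Your eyes deceive you... well, your terminal deceives you, but close enough. list() is not converting anything: the string already contained UTF-8 encoded bytes. The apostrophe in your file is the curly one (’, U+2019), which UTF-8 encodes as the three bytes \xe2\x80\x99, so list() splits it into three one-byte items. I can reconstruct your string from those bytes and print the apostrophe, because Python prints the encoded bytes and my UTF-8 terminal decodes them and displays the Unicode character. This is a quirk of Python 2, where str holds raw bytes. Python 3 does a better job of keeping encoded bytes and decoded strings separate.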

    >>> chars = ['i', ' ', 'l', 'i', 'k', 'e', ' ', 'p', 'i', '\xe2', '\x80', '\x99']
    >>> 
    >>> s1 = ''.join(chars)
    >>> print s1
    i like pi’
    >>> print repr(s1)
    'i like pi\xe2\x80\x99'
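
For comparison, here is a minimal sketch of the same string in Python 3, where the encoded bytes and the decoded text are different types. The variable names are illustrative and not from the question.

```python
# Python 3 sketch: the three trailing bytes are a single character once decoded.
raw = b"i like pi\xe2\x80\x99"   # UTF-8 encoded bytes, as they sit in the file
text = raw.decode("utf-8")       # decoded text (str)

byte_items = list(raw)           # one integer per byte
char_items = list(text)          # one single-character string per character

print(len(raw), len(text))       # 12 bytes versus 10 characters
print(byte_items[-3:])           # the three bytes of the apostrophe, as integers
print(char_items[-1] == "\u2019")
```

Splitting the decoded text keeps the apostrophe as one list item, which is probably what the question wanted.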
    

    Since your file is UTF-8 encoded, you can use the codecs module to decode it to unicode as you read it.

    import codecs
    intext = codecs.open("path/infile.txt", encoding="utf-8").read()
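
If you are on Python 3, the built-in open() accepts the same encoding argument, so a rough equivalent looks like the sketch below. The temporary file is a stand-in for "path/infile.txt" so that the example runs on its own; its contents are an assumption based on the session above.

```python
import os
import tempfile

# Hypothetical sample file standing in for the real input file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "infile.txt")
    with open(path, "wb") as f:
        f.write(b"i like pi\xe2\x80\x99")

    # Decode while reading, so the rest of the program only sees text.
    with open(path, encoding="utf-8") as f:
        intext = f.read()

chars = list(intext)              # one list item per character
print(chars[-1] == "\u2019")      # the apostrophe stays in one piece
```

In Python 2.6 and later, io.open takes the same encoding argument if you would rather not use codecs.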
    
