As part of my Python project, I need to read a text file encoded in UTF-8 and split it into a list. But when I use a letter followed by an apostrophe, list() seems to output utf
Your eyes deceive you... well, your terminal deceives you, but close enough. I can reconstruct your string and print the apostrophe, but that string really contains UTF-8 encoded bytes. Python printed the encoded bytes, and my UTF-8 terminal decoded them and displayed the Unicode character. This is a quirk of Python 2; Python 3 does a better job of keeping encoded strings (bytes) and decoded strings (text) separate.
>>> chars = ['i', ' ', 'l', 'i', 'k', 'e', ' ', 'p', 'i', '\xe2', '\x80', '\x99']
>>>
>>> s1 = ''.join(chars)
>>> print s1
i like pi’
>>> print repr(s1)
'i like pi\xe2\x80\x99'
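To make the byte/character distinction concrete, here is a minimal sketch that decodes the same bytes by hand. The variable names are only illustrative, and the snippet is written to run on both Python 2 and Python 3:

```python
raw = b'i like pi\xe2\x80\x99'   # UTF-8 encoded bytes, as in the session above
text = raw.decode('utf-8')       # decoded text; the apostrophe is one character

byte_count = len(raw)            # counts bytes: the apostrophe takes three
char_count = len(text)           # counts characters: the apostrophe takes one
last_char = list(text)[-1]       # U+2019, RIGHT SINGLE QUOTATION MARK
```

Calling list() on the decoded text gives one item per character, which is probably what you wanted from the start.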
Since your file is UTF-8 encoded, you can use the codecs module to decode it to unicode as you read it.
import codecs
intext = codecs.open("path/infile.txt", encoding="utf-8").read()  # intext is a unicode string
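As an alternative sketch, io.open (available since Python 2.6, and the same function as the built-in open in Python 3) also decodes while reading. The temporary file below is a made-up stand-in for your input file so that the example is self-contained:

```python
import io
import os
import tempfile

# Create a small UTF-8 sample file (a hypothetical stand-in for the real input).
path = os.path.join(tempfile.mkdtemp(), "infile.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"i like pi\u2019")

# io.open decodes the bytes while reading; the with block closes the file.
with io.open(path, encoding="utf-8") as f:
    intext = f.read()

chars = list(intext)  # one list item per character, not per byte
```

With the text decoded up front, the apostrophe ends up as a single list element instead of three separate bytes.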