Python - dealing with mixed-encoding files
I have a file which is mostly UTF-8, but some Windows-1252 characters have also found their way in. I created a table to map the Windows-1252 (cp1252) characters to their Unicode counterparts, and would like to use it to fix the mis-encoded characters, e.g.

    cp1252_to_unicode = {
        "\x85": u'\u2026',  # …
        "\x91": u'\u2018',  # ‘
        "\x92": u'\u2019',  # ’
        "\x93": u'\u201c',  # “
        "\x94": u'\u201d',  # ”
        "\x97": u'\u2014'   # —
    }

    for l in open('file.txt'):
        for c, u in cp1252_to_unicode.items():
            l = l.replace(c, u)

But attempting to do the replace this way results in a UnicodeDecodeError being raised.
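For context, here is one way the same mapping-table idea can be applied without triggering the implicit decode: work at the byte level before decoding. This is only a minimal Python 3 sketch, not the asker's code; it assumes the file name `file.txt` from above and assumes that only lines which fail to decode as UTF-8 need the substitution.

    # A sketch: try UTF-8 first, and only fall back to byte-level
    # substitution of the stray cp1252 bytes when decoding fails.
    cp1252_to_unicode = {
        b"\x85": "\u2026",  # …
        b"\x91": "\u2018",  # ‘
        b"\x92": "\u2019",  # ’
        b"\x93": "\u201c",  # “
        b"\x94": "\u201d",  # ”
        b"\x97": "\u2014",  # —
    }

    fixed_lines = []
    with open("file.txt", "rb") as f:       # read raw bytes, no decoding yet
        for raw in f:
            try:
                # Lines that are already valid UTF-8 are left untouched.
                fixed_lines.append(raw.decode("utf-8"))
            except UnicodeDecodeError:
                # Replace each stray cp1252 byte with the UTF-8 encoding of
                # its Unicode counterpart, then decode the repaired line.
                # (Replacing these bytes unconditionally could corrupt valid
                # UTF-8, since they also occur as continuation bytes.)
                for bad, good in cp1252_to_unicode.items():
                    raw = raw.replace(bad, good.encode("utf-8"))
                fixed_lines.append(raw.decode("utf-8"))

The try/except per line is an assumption about the data (each line is either clean UTF-8 or contains stray cp1252 bytes); files with both on the same line would need a more careful pass.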