We have some text containing German umlauts represented in decomposed form, e.g. 'a' + COMBINING DIAERESIS ($cc $88).
Any idea how to properly convert such text to UTF-8?
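First, if it's not already a unicode string, decode it. Second, call unicodedata.normalize() with the "NFC" form, which composes 'a' + COMBINING DIAERESIS into the single character 'ä'. Third, encode the result as UTF-8.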
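Those three steps can be sketched as follows. This is a minimal sketch in Python 3, assuming the input bytes are already UTF-8 with decomposed characters; the helper name to_composed_utf8 and its source_encoding parameter are made up for illustration.

```python
import unicodedata


def to_composed_utf8(data, source_encoding="utf-8"):
    """Return UTF-8 bytes with combining sequences composed (NFC).

    Hypothetical helper; source_encoding is an assumption about the input.
    """
    # Step 1: decode only if we were given bytes rather than text.
    text = data.decode(source_encoding) if isinstance(data, bytes) else data
    # Step 2: NFC composes 'a' + U+0308 into the single code point U+00E4.
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the composed text as UTF-8.
    return composed.encode("utf-8")


raw = b"a\xcc\x88"  # 'a' followed by COMBINING DIAERESIS ($cc $88)
print(to_composed_utf8(raw))  # b'\xc3\xa4', i.e. precomposed 'ä'
```

On Python 2 the same idea applies, with str.decode() producing a unicode object before normalizing.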