On the subject of reading and writing text files in Python, one of the main Python contributors says the following about the surrogateescape Unicode error handler:
For what reason should a low-surrogate DCC3 be encoded in utf-8? This is not allowed and useless because a surrogate is NOT a character. Find the high-surrogate that belongs to the low-surrogate, decode its codepoint and then create the proper utf-8 sequence for the codepoint.
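To make the two cases concrete, here is a minimal sketch, assuming Python 3. It first shows how the surrogateescape handler produces a lone low surrogate such as U+DCC3 from an undecodable byte, and then shows the pairing arithmetic the quote describes for a genuine UTF-16 surrogate pair. The helper name `combine_surrogates` and the sample values are illustrative assumptions, not taken from the quoted discussion.

```python
# Case 1: surrogateescape. An undecodable byte 0xNN becomes the lone
# surrogate U+DCNN, so the byte 0xC3 becomes U+DCC3.
raw = b"\xc3"  # a UTF-8 lead byte with no continuation byte
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "\udcc3"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with the same handler restores the original byte.
restored = text.encode("utf-8", errors="surrogateescape")


# Case 2: a real UTF-16 surrogate pair (hypothetical helper).
def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE00)  # U+1F600
utf8_bytes = chr(code_point).encode("utf-8")     # a 4-byte UTF-8 sequence
```

In the first case there is no high surrogate to look for: the lone U+DCC3 is only a placeholder for the raw byte 0xC3, which is why it round-trips through the same error handler rather than through strict UTF-8.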