Because MySQL 5.1 does not support 4 byte UTF-8 sequences, I need to replace/drop the 4 byte sequences in these strings.
I\'m looking a clean way to replace these charac
5 byte utf-8 sequences begin with a 111110xx-byte and 6 byte utf-8 sequences begin with a 1111110x-byte. Important to note is, that no follow-up bytes of 1-4-byte utf-8 sequences contain bytes that large because follow-up bytes are always of the form 10xxxxxx.
Therefore you can just go through the bytes and every time you see a byte of kind 111110xx then only emit a '?' to the output-stream/array while skipping the next 4 bytes from the input; analogue for the 6-byte-sequences.