My app downloads a file in UTF-8 format, which is too large to read using the NSString initWithContentsOfFile method. The problem I have is that the NSFileHan
It's actually really easy to tell if you have split a multibyte character in UTF-8. Continuation octets all have their two most significant bits set to 10, like this: 10xxxxxx. So if the last octet of the buffer has that pattern, scan backwards to find an octet that does not have that form. This is the first octet of the character. (If the last octet of the buffer starts with 11, it is itself a first octet and the rest of its character has not been read yet.) The position of the most significant 0 in the first octet tells you how many octets are in the character:
0xxxxxxx => 1 octet (ASCII)
110xxxxx => 2 octets
1110xxxx => 3 octets
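11110xxx => 4 octets
The original UTF-8 design continued this pattern up to 6 octets, but RFC 3629 limits valid UTF-8 to 4 octets.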
So it's fairly trivial to figure out how many extra octets to read to get to a character boundary.
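As a rough sketch of that check, here is a small plain C function (C compiles as-is inside an Objective-C file). The name utf8_missing_octets is made up for this example. It assumes the 4-octet limit from RFC 3629 and an otherwise well-formed buffer, and it does not validate the input.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many more octets must be read to
 * complete the character that is split at the end of buf, or 0 if
 * buf already ends on a character boundary (or looks malformed). */
size_t utf8_missing_octets(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t trailing = 0; /* continuation octets found at the end */

    /* Scan backwards over continuation octets (10xxxxxx).
     * A valid character has at most 3 of them. */
    while (i > 0 && trailing < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        trailing++;
    }
    if (i == 0) {
        return 0; /* empty buffer, or no first octet found */
    }

    /* buf[i - 1] should now be the first octet of the last character. */
    unsigned char lead = buf[i - 1];
    size_t expected;
    if ((lead & 0x80) == 0x00) {
        expected = 1; /* 0xxxxxxx */
    } else if ((lead & 0xE0) == 0xC0) {
        expected = 2; /* 110xxxxx */
    } else if ((lead & 0xF0) == 0xE0) {
        expected = 3; /* 1110xxxx */
    } else if ((lead & 0xF8) == 0xF0) {
        expected = 4; /* 11110xxx */
    } else {
        return 0; /* not a valid first octet; give up */
    }

    size_t have = trailing + 1; /* first octet plus continuations */
    return have < expected ? expected - have : 0;
}
```

If this returns n greater than 0, one option is to read n more octets (for example with NSFileHandle's readDataOfLength:) and append them to the chunk before decoding it. Holding the incomplete tail back and prepending it to the next chunk would work as well.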