NSString* str = @"1二3四5";
NSLog(@"%c",[str characterAtIndex:0]);
NSLog(@"%c",[str characterAtIndex:1]);
NSString - characterAtIndex works we
Unfortunately Dave's answer doesn't actually do what you want. The index supplied to rangeOfComposedCharacterSequenceAtIndex is an index of a UTF-16 code unit, 1 or 2 of which make up a code point. So 1 is not the index of the second code point if the first code point in the string requires 2 code units. (rangeOfComposedCharacterSequenceAtIndex returns the range of the composed character sequence, which covers at least the whole code point, containing the code unit at the given index. So if your first character requires 2 code units, passing an index of 0 or 1 returns the same range.)
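To make the code unit versus code point distinction concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper name utf16_index_of_code_point is made up for illustration and is not part of Foundation. It only accounts for surrogate pairs, not for longer composed sequences such as a base letter followed by a combining mark.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the code-unit index at which the n-th
   code point (0-based) starts in a UTF-16 buffer of `len` code units,
   or (size_t)-1 if the buffer holds fewer than n + 1 code points. */
size_t utf16_index_of_code_point(const uint16_t *units, size_t len, size_t n)
{
    size_t i = 0;
    while (i < len) {
        if (n == 0)
            return i;
        /* A high surrogate followed by a low surrogate is a single
           code point spread over two code units. */
        int is_pair = units[i] >= 0xD800 && units[i] <= 0xDBFF &&
                      i + 1 < len &&
                      units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;
        i += is_pair ? 2 : 1;
        n--;
    }
    return (size_t)-1;
}
```

For a buffer holding U+1F600 (code units 0xD83D 0xDE00) followed by the letter a, the second code point starts at code-unit index 2, not 1.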
If you want to find the UTF-8 sequence for a character you can use UTF8String and then parse the resulting bytes to find the byte sequence for the nth character. Alternatively, you can use rangeOfComposedCharacterSequenceAtIndex starting at index 0 and iterate until you reach the nth character, then convert its 1 or 2 UTF-16 code units to UTF-8 code units.
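The first approach, parsing the bytes returned by UTF8String, could look roughly like the sketch below. The function name utf8_nth_code_point and its out-parameters are invented for this example, and it assumes the input is well-formed, NUL-terminated UTF-8. It relies on the fact that every continuation byte in UTF-8 has the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper: finds the n-th code point (0-based) in a
   NUL-terminated UTF-8 string. On success it stores the byte offset and
   byte length of that code point and returns 1; it returns 0 if the
   string has fewer than n + 1 code points.
   Assumption: the input is well-formed UTF-8. */
int utf8_nth_code_point(const char *s, size_t n, size_t *offset, size_t *length)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;
    while (p[i] != 0) {
        size_t start = i;
        i++;                               /* consume the lead byte */
        while ((p[i] & 0xC0) == 0x80)      /* consume continuation bytes */
            i++;
        if (n == 0) {
            *offset = start;
            *length = i - start;
            return 1;
        }
        n--;
    }
    return 0;
}
```

From Objective-C this would be called as utf8_nth_code_point([str UTF8String], 1, &offset, &length). For the string 1二3四5, index 1 gives offset 1 and length 3, the three bytes that encode 二.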
I hope we're all missing something and this is built-in...
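A start of a category which might help (it returns a range with location NSNotFound when the string has too few characters):
#import <Foundation/Foundation.h>

@interface NSString (UTF)
// Range, in UTF-16 code units, of the number-th (0-based) composed character sequence.
- (NSRange)rangeOfUTFCodePoint:(NSUInteger)number;
@end

@implementation NSString (UTF)
- (NSRange)rangeOfUTFCodePoint:(NSUInteger)number
{
    NSUInteger codeUnit = 0;
    NSRange result = NSMakeRange(NSNotFound, 0);
    for (NSUInteger ix = 0; ix <= number; ix++)
    {
        // Bounds check: the string has fewer than number + 1 characters.
        if (codeUnit >= self.length)
            return NSMakeRange(NSNotFound, 0);
        // Range of the sequence containing this code unit, then step past it.
        result = [self rangeOfComposedCharacterSequenceAtIndex:codeUnit];
        codeUnit += result.length;
    }
    return result;
}
@end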
But this sort of thing is more efficient using char * rather than NSString.