I need NSRange objects for the position of each uppercase letter in a given NSString for input into a method for a custom attributed string class.
There are of course q
The simplest way is probably to use -rangeOfCharacterFromSet:options:range: with [NSCharacterSet uppercaseLetterCharacterSet]. By modifying the range to search over with each call, you can find all of the uppercase letters pretty easily. Something like the following will work to give you an NSArray of all ranges (encoded as NSValues):
    - (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
        NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
        NSMutableArray *results = [NSMutableArray array];
        NSRange searchRange = NSMakeRange(0, [str length]);
        NSRange range;
        while ((range = [str rangeOfCharacterFromSet:cs options:0 range:searchRange]).location != NSNotFound) {
            [results addObject:[NSValue valueWithRange:range]];
            searchRange = NSMakeRange(NSMaxRange(range), [str length] - NSMaxRange(range));
        }
        return results;
    }
Note, this will not coalesce adjacent ranges into a single range, but that's easy enough to add.
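A minimal sketch of that coalescing step, written in plain C so it can be dropped into an Objective-C file unchanged. Note the assumptions: `Range` is a hypothetical stand-in for NSRange (same location/length fields), `coalesce_ranges` is an illustrative name rather than a Foundation API, and the input is assumed to be sorted by location with no overlaps, which is what the search loop above produces.

```c
#include <stddef.h>

/* Hypothetical stand-in for NSRange: same two fields, same meaning. */
typedef struct {
    size_t location;
    size_t length;
} Range;

/* Merges runs of touching ranges in place and returns the new count.
   Assumes `ranges` is sorted by location and the ranges do not overlap. */
static size_t coalesce_ranges(Range *ranges, size_t count) {
    if (count == 0) return 0;
    size_t out = 0; /* index of the range currently being extended */
    for (size_t i = 1; i < count; i++) {
        if (ranges[out].location + ranges[out].length == ranges[i].location) {
            ranges[out].length += ranges[i].length; /* adjacent: extend it */
        } else {
            ranges[++out] = ranges[i]; /* gap: start a new output range */
        }
    }
    return out + 1;
}
```

With the NSValue array returned above, the same idea applies: compare NSMaxRange of the previous range with the location of the next one, and extend the previous range instead of adding a new entry when they are equal.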
Here's an alternative solution based on NSScanner:
    - (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
        NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
        NSMutableArray *results = [NSMutableArray array];
        NSScanner *scanner = [NSScanner scannerWithString:str];
        while (![scanner isAtEnd]) {
            [scanner scanUpToCharactersFromSet:cs intoString:NULL]; // skip non-uppercase characters
            NSString *temp;
            NSUInteger location = [scanner scanLocation];
            if ([scanner scanCharactersFromSet:cs intoString:&temp]) {
                // found one (or more) uppercase characters
                NSRange range = NSMakeRange(location, [temp length]);
                [results addObject:[NSValue valueWithRange:range]];
            }
        }
        return results;
    }
Unlike the last, this one does coalesce adjacent uppercase characters into a single range.
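Edit: If you're looking for absolute speed, this one will likely be the fastest of the three presented here, while still preserving correct Unicode support for characters in the Basic Multilingual Plane, since -characterIsMember: tests one 16-bit unichar at a time (note, I have not tried compiling this):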
    // returns a pointer to an array of NSRanges, and fills in count with the number of ranges
    // the returned buffer is owned by an autoreleased NSMutableData
    - (NSRange *)rangesOfUppercaseLettersInString:(NSString *)string count:(NSUInteger *)count {
        NSMutableData *data = [NSMutableData data];
        NSUInteger numRanges = 0;
        NSUInteger length = [string length];
        unichar *buffer = malloc(sizeof(unichar) * length);
        [string getCharacters:buffer range:NSMakeRange(0, length)];
        NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
        NSRange range = {NSNotFound, 0};
        for (NSUInteger i = 0; i < length; i++) {
            if ([cs characterIsMember:buffer[i]]) {
                if (range.location == NSNotFound) {
                    range = (NSRange){i, 0};
                }
                range.length++;
            } else if (range.location != NSNotFound) {
                [data appendBytes:&range length:sizeof(range)];
                numRanges++;
                range = (NSRange){NSNotFound, 0};
            }
        }
        if (range.location != NSNotFound) {
            [data appendBytes:&range length:sizeof(range)];
            numRanges++;
        }
        free(buffer); // the temporary unichar copy is no longer needed
        if (count) *count = numRanges;
        return (NSRange *)[data mutableBytes]; // -bytes returns const void *, so use -mutableBytes
    }
Using RegexKitLite 4.0+ with a runtime that supports Blocks, this can be quite zippy:
    NSString *string = @"A simple String to TEST for Upper Case Letters.";
    NSString *regex = @"\\p{Lu}";
    [string enumerateStringsMatchedByRegex:regex options:RKLNoOptions inRange:NSMakeRange(0UL, [string length]) error:NULL enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired usingBlock:^(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
        NSLog(@"Range: %@", NSStringFromRange(capturedRanges[0]));
    }];
The regex \p{Lu} says "Match all characters with the Unicode property of 'Letter' that are also 'Upper Case'".
The option RKLRegexEnumerationCapturedStringsNotRequired tells RegexKitLite that it shouldn't create NSString objects and pass them via capturedStrings[]. This saves quite a bit of time and memory. The only thing that gets passed to the block is the NSRange values for the match via capturedRanges[].
There are two main parts to this. The first is the RegexKitLite method:
    [string enumerateStringsMatchedByRegex:regex
                                   options:RKLNoOptions
                                   inRange:NSMakeRange(0UL, [string length])
                                     error:NULL
                        enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired
                                usingBlock:/* ... */
    ];
... and the second is the Block that is passed as an argument to that method:
    ^(NSInteger captureCount,
      NSString * const capturedStrings[captureCount],
      const NSRange capturedRanges[captureCount],
      volatile BOOL * const stop) { /* ... */ }
A function such as isupper* in conjunction with -[NSString characterAtIndex:] will be plenty fast.
*isupper is an example - it may or may not be appropriate for your input.
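To illustrate that caveat: isupper is only defined for values that fit in an unsigned char (or EOF), while -characterAtIndex: returns a 16-bit unichar. A hedged sketch of a guard in plain C follows; `is_ascii_upper` is a hypothetical helper name, and `unsigned short` stands in for unichar. It deliberately treats everything outside ASCII as "not uppercase", so letters such as É are missed.

```c
#include <ctype.h>

/* Returns 1 only for uppercase ASCII letters, 0 for everything else.
   `c` stands in for a unichar; values of 128 and above are rejected
   before isupper is called, so the call stays within its defined domain. */
static int is_ascii_upper(unsigned short c) {
    return c < 128 && isupper((int)c) != 0;
}
```

If the input can contain non-ASCII letters, the NSCharacterSet-based approaches are the safer choice.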
It somewhat depends on the size of the string, but the absolute fastest way I can think of (note: internationalization safety not guaranteed, or even expected! Does the concept of uppercase even apply in, say, Japanese?) is:
1) Get a pointer to a raw C string of the string, preferably in a stack buffer if it's small enough. CFString has functions for this. Read the comments in CFString.h.
2) malloc() a buffer big enough to hold one NSRange per character in the string.
3) Something like this (completely untested, written into this text field, pardon mistakes and typos)
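    NSRange *bufferCursor = rangeBuffer;
    NSRange range = {NSNotFound, 0};
    for (int idx = 0; idx < numBytes; ++idx) {
        if (isupper((unsigned char)buffer[idx])) {
            if (range.location != NSNotFound) { //extend a range, we found more than one uppercase letter in a row
                range.length++;
            } else { //begin a range
                range.location = idx;
                range.length = 1;
            }
        }
        else if (range.location != NSNotFound) { //end a range, we hit a non-uppercase character
            *bufferCursor = range;
            bufferCursor++;
            range.location = NSNotFound;
            range.length = 0; //reset so the next uppercase letter begins a fresh range
        }
    }
    if (range.location != NSNotFound) { //flush a range that runs to the end of the string
        *bufferCursor = range;
        bufferCursor++;
    }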
4) realloc() the range buffer back down to the size you actually used (might need to keep a count of ranges begun to do that)
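Steps 2 to 4 above can be sketched end to end in plain C. This is only a sketch under stated assumptions: `ByteRange` is a hypothetical stand-in for NSRange, `uppercase_runs` is an illustrative name, and the caller is assumed to have already obtained the raw byte buffer from step 1. Like the loop above, it only recognizes what isupper recognizes, and the locations are byte offsets, not NSString character indexes.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for NSRange; locations are byte offsets. */
typedef struct {
    size_t location;
    size_t length;
} ByteRange;

/* Returns a malloc'd array with one entry per run of uppercase letters in
   `buffer` and stores the number of entries in *outCount. The caller frees
   the result. Returns NULL (with *outCount == 0) if there are no runs or
   if allocation fails. */
static ByteRange *uppercase_runs(const char *buffer, size_t numBytes, size_t *outCount) {
    *outCount = 0;
    if (numBytes == 0) return NULL;

    /* Step 2: one slot per byte is a safe upper bound on the number of runs. */
    ByteRange *ranges = malloc(numBytes * sizeof(ByteRange));
    if (ranges == NULL) return NULL;

    /* Step 3: scan once, tracking the run currently being extended. */
    size_t count = 0, start = 0, length = 0;
    for (size_t idx = 0; idx < numBytes; ++idx) {
        if (isupper((unsigned char)buffer[idx])) {
            if (length == 0) start = idx; /* begin a run */
            length++;
        } else if (length > 0) {
            ranges[count++] = (ByteRange){start, length}; /* end a run */
            length = 0;
        }
    }
    if (length > 0) ranges[count++] = (ByteRange){start, length}; /* run at end of input */

    if (count == 0) {
        free(ranges);
        return NULL;
    }

    /* Step 4: shrink to the size actually used; keep the original on failure. */
    ByteRange *shrunk = realloc(ranges, count * sizeof(ByteRange));
    *outCount = count;
    return shrunk != NULL ? shrunk : ranges;
}
```

Keeping the count in a local variable rather than doing pointer arithmetic on a cursor makes the final realloc straightforward, which is the bookkeeping step 4 hints at.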