There are a couple of different ways to remove HTML tags from an NSString in Cocoa.
One way is to render the string into an
Here's a more efficient solution than the accepted answer:
    - (NSString *)hp_stringByRemovingTags
    {
        // Build the regular expression once and reuse it on every call
        static NSRegularExpression *regex = nil;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
        });

        // Use reverse enumerator to delete characters without affecting indexes
        NSArray *matches = [regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
        NSEnumerator *enumerator = matches.reverseObjectEnumerator;

        NSTextCheckingResult *match = nil;
        NSMutableString *modifiedString = self.mutableCopy;
        while ((match = [enumerator nextObject]))
        {
            [modifiedString deleteCharactersInRange:match.range];
        }
        return modifiedString;
    }
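The above NSString category uses a regular expression to find all the matching tags, makes a mutable copy of the original string, and finally removes all the tags in place by iterating over them in reverse order. It's more efficient because the regular expression is created only once (inside dispatch_once) and only a single copy of the original string is made.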
This performed well enough for me, but a solution using NSScanner might be more efficient.
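For comparison, here is a minimal sketch of what the NSScanner idea could look like. It is not from the original answer and has not been benchmarked, so whether it is actually faster is an open question. The function name HPStripTagsWithScanner is hypothetical. The sketch tries to treat the same spans as tags that the pattern <[^>]+> would, leaving an unterminated "<" or an empty "<>" in place.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (name chosen for illustration only).
// Copies the text between tags into a new string in a single pass.
static NSString *HPStripTagsWithScanner(NSString *input)
{
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // NSScanner skips whitespace and newlines by default; turn that off
    // so the text between tags is preserved exactly.
    scanner.charactersToBeSkipped = nil;

    NSMutableString *result = [NSMutableString stringWithCapacity:input.length];
    while (![scanner isAtEnd]) {
        // Copy everything up to the next '<' (or up to the end of the string).
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        if ([scanner isAtEnd]) {
            break;
        }

        // The scanner is now positioned on '<'. Consume it and look for '>'.
        [scanner scanString:@"<" intoString:NULL];
        NSString *tagBody = nil;
        BOOL hasBody = [scanner scanUpToString:@">" intoString:&tagBody];
        BOOL closed = hasBody && [scanner scanString:@">" intoString:NULL];

        if (!closed) {
            // Either "<>" or a '<' with no closing '>': keep it as literal text.
            [result appendString:@"<"];
            if (hasBody) {
                [result appendString:tagBody];
            }
        }
        // Otherwise a complete tag was consumed and is simply dropped.
    }
    return [result copy];
}
```

Calling HPStripTagsWithScanner(@"<p>Hello <b>world</b></p>") returns @"Hello world". Unlike the category above, this builds the result by appending the kept text instead of deleting ranges from a copy, so it avoids creating the array of match objects.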
Like the accepted answer, the regex-based category doesn't address all the edge cases requested by @lfalin. Those would require much more expensive parsing, which the average use case most likely doesn't need.
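To make the limitation concrete, here is a small sketch that applies the same pattern through a standalone function. The function name HPStripTagsWithRegex is hypothetical, and it uses stringByReplacingMatchesInString: instead of the reverse-deletion loop purely to keep the example short. The inputs mentioned afterwards are my own illustrations and are not necessarily the cases @lfalin listed.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (name chosen for illustration only).
// Uses the same pattern as the category, <[^>]+>, and replaces
// every match with the empty string.
static NSString *HPStripTagsWithRegex(NSString *input)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>"
                                                  options:kNilOptions
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:input
                                           options:kNilOptions
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

With this pattern, a ">" inside a quoted attribute ends the match early: <a title="1 > 0">link</a> comes out as ` 0">link` (with a leading space). A bare "<" in ordinary text is treated as the start of a tag, so "1 < 2 and 3 > 2" loses everything between the two comparison signs. Entities are left untouched, so <p>Fish &amp; chips</p> becomes "Fish &amp; chips" and not "Fish & chips".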