There are a couple of different ways to remove HTML tags
from an NSString
in Cocoa
.
One way is to render the string into an
#import "RegexKitLite.h"
string text = [html stringByReplacingOccurrencesOfRegex:@"<[^>]+>" withString:@""]
I've extended the answer by m.kocikowski and tried to make it a bit more efficient by using an NSMutableString. I've also structured it for use in a static Utils class (I know a Category is probably the best design though), and removed the autorelease so it compiles in an ARC project.
Included here in case anybody finds it useful.
.h
+ (NSString *)stringByStrippingHTML:(NSString *)inputString;
.m
+ (NSString *)stringByStrippingHTML:(NSString *)inputString
{
NSMutableString *outString;
if (inputString)
{
outString = [[NSMutableString alloc] initWithString:inputString];
if ([inputString length] > 0)
{
NSRange r;
while ((r = [outString rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
{
[outString deleteCharactersInRange:r];
}
}
}
return outString;
}
Another one way:
Interface:
-(NSString *) stringByStrippingHTML:(NSString*)inputString;
Implementation
(NSString *) stringByStrippingHTML:(NSString*)inputString
{
NSAttributedString *attrString = [[NSAttributedString alloc] initWithData:[inputString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];
NSString *str= [attrString string];
//you can add here replacements as your needs:
[str stringByReplacingOccurrencesOfString:@"[" withString:@""];
[str stringByReplacingOccurrencesOfString:@"]" withString:@""];
[str stringByReplacingOccurrencesOfString:@"\n" withString:@""];
return str;
}
Realization
cell.exampleClass.text = [self stringByStrippingHTML:[exampleJSONParsingArray valueForKey: @"key"]];
or simple
NSString *myClearStr = [self stringByStrippingHTML:rudeStr];
use this
NSString *myregex = @"<[^>]*>"; //regex to remove any html tag
NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [hstmString stringByReplacingOccurrencesOfRegex:myregex withString:@""];
don't forget to include this in your code : #import "RegexKitLite.h" here is the link to download this API : http://regexkit.sourceforge.net/#Downloads
Extending this more from m.kocikowski's and Dan J's answers with more explanation for newbies
1# First you have to create objective-c-categories to make the code useable in any class.
.h
@interface NSString (NAME_OF_CATEGORY)
- (NSString *)stringByStrippingHTML;
@end
.m
@implementation NSString (NAME_OF_CATEGORY)
- (NSString *)stringByStrippingHTML
{
NSMutableString *outString;
NSString *inputString = self;
if (inputString)
{
outString = [[NSMutableString alloc] initWithString:inputString];
if ([inputString length] > 0)
{
NSRange r;
while ((r = [outString rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
{
[outString deleteCharactersInRange:r];
}
}
}
return outString;
}
@end
2# Then just import the .h file of the category class you've just created e.g.
#import "NSString+NAME_OF_CATEGORY.h"
3# Calling the Method.
NSString* sub = [result stringByStrippingHTML];
NSLog(@"%@", sub);
result is NSString I want to strip the tags from.
Take a look at NSXMLParser. It's a SAX-style parser. You should be able to use it to detect tags or other unwanted elements in the XML document and ignore them, capturing only pure text.