Remove HTML Tags from an NSString on the iPhone

心在旅途 2020-11-22 10:02

There are a couple of different ways to remove HTML tags from an NSString in Cocoa.

One way is to render the string into an

22 Answers
  • 2020-11-22 10:35

    This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It consists of a single .m and .h file that can easily be added to your project.

    https://gist.github.com/leighmcculloch/1202238

    You then strip the HTML as follows:

    Import the header:

    #import "NSString_stripHtml.h"
    

    And then call stripHtml:

    NSString* mystring = @"<b>Hello</b> World!!";
    NSString* stripped = [mystring stripHtml];
    // stripped will be = Hello World!!
    

    This also works with malformed HTML that technically isn't XML.
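
    For readers who want to see the general idea without opening the gist, here is a minimal sketch of how an NSXMLParser-based stripper can work. It is not the gist's code: the names TextCollector and StripTagsWithXMLParser are hypothetical, and the sketch assumes ARC and UTF-8 input. The string is wrapped in a single root element, and only the text nodes reported to the delegate are kept.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical delegate: accumulates every run of text found between tags.
    @interface TextCollector : NSObject <NSXMLParserDelegate>
    @property (nonatomic, strong) NSMutableString *text;
    @end

    @implementation TextCollector
    - (instancetype)init {
        if ((self = [super init])) {
            _text = [NSMutableString string];
        }
        return self;
    }

    // Called by NSXMLParser for character data; may fire several times per text node.
    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
        [self.text appendString:string];
    }
    @end

    // Hypothetical helper (assumes ARC).
    static NSString *StripTagsWithXMLParser(NSString *html) {
        // Wrap the input so the parser always sees exactly one root element.
        NSString *wrapped = [NSString stringWithFormat:@"<root>%@</root>", html];
        NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];

        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
        TextCollector *collector = [[TextCollector alloc] init];
        parser.delegate = collector;

        // The return value is ignored on purpose: if parsing stops on an error,
        // the text collected up to that point is still returned.
        [parser parse];
        return [collector.text copy];
    }
    ```

    Calling StripTagsWithXMLParser(@"<b>Hello</b> World!!") should yield the same result as the stripHtml example above.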

  • 2020-11-22 10:36
    UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
    NSString *str = @"This is <font color='red'>simple</font>";
    // Note: "contentToHTMLString" is an undocumented key; this renders the HTML rather than stripping the tags.
    [textview setValue:str forKey:@"contentToHTMLString"];
    textview.textAlignment = NSTextAlignmentLeft;
    textview.editable = NO;
    textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
    // Assumes this code runs inside a UIViewController.
    [self.view addSubview:textview];
    

    Works fine for me.

  • 2020-11-22 10:36

    Without a loop (at least on our side):

    - (NSString *)removeHTML {
    
        static NSRegularExpression *regexp;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            regexp = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
        });
    
        return [regexp stringByReplacingMatchesInString:self
                                                options:kNilOptions
                                                  range:NSMakeRange(0, self.length)
                                           withTemplate:@""];
    }
    
  • 2020-11-22 10:36

    If you are willing to use the Three20 framework, it has a category on NSString that adds a stringByRemovingHTMLTags method. See NSStringAdditions.h in the Three20Core subproject.

  • 2020-11-22 10:37

    Here's a more efficient solution than the accepted answer:

    - (NSString*)hp_stringByRemovingTags
    {
        static NSRegularExpression *regex = nil;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
        });
    
        // Use reverse enumerator to delete characters without affecting indexes
        NSArray *matches = [regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
        NSEnumerator *enumerator = matches.reverseObjectEnumerator;
    
        NSTextCheckingResult *match = nil;
        NSMutableString *modifiedString = self.mutableCopy;
        while ((match = [enumerator nextObject]))
        {
            [modifiedString deleteCharactersInRange:match.range];
        }
        return modifiedString;
    }
    

    The above NSString category uses a regular expression to find all the matching tags, makes a copy of the original string and finally removes all the tags in place by iterating over them in reverse order. It's more efficient because:

    • The regular expression is initialised only once.
    • A single copy of the original string is used.

    This performed well enough for me but a solution using NSScanner might be more efficient.

    Like the accepted answer, this solution doesn't address all the edge cases requested by @lfalin. Those would require much more expensive parsing, which the average use case most likely doesn't need.
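
    To illustrate the NSScanner alternative mentioned above, here is a minimal sketch. It is an assumption about what such a solution could look like, not a measured improvement: the function name HPStripTagsWithScanner is hypothetical, and the code assumes ARC. It copies the text between tags into a mutable string in a single pass and, like the regular expression, leaves an unterminated "<" in place. One difference is that it also removes an empty "<>", which the pattern <[^>]+> would not match.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (assumes ARC): single-pass tag removal with NSScanner.
    static NSString *HPStripTagsWithScanner(NSString *input) {
        NSScanner *scanner = [NSScanner scannerWithString:input];
        // By default NSScanner skips whitespace; disable that so spacing is preserved.
        scanner.charactersToBeSkipped = nil;

        NSMutableString *result = [NSMutableString stringWithCapacity:input.length];
        while (!scanner.isAtEnd) {
            // Copy everything up to the next "<".
            NSString *text = nil;
            if ([scanner scanUpToString:@"<" intoString:&text]) {
                [result appendString:text];
            }
            // No "<" left: the remaining text has already been appended.
            if (![scanner scanString:@"<" intoString:NULL]) {
                break;
            }
            // Skip the tag body and its closing ">".
            NSString *tagBody = nil;
            [scanner scanUpToString:@">" intoString:&tagBody];
            if (![scanner scanString:@">" intoString:NULL]) {
                // Unterminated "<": keep it and whatever followed it verbatim.
                [result appendString:@"<"];
                if (tagBody) {
                    [result appendString:tagBody];
                }
            }
        }
        return result;
    }
    ```

    Whether this is actually faster than the regular expression depends on the input, so it is worth profiling both before switching.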

  • 2020-11-22 10:37

    I followed the accepted answer by m.kocikowski and modified it slightly to use an autorelease pool that cleans up all of the temporary strings created by stringByReplacingCharactersInRange:.

    The header comment for this method states: /* Replace characters in range with the specified string, returning new string. */

    So, depending on the length of your XML, you may be creating a huge pile of new autoreleased strings which are not cleaned up until the enclosing autorelease pool is drained. If you are unsure when that will happen, or if a user action could repeatedly trigger many calls to this method before then, you can wrap the work in an @autoreleasepool block. These blocks can be nested and used within loops where appropriate.

    Apple's reference on @autoreleasepool states: "If you write a loop that creates many temporary objects. You may use an autorelease pool block inside the loop to dispose of those objects before the next iteration. Using an autorelease pool block in the loop helps to reduce the maximum memory footprint of the application." I have not used it inside the loop, but at least this method now cleans up after itself.

    - (NSString *) stringByStrippingHTML {
        NSString *retVal;
        @autoreleasepool {
            NSRange r;
            NSString *s = [[self copy] autorelease];
            while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
                s = [s stringByReplacingCharactersInRange:r withString:@""];
            }
            retVal = [s copy];
        } 
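        // The pool has drained: s and all temporary strings created by
        // stringByReplacingCharactersInRange: have been released.
        // Under manual reference counting, balance the copy above so the caller does not leak it.
        return [retVal autorelease];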
    }
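
    The answer above drains one pool around the whole loop. As a sketch of the per-iteration variant that Apple's note describes, the pool can go inside the loop instead. This version assumes ARC (unlike the manual reference counting code above), and the function name StripHTMLWithPerIterationPool is hypothetical. Under ARC the strong local variable s keeps the latest string alive when each pool drains.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (assumes ARC): drains temporaries after every replacement.
    static NSString *StripHTMLWithPerIterationPool(NSString *input) {
        NSString *s = input;
        BOOL found = YES;
        while (found) {
            @autoreleasepool {
                NSRange r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch];
                found = (r.location != NSNotFound);
                if (found) {
                    // The previous value of s becomes garbage here; the pool
                    // releases any autoreleased temporaries at the end of this block.
                    s = [s stringByReplacingCharactersInRange:r withString:@""];
                }
            }
        }
        return s;
    }
    ```

    This lowers peak memory for inputs with many tags, at the cost of setting up and draining a pool on every iteration.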
    