I'm wondering if there is an easy way to do a simple HTML escape/unescape in Objective-C. What I want is something like this pseudocode:
NSString *string = @\"
MREntitiesConverter doesn't work for escaping malformed XML. It fails even on a simple URL with unescaped ampersands:
http://www.google.com/search?client=safari&rls=en&q=fail&ie=UTF-8&oe=UTF-8
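For input like the URL above, one option is to escape the five XML special characters with plain string replacement instead of running the text through an XML parser. The following is only a minimal sketch: `EscapeHTML` is a hypothetical helper name, not part of MREntitiesConverter or any library mentioned here, and it assumes the input is raw text that has not already been escaped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escapes &, <, >, " and ' for use in HTML/XML text.
// Assumes the input is raw text; already-escaped input would be escaped twice.
static NSString *EscapeHTML(NSString *input) {
    NSMutableString *result = [input mutableCopy];
    NSArray<NSArray<NSString *> *> *pairs = @[
        @[@"&", @"&amp;"],   // must come first so later entities are not re-escaped
        @[@"<", @"&lt;"],
        @[@">", @"&gt;"],
        @[@"\"", @"&quot;"],
        @[@"'", @"&#39;"],
    ];
    for (NSArray<NSString *> *pair in pairs) {
        [result replaceOccurrencesOfString:pair[0]
                                withString:pair[1]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, result.length)];
    }
    return [result copy];
}
```

Because it never parses the input, it does not care whether the string is well-formed XML: every `&` in the URL simply becomes `&amp;`.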
This is an old answer that I posted some years ago. My intention was not to provide a "good" and "respectable" solution, but a "hacky" one that might be useful under some circumstances. Please don't use this solution unless nothing else works.
Actually, it works perfectly fine in many situations where other answers don't, because the UIWebView is doing all the work. You can even inject some JavaScript (which can be dangerous and/or useful). You would expect the performance to be horrible, but in practice it is not that bad.
There is another solution that has to be mentioned. Just create a UIWebView, load the encoded string and get the text back. It strips tags ("<>"), decodes all HTML entities (e.g. "&gt;"), and it might work where others don't (e.g. with Cyrillic text). I don't think it's the best solution, but it can be useful if the above solutions don't work.
Here is a small example using ARC:
@interface YourClass() <UIWebViewDelegate>
@property (nonatomic, strong) UIWebView *webView;
@end
@implementation YourClass
- (void)someMethodWhereYouGetTheHtmlString:(NSString *)htmlString {
    self.webView = [[UIWebView alloc] init];
    // Set the delegate before loading so no callback is missed.
    self.webView.delegate = self;
    // Wrap the input in a minimal document so it is parsed as HTML.
    NSString *wrappedHtml = [NSString stringWithFormat:@"<html><body>%@</body></html>", htmlString];
    [self.webView loadHTMLString:wrappedHtml baseURL:nil];
}
- (void)webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error {
    self.webView = nil;
}
- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Read the text first; releasing the web view before this call would return nil.
    NSString *decodedString = [webView stringByEvaluatingJavaScriptFromString:@"document.body.textContent;"];
    // Use decodedString here (tags removed, entities decoded).
    self.webView = nil;
}
- (void)webViewDidStartLoad:(UIWebView *)webView {
    // Do nothing
}
@end
The least invasive and most lightweight way to encode and decode HTML or XML strings is to use the GTMNSStringHTMLAdditions CocoaPod.
It is simply the Google Toolbox for Mac NSString category GTMNSString+HTML, stripped of the dependency on GTMDefines.h. So all you need to add is one .h and one .m, and you're good to go.
Example:
#import "GTMNSString+HTML.h"
// Encoding a string with XML / HTML elements
NSString *stringToEncode = @"<TheBeat>Goes On</TheBeat>";
NSString *encodedString = [stringToEncode gtm_stringByEscapingForHTML];
// encodedString looks like this now:
// &lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;
// Decoding a string with XML / HTML encoded elements
NSString *stringToDecode = @"&lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;";
NSString *decodedString = [stringToDecode gtm_stringByUnescapingFromHTML];
// decodedString looks like this now:
// <TheBeat>Goes On</TheBeat>
The easiest solution is to create a category as below.
Here's the category's header file:
#import <Foundation/Foundation.h>
@interface NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding;
@end
And here’s the implementation:
#import "NSString+URLEncoding.h"
@implementation NSString (URLEncoding)
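-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding {
    // CFBridgingRelease hands ownership of the created CFString to ARC
    // (and autoreleases it under manual reference counting).
    return (NSString *)CFBridgingRelease(CFURLCreateStringByAddingPercentEscapes(NULL,
        (__bridge CFStringRef)self,
        NULL,
        CFSTR("!*'\"();:@&=+$,/?%#[] "),
        CFStringConvertNSStringEncodingToEncoding(encoding)));
}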
@end
And now we can simply do this:
NSString *raw = @"hell & brimstone + earthly/delight";
NSString *url = [NSString stringWithFormat:@"http://example.com/example?param=%@",
    [raw urlEncodeUsingEncoding:NSUTF8StringEncoding]];
// Pass the URL as an argument, not as the format string: it contains % characters.
NSLog(@"%@", url);
Credit for this answer goes to the website below:
http://madebymany.com/blog/url-encoding-an-nsstring-on-ios
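`CFURLCreateStringByAddingPercentEscapes` has been deprecated since iOS 9, so on newer SDKs the same idea can be expressed with `stringByAddingPercentEncodingWithAllowedCharacters:` (available from iOS 7). The sketch below is an assumption on my part, not something from the linked post: `PercentEncodeQueryValue` is a hypothetical name, and it always encodes as UTF-8, leaving only the RFC 3986 unreserved characters untouched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encodes everything except the RFC 3986
// unreserved characters, so the result is safe as a single query value.
static NSString *PercentEncodeQueryValue(NSString *value) {
    NSCharacterSet *unreserved = [NSCharacterSet characterSetWithCharactersInString:
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:unreserved];
}
```

With the sample string from above, `PercentEncodeQueryValue(@"hell & brimstone + earthly/delight")` yields `hell%20%26%20brimstone%20%2B%20earthly%2Fdelight`.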
In iOS 7 and later you can use NSAttributedString's ability to import HTML to convert HTML entities to an NSString.
For example:
@interface NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString;
@end
@implementation NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString
{
NSDictionary *options = @{ NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute :@(NSUTF8StringEncoding) };
NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
return [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];
}
@end
Then in your code when you want to clean up the entities:
NSString *cleanString = [[NSAttributedString attributedStringWithHTMLString:question.title] string];
This is probably the simplest way, but I don't know how performant it is. You should be very sure that the content you're "cleaning" doesn't contain any <img> tags or similar, because this method will download those images during the HTML to NSAttributedString conversion. :)
Check out my NSString category for XMLEntities. There are methods to decode XML entities (including all HTML character references), encode XML entities, strip tags, and remove newlines and whitespace from a string:
- (NSString *)stringByStrippingTags;
- (NSString *)stringByDecodingXMLEntities; // Including all HTML character references
- (NSString *)stringByEncodingXMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;