I have a UITextView and I need to detect if a user enters an emoji character.
I would think that just checking the Unicode value of the newest character would suffice.
Swift's Unicode.Scalar.Properties has an isEmoji property. Best to check the documentation for the isEmojiPresentation caveat:
https://developer.apple.com/documentation/swift/unicode/scalar/properties/3081577-isemoji
Over the years these emoji-detecting solutions keep breaking as Apple adds new emoji with new composition methods (like skin-toned emoji, which are built by combining a base character with an additional modifier character), etc.
I finally broke down and just wrote the following method which works for all current emojis and should work for all future emojis.
The solution creates a UILabel with the character and a black background. Core Graphics then takes a snapshot of the label, and I scan all pixels in the snapshot for any non-black pixels. The reason I add the black background is to avoid false coloring due to subpixel rendering.
The solution runs very fast on my device (I can check hundreds of characters a second), but note that this is a Core Graphics solution and should not be used as heavily as a regular text method. Graphics processing is data-heavy, so checking thousands of characters at once could result in noticeable lag.
- (BOOL)isEmoji:(NSString *)character {
    // Render the character into a label with a black background; any pixel
    // with non-zero brightness means the glyph was drawn in color, i.e. it is an emoji.
    UILabel *characterRender = [[UILabel alloc] initWithFrame:CGRectMake(0, 0, 1, 1)];
    characterRender.text = character;
    characterRender.backgroundColor = [UIColor blackColor]; // needed to remove subpixel-rendering colors
    [characterRender sizeToFit];

    CGRect rect = [characterRender bounds];
    UIGraphicsBeginImageContextWithOptions(rect.size, YES, 0.0f);
    CGContextRef contextSnap = UIGraphicsGetCurrentContext();
    [characterRender.layer renderInContext:contextSnap];
    UIImage *capturedImage = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();

    // Read the snapshot back into a raw RGBA buffer.
    CGImageRef imageRef = [capturedImage CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char *)calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Scan the buffer for any pixel whose brightness is above zero.
    BOOL colorPixelFound = NO;
    NSUInteger x = 0;
    NSUInteger y = 0;
    while (y < height && !colorPixelFound) {
        while (x < width && !colorPixelFound) {
            NSUInteger byteIndex = (bytesPerRow * y) + x * bytesPerPixel;
            CGFloat red = (CGFloat)rawData[byteIndex] / 255.0f;
            CGFloat green = (CGFloat)rawData[byteIndex + 1] / 255.0f;
            CGFloat blue = (CGFloat)rawData[byteIndex + 2] / 255.0f;

            CGFloat h, s, b, a;
            UIColor *c = [UIColor colorWithRed:red green:green blue:blue alpha:1.0f];
            [c getHue:&h saturation:&s brightness:&b alpha:&a];

            if (b > 0) {
                colorPixelFound = YES;
            }
            x++;
        }
        x = 0;
        y++;
    }

    free(rawData); // release the pixel buffer
    return colorPixelFound;
}
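For context, here is one way the method above might be wired into a text view delegate. This is only a sketch: it assumes the delegate class implements isEmoji: as shown, and that the replacement text is a single character (a pasted string would need to be checked character by character).
- (BOOL)textView:(UITextView *)textView shouldChangeTextInRange:(NSRange)range replacementText:(NSString *)text {
    // Skip empty replacement text (deletions), then ask the renderer-based check.
    if (text.length > 0 && [self isEmoji:text]) {
        NSLog(@"Emoji entered: %@", text);
        // return NO; // uncomment to reject emoji input
    }
    return YES;
}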
First let's address your "55357 method" – and why it works for many emoji characters.
In Cocoa, an NSString is a collection of unichars, and unichar is just a typealias for unsigned short, which is the same as UInt16. Since the maximum value of UInt16 is 0xffff, this rules out quite a few emoji from being able to fit into one unichar, as only two of the six main Unicode blocks used for emoji, Miscellaneous Symbols (U+2600–U+26FF) and Dingbats (U+2700–U+27BF), fall under this range. These blocks contain 113 emoji, and an additional 66 emoji that can be represented as a single unichar can be found spread around various other blocks. However, these 179 characters represent only a fraction of the 1126 emoji base characters, the rest of which must be represented by more than one unichar.
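To make the distinction concrete, here is a small illustration (the two characters are just examples I picked): an emoji from the Miscellaneous Symbols block fits in one unichar, while one from a supplementary block does not.
NSString *sun = @"\u2600";     // ☀ U+2600 Black Sun with Rays, inside the BMP
NSString *dog = @"\U0001F436"; // 🐶 U+1F436 Dog Face, outside the BMP
NSLog(@"%lu", (unsigned long)sun.length); // 1 – one unichar is enough
NSLog(@"%lu", (unsigned long)dog.length); // 2 – stored as a surrogate pair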
Let's analyse your code:
unichar unicodevalue = [text characterAtIndex:0];
What's happening is that you're simply taking the first unichar of the string, and while this works for the previously mentioned 179 characters, it breaks apart when you encounter a UTF-32 character, since NSString converts everything into UTF-16 encoding. The conversion works by substituting the UTF-32 value with a surrogate pair, which means that the NSString now contains two unichars.
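For example (again using U+1F436 Dog Face purely as an illustration), characterAtIndex: hands you the two halves of the surrogate pair rather than the code point itself:
NSString *dog = @"\U0001F436";              // 🐶 U+1F436
unichar first = [dog characterAtIndex:0];   // 0xd83d (55357) – the high surrogate
unichar second = [dog characterAtIndex:1];  // 0xdc36 – the low surrogate
NSLog(@"0x%x 0x%x", first, second);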
And now we're getting to why the number 55357, or 0xd83d, appears for many emoji: when you only look at the first UTF-16 value of a UTF-32 character you get the high surrogate, each of which spans 1024 low surrogates. The range covered by the high surrogate 0xd83d is U+1F400–U+1F7FF, which starts in the middle of the largest emoji block, Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF), and continues all the way up to Geometric Shapes Extended (U+1F780–U+1F7FF), containing a total of 563 emoji and 333 non-emoji characters within this range.
So, an impressive 50% of emoji base characters have the high surrogate 0xd83d, but these deduction methods still leave 384 emoji characters unhandled, along with giving false positives for at least as many.
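To illustrate the false positives (the character here is just one example), a non-emoji character from the Alchemical Symbols block shares that same high surrogate:
// U+1F701 is an Alchemical Symbols character, not an emoji, yet its first
// UTF-16 unit is the same high surrogate 0xd83d that a 55357 check looks for.
NSString *alchemical = @"\U0001F701";
NSLog(@"%d", [alchemical characterAtIndex:0]); // 55357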
I recently answered a somewhat related question with a Swift implementation, and if you want to, you can look at how emoji are detected in this framework, which I created for the purpose of replacing standard emoji with custom images.
Anyhow, what you can do is extract the UTF-32 code point from the characters, which we'll do according to the specification:
- (BOOL)textView:(UITextView *)textView shouldChangeTextInRange:(NSRange)range replacementText:(NSString *)text {
    // Get the UTF-16 representation of the text.
    unsigned long length = text.length;
    unichar buffer[length];
    [text getCharacters:buffer];

    // Initialize array to hold our UTF-32 values.
    NSMutableArray *array = [[NSMutableArray alloc] init];

    // Temporary stores for the UTF-32 and UTF-16 values.
    UTF32Char utf32 = 0;
    UTF16Char h16 = 0, l16 = 0;

    for (int i = 0; i < length; i++) {
        unichar surrogate = buffer[i];

        // High surrogate (0xd800–0xdbff): remember it and wait for the low surrogate.
        if (0xd800 <= surrogate && surrogate <= 0xdbff) {
            h16 = surrogate;
            continue;
        }
        // Low surrogate (0xdc00–0xdfff).
        else if (0xdc00 <= surrogate && surrogate <= 0xdfff) {
            l16 = surrogate;
            // Convert the surrogate pair to UTF-32 encoding.
            utf32 = ((h16 - 0xd800) << 10) + (l16 - 0xdc00) + 0x10000;
        }
        // Normal UTF-16 (a character in the Basic Multilingual Plane).
        else {
            utf32 = surrogate;
        }

        // Add the UTF-32 value to the array.
        [array addObject:[NSNumber numberWithUnsignedInteger:utf32]];
    }

    NSLog(@"%@ contains values:", text);
    for (int i = 0; i < array.count; i++) {
        UTF32Char character = (UTF32Char)[[array objectAtIndex:i] unsignedIntegerValue];
        NSLog(@"\t- U+%x", character);
    }

    return YES;
}
Typing "