Swift 4.0 update
String received many revisions in the Swift 4 update, as documented in SE-0163. Two emoji are used for this demo, representing two different structures. Both are combined with a sequence of emoji.
This has to do with how the String type works in Swift, and how the contains(_:) method works.
The '
The other answers discuss what Swift does, but don't go into much detail about why.
Do you expect “Å” to equal “Å”? I expect you would.
One of these is a letter with a combining mark; the other is a single precomposed character. You can add many different combining marks to a base character, and a human would still consider it to be a single character. To deal with this sort of discrepancy, the concept of a grapheme was created to represent what a human would consider a character, regardless of the code points used.
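As a minimal sketch of that point, the two spellings of "Å" can be written with explicit escapes so the difference is visible in source. The variable names are only for illustration, and String.count assumes Swift 4 or later:

```swift
// Two spellings of the same visible character "Å".
let precomposed = "\u{00C5}"   // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
let decomposed = "A\u{030A}"   // "A" followed by U+030A COMBINING RING ABOVE

// Swift compares strings by canonical equivalence, not by raw code points.
let areEqual = (precomposed == decomposed)

// Each string is a single Character (grapheme), but the scalar counts differ.
let precomposedScalars = precomposed.unicodeScalars.count
let decomposedScalars = decomposed.unicodeScalars.count

print(areEqual, precomposed.count, decomposed.count,
      precomposedScalars, decomposedScalars)
// prints "true 1 1 1 2"
```

So the comparison a human would expect holds, even though the underlying scalars are different.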
Now text messaging services have been combining characters into graphical emoji for years :)
→
The first problem is you're bridging to Foundation with contains (Swift's String is not a Collection), so this is NSString behavior, which I don't believe handles composed emoji as powerfully as Swift. That said, I believe Swift is implementing Unicode 8 right now, and this situation needed revision in Unicode 10 (so this may all change when they implement Unicode 10; I haven't dug into whether it will or not).
To simplify things, let's get rid of Foundation and use Swift, which provides views that are more explicit. We'll start with characters:
"
Emojis, much like the Unicode standard, are deceptively complicated. Skin tones, genders, jobs, groups of people, zero-width joiner sequences, flags (two Unicode code points each) and other complications can make emoji parsing messy. A Christmas Tree, a Slice of Pizza, or a Pile of Poop can all be represented with a single Unicode code point. On top of that, when new emojis are introduced there is a delay between the emoji release and iOS support, and different versions of iOS support different versions of the Unicode standard.
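To make the multi-code-point cases concrete, here is a small sketch that builds a flag, a skin-tone variant, and a zero-width joiner family from explicit escapes and counts their scalars and UTF-16 units. The names are illustrative only. The Character count is printed without a stated result because it depends on which Unicode version the Swift runtime implements:

```swift
// Built from escapes so the individual scalars are visible in source.
let flag = "\u{1F1FA}\u{1F1F8}"      // regional indicators U + S
let thumbsUp = "\u{1F44D}\u{1F3FD}"  // thumbs up + medium skin tone modifier
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
// woman, ZWJ, woman, ZWJ, girl, ZWJ, boy

for (name, emoji) in [("flag", flag), ("thumbsUp", thumbsUp), ("family", family)] {
    // Scalars and UTF-16 units are fixed by the encoding.
    // The Character count depends on the runtime's grapheme-breaking rules.
    print(name,
          "scalars:", emoji.unicodeScalars.count,
          "utf16:", emoji.utf16.count,
          "characters:", emoji.count)
}
```

The scalar counts are 2, 2, and 7, and the UTF-16 counts are 4, 4, and 11, regardless of how the runtime groups them into characters.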
TL;DR: I have worked on these features and open sourced a library I authored, JKEmoji, to help parse strings with emojis. It makes parsing as easy as:
print("I love these emojis
It seems that Swift considers a ZWJ to be an extended grapheme cluster with the character immediately preceding it. We can see this when mapping the array of characters to their unicodeScalars:
Array(manual.characters).map { $0.description.unicodeScalars }
This prints the following from LLDB:
▿ 4 elements
▿ 0 : StringUnicodeScalarView("