Question
I'm trying to parse out "@mentions" from a user-provided string. The regular expression itself seems to find them, but the range it provides is incorrect when emoji are present.
let text = "😂😘🙂 @joe "
let tagExpr = try? NSRegularExpression(pattern: "@\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.characters.count)) { tag, flags, pointer in
    guard let tag = tag?.range else { return }
    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}
When I run this, tag = (location: 7, length: 2), and it prints
😂😘🙂 [email]oe
The expected result is
😂😘🙂 [email]
Answer 1:
NSRegularExpression (and anything involving NSRange) operates on UTF-16 counts and indexes. For that matter, NSString.length is the UTF-16 count as well.
But in your code, you're telling NSRegularExpression to use a length of text.characters.count. This is the number of composed characters, not the UTF-16 count. Your string "😂😘🙂 @joe " has 9 composed characters, but 12 UTF-16 code units. So you're actually telling NSRegularExpression to only look at the first 9 UTF-16 code units, which means it's ignoring the trailing "oe ". The regex therefore sees only "@j", which is why the reported range is (location: 7, length: 2).
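To make the mismatch concrete, here is a minimal sketch that compares the two counts for the string from the question and shows what a 9-unit NSRange actually covers. The variable name truncated is only for illustration.

```swift
import Foundation

let text = "😂😘🙂 @joe "

// Each of these emoji is one Character but two UTF-16 code units.
print(text.count)                 // 9
print(text.utf16.count)           // 12
print((text as NSString).length)  // 12

// A range built from the Character count stops three UTF-16 units early.
let truncated = NSRange(location: 0, length: text.count)
print((text as NSString).substring(with: truncated))  // 😂😘🙂 @j
```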
The fix is to pass length: text.utf16.count.
import Foundation

let text = "😂😘🙂 @joe "
let tagExpr = try? NSRegularExpression(pattern: "@\\S+")
// NSRange lengths are measured in UTF-16 code units, so use utf16.count here.
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.utf16.count)) { tag, _, _ in
    guard let tag = tag?.range else { return }
    // Convert the UTF-16 based NSRange back into a Range<String.Index>.
    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}
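As an alternative sketch (not part of the original answer), you can let Foundation compute the UTF-16 range for you with NSRange(_:in:), which avoids counting by hand. The example below also collects every match and replaces them from last to first, so earlier ranges stay valid if the string contains several mentions. The names fullRange and replaced are just illustrative.

```swift
import Foundation

let text = "😂😘🙂 @joe "

// NSRange(_:in:) converts String.Index bounds into UTF-16 offsets.
let fullRange = NSRange(text.startIndex..., in: text)

var replaced = text
if let tagExpr = try? NSRegularExpression(pattern: "@\\S+") {
    // Walk the matches backwards so that replacing one mention
    // does not shift the ranges of the mentions before it.
    for match in tagExpr.matches(in: text, range: fullRange).reversed() {
        if let range = Range(match.range, in: replaced) {
            replaced.replaceSubrange(range, with: "[email]")
        }
    }
}
print(replaced)  // prints "😂😘🙂 [email] " (the trailing space is kept)
```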
Source: https://stackoverflow.com/questions/46495365/using-nsregularexpression-produces-incorrect-ranges-when-emoji-are-present