I'm having trouble getting NSRegularExpression to match patterns on strings with wider (?) Unicode characters in them. It looks like the problem is the range parameter: Swift's countElements counts individual Unicode characters, while NSRegularExpression, like the rest of Objective-C, treats strings as sequences of UTF-16 code units.
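To make the mismatch concrete, here is a minimal sketch of the two ways of counting. It assumes a current Swift toolchain (Swift 5 or later), where `countElements(str)` has become `str.count` and `str.utf16Count` has become `str.utf16.count`; the name `sample` is just for illustration.

```swift
import Foundation

let sample = "dog🐶🐮cow"

// Swift counts Characters (grapheme clusters): each emoji is one Character.
let characterCount = sample.count

// The UTF-16 view counts code units: each of these emoji takes two.
let utf16Length = sample.utf16.count

// NSString reports its length in UTF-16 code units as well.
let nsLength = (sample as NSString).length

print(characterCount, utf16Length, nsLength) // prints "8 10 10"
```

The two emoji account for the difference of two between the counts, which is why a range built from the character count stops short of the end of the string.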
Here is my test string and two regular expressions:
let str = "dog🐶🐮cow"
let dogRegex = NSRegularExpression(pattern: "d.g", options: nil, error: nil)!
let cowRegex = NSRegularExpression(pattern: "c.w", options: nil, error: nil)!
I can match the first regex with no problems:
let dogMatch = dogRegex.firstMatchInString(str, options: nil,
range: NSRange(location: 0, length: countElements(str)))
println(dogMatch?.range) // (0, 3)
But the second fails with the same parameters, because the range I send it (0...7) isn't long enough to cover the whole string as far as NSRegularExpression is concerned. Each emoji takes two UTF-16 code units, so the string is 10 units long and "cow" sits at 7...9:
let cowMatch = cowRegex.firstMatchInString(str, options: nil,
range: NSRange(location: 0, length: countElements(str)))
println(cowMatch?.range) // nil
If I use a different range I can make the match succeed:
let cowMatch2 = cowRegex.firstMatchInString(str, options: nil,
range: NSRange(location: 0, length: str.utf16Count))
println(cowMatch2?.range) // (7, 3)
but then I don't know how to extract the matched text out of the string, since that range falls outside the range of the Swift string.
Turns out you can fight fire with fire. Using the Swift-native string's utf16Count property and the substringWithRange: method of NSString (not String) gets the right result. Here's the full working code:
let str = "dog🐶🐮cow"
let cowRegex = NSRegularExpression(pattern: "c.w", options: nil, error: nil)!
if let cowMatch = cowRegex.firstMatchInString(str, options: nil,
range: NSRange(location: 0, length: str.utf16Count)) {
println((str as NSString).substringWithRange(cowMatch.range))
// prints "cow"
}
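The code above is written against the Swift 1.x API. As a rough sketch of the same idea on a current toolchain (assuming Swift 5 or later with Foundation), the range can be built and converted back with the `NSRange(_:in:)` and `Range(_:in:)` initializers instead of bridging to NSString. The names `fullRange`, `swiftRange`, and `matchedText` are illustrative, not from the original code.

```swift
import Foundation

let str = "dog🐶🐮cow"

// try! is used only because the pattern is a fixed, known-valid literal.
let cowRegex = try! NSRegularExpression(pattern: "c.w")

// Convert the whole Swift string range into an NSRange of UTF-16 offsets.
let fullRange = NSRange(str.startIndex..<str.endIndex, in: str)

var matchedText: String? = nil
if let cowMatch = cowRegex.firstMatch(in: str, options: [], range: fullRange),
   // Convert the match's UTF-16 NSRange back into a Range<String.Index>.
   let swiftRange = Range(cowMatch.range, in: str) {
    matchedText = String(str[swiftRange])
}

print(matchedText ?? "no match") // prints "cow"
```

Both conversions work in UTF-16 offsets under the hood, so this is the same "fight fire with fire" approach, just without the explicit cast to NSString.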
(I figured this out in the process of writing the question; score one for rubber duck debugging.)
Source: https://stackoverflow.com/questions/25882503/how-can-i-use-nsregularexpression-on-swift-strings-with-variable-width-unicode-c