Swift regex matching fails when the source contains Unicode characters

Submitted by 孤者浪人 on 2019-12-07 08:39:08

Question


I'm trying to do a simple regex match using NSRegularExpression, but I'm having some problems matching the string when the source contains multibyte characters:

let string = "D 9"

// The following matches (any characters)(SPACE)(numbers)(any characters)
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"

let slen : Int = string.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)

var error: NSError? = nil

var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.DotMatchesLineSeparators, error: &error)

var result = regex?.stringByReplacingMatchesInString(string,
        options: nil,
        range: NSRange(location:0, length:slen),
        withTemplate: "First \"$1\" Second: \"$2\"")

The code above produces First "D" Second: "9", as expected.

If I now change the first line to include the UK pound sign (£):

let string = "£ 9"

Then the match doesn't work, even though the ([\\s\\S]*) part of the expression should still match any leading characters.

I understand that the £ symbol takes two bytes, but the leading wildcard should simply consume them, shouldn't it?

Can anyone explain what is going on here please?


Answer 1:


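It can be confusing. The first parameter of stringByReplacingMatchesInString() is mapped from NSString in Objective-C to String in Swift, but the range: parameter is still an NSRange. Therefore you have to specify the range in the units used by NSString, which are UTF-16 code units, not UTF-8 bytes. For "£ 9", lengthOfBytesUsingEncoding(NSUTF8StringEncoding) returns 4 because the £ sign takes two bytes in UTF-8, while the NSString length is only 3, so the range in the question extends past the end of the string. Use the NSString length instead: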

var result = regex?.stringByReplacingMatchesInString(string,
        options: nil,
        range: NSRange(location:0, length:(string as NSString).length),
        withTemplate: "First \"$1\" Second: \"$2\"")

Alternatively, you can use count(string.utf16) instead of (string as NSString).length (in Swift 2 and later, this is written string.utf16.count).
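The mismatch is easy to check directly. Below is a minimal sketch, assuming a Swift 5 toolchain with Foundation, that measures the question's string in UTF-8 bytes, UTF-16 code units, NSString length and Characters. The variable names are illustrative only.

```swift
import Foundation

let string = "£ 9"

// UTF-8 byte count: what lengthOfBytesUsingEncoding(NSUTF8StringEncoding) gave in the question.
// "£" (U+00A3) is encoded as the two bytes C2 A3, so this is 4.
let utf8Bytes = string.utf8.count

// UTF-16 code units: the unit that NSString and NSRange use. "£" is a single unit, so this is 3.
let utf16Units = string.utf16.count

// NSString length, equivalent to the UTF-16 count (this replaces the (string as NSString).length cast).
let nsLength = NSString(string: string).length

// User-perceived characters (grapheme clusters); happens to be 3 here as well.
let characters = string.count

// A range covering the whole string must be built from the UTF-16 length.
let fullRange = NSRange(location: 0, length: utf16Units)

print(utf8Bytes, utf16Units, nsLength, characters)
// 4 3 3 3
```

Only the UTF-16 count (equivalently, the NSString length) is a valid length for an NSRange over this string. The UTF-8 count is one too large because of the two-byte £.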

Full example:

import Foundation

let string = "£ 9"

let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
var error: NSError? = nil
let regex = NSRegularExpression(pattern: pattern,
        options: NSRegularExpressionOptions.DotMatchesLineSeparators,
        error: &error)!

let result = regex.stringByReplacingMatchesInString(string,
    options: nil,
    range: NSRange(location:0, length:(string as NSString).length),
    withTemplate: "First \"$1\" Second: \"$2\"")
println(result)
// First "£" Second: "9"
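The full example above uses Swift 1.2 syntax (println, the error: initializer, options: nil), which current compilers no longer accept. Here is a sketch of the same replacement in Swift 5. It relies on Foundation's NSRange(_:in:) and Range(_:in:) initializers to convert between String.Index ranges and NSRange, so the UTF-16 arithmetic is never done by hand. The capture-group loop is an illustrative addition, not part of the original answer.

```swift
import Foundation

let string = "£ 9"
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"

// In Swift 5 the initializer throws instead of taking an NSError pointer.
let regex = try! NSRegularExpression(pattern: pattern,
                                     options: .dotMatchesLineSeparators)

// NSRange(_:in:) builds the range in UTF-16 code units, as NSRegularExpression expects.
let wholeString = NSRange(string.startIndex..., in: string)

let result = regex.stringByReplacingMatches(in: string,
                                            options: [],
                                            range: wholeString,
                                            withTemplate: "First \"$1\" Second: \"$2\"")
print(result)
// First "£" Second: "9"

// Match results are NSRanges (UTF-16 units) too; Range(_:in:) converts each
// capture group back to a String.Index range before slicing the Swift string.
var groups: [String] = []
if let match = regex.firstMatch(in: string, options: [], range: wholeString) {
    for i in 1..<match.numberOfRanges {
        if let r = Range(match.range(at: i), in: string) {
            groups.append(String(string[r]))
        }
    }
}
print(groups)
// ["£", "9", ""]
```

Deriving the range from the string's own indices, rather than from a separately computed length, keeps it correct whatever characters the string contains.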


Source: https://stackoverflow.com/questions/29756530/swift-regex-matching-fails-when-source-contains-unicode-characters
