Question
The XKCD API has some weird encoding issues:
Minor encoding issue with xkcd alt texts in chat
The solution (in Python) is to encode it as latin1 then decode as utf8, but how do I do this in Swift?
Test string:
"Be careful\u00e2\u0080\u0094it's breeding season"
Expected output:
Be careful—it's breeding season
Python (from above link):
import json
a = '''"Be careful\u00e2\u0080\u0094it's breeding season"'''
print(json.loads(a).encode('latin1').decode('utf8'))
How is this done in Swift?
let strdata = "Be careful\\u00e2\\u0080\\u0094it's breeding season".data(using: .isoLatin1)!
let str = String(data: strdata, encoding: .utf8)
That doesn't work!
Answer 1:
You have to decode the JSON data first, then extract the string, and finally “fix” the string. Here is a self-contained example with the JSON from https://xkcd.com/1814/info.0.json:
let data = """
{"month": "3", "num": 1814, "link": "", "year": "2017", "news": "",
"safe_title": "Color Pattern", "transcript": "",
"alt": "\\u00e2\\u0099\\u00ab When the spacing is tight / And the difference is slight / That's a moir\\u00c3\\u00a9 \\u00e2\\u0099\\u00ab",
"img": "https://imgs.xkcd.com/comics/color_pattern.png",
"title": "Color Pattern", "day": "22"}
""".data(using: .utf8)!
// Alternatively:
// let url = URL(string: "https://xkcd.com/1814/info.0.json")!
// let data = try! Data(contentsOf: url)
do {
    if let dict = (try JSONSerialization.jsonObject(with: data, options: [])) as? [String: Any],
       var alt = dict["alt"] as? String {
        // Now try to fix the "alt" string
        if let isoData = alt.data(using: .isoLatin1),
           let altFixed = String(data: isoData, encoding: .utf8) {
            alt = altFixed
        }
        print(alt)
        // ♫ When the spacing is tight / And the difference is slight / That's a moiré ♫
    }
} catch {
    print(error)
}
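To see why the Latin-1/UTF-8 round trip repairs the text, it can help to look at the bytes. The following is only a minimal sketch, not part of the original answer; the variable names are made up for illustration, and the force unwraps assume the conversions succeed for this input.

```swift
import Foundation

// U+2014 (the dash in the expected output) is the three bytes E2 80 94 in UTF-8.
let dash = "\u{2014}"
let utf8Bytes = Array(dash.utf8)

// Reading each byte as one ISO Latin-1 character gives the garbled form
// seen in the feed: the scalars U+00E2, U+0080, U+0094.
let garbled = String(data: Data(utf8Bytes), encoding: .isoLatin1)!
print(garbled.unicodeScalars.map { String($0.value, radix: 16) })

// Encoding the garbled string as Latin-1 gives back the original bytes,
// and decoding those bytes as UTF-8 gives back the dash.
let repaired = String(data: garbled.data(using: .isoLatin1)!, encoding: .utf8)!
print(repaired == dash)
```

In other words, the feed appears to contain UTF-8 bytes that were each treated as a separate Latin-1 character before being JSON-escaped, and the fix simply reverses that step.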
If you have just a string of the form
Be careful\u00e2\u0080\u0094it's breeding season
then you can still use JSONSerialization to decode the \uNNNN escape sequences, and then continue as above.
A simple example (error checking omitted for brevity):
let strbad = "Be careful\\u00e2\\u0080\\u0094it's breeding season"
let decoded = try! JSONSerialization.jsonObject(with: Data("\"\(strbad)\"".utf8), options: .allowFragments) as! String
let strgood = String(data: decoded.data(using: .isoLatin1)!, encoding: .utf8)!
print(strgood)
// Be careful—it's breeding season
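If the fix is needed in several places, the two conversion steps could be wrapped in a small helper. This is a sketch under assumptions: the name repairingLatin1Mojibake is hypothetical (it is not a Foundation API), and it falls back to the unchanged string whenever either conversion fails.

```swift
import Foundation

extension String {
    /// Hypothetical helper: re-encodes the string as ISO Latin-1 and decodes
    /// the bytes as UTF-8. Returns the string unchanged if it contains
    /// characters outside Latin-1, or if the bytes are not valid UTF-8.
    func repairingLatin1Mojibake() -> String {
        guard let latin1 = data(using: .isoLatin1),
              let fixed = String(data: latin1, encoding: .utf8) else {
            return self
        }
        return fixed
    }
}

// Garbled input (after the \uNNNN escapes have been decoded) is repaired.
let sample = "Be careful\u{00e2}\u{0080}\u{0094}it's breeding season"
print(sample.repairingLatin1Mojibake())

// A correct string with a lone Latin-1 letter is left alone,
// because the single byte E9 is not valid UTF-8.
print("moir\u{e9}".repairingLatin1Mojibake())
```

The fallback is a design choice rather than a guarantee: a legitimate string that happens to look like mis-decoded UTF-8 would still be rewritten, so the helper is best applied only to fields known to be affected.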
Answer 2:
I couldn't find anything built in, but I did manage to write this for you.
import Foundation

extension String {
    func range(nsRange: NSRange) -> Range<Index> {
        return Range(nsRange, in: self)!
    }
    func nsRange(range: Range<Index>) -> NSRange {
        return NSRange(range, in: self)
    }
    var fullRange: Range<Index> {
        return startIndex..<endIndex
    }
    var fullNSRange: NSRange {
        return nsRange(range: fullRange)
    }
    subscript(nsRange: NSRange) -> Substring {
        return self[range(nsRange: nsRange)]
    }
    func convertingUnicodeCharacters() -> String {
        var string = self
        // Characters need to be replaced in groups in case of clusters
        let groupedRegex = try! NSRegularExpression(pattern: "(\\\\u[0-9a-fA-F]{1,8})+")
        // Iterate in reverse so earlier match ranges stay valid after each replacement
        for match in groupedRegex.matches(in: string, range: string.fullNSRange).reversed() {
            let groupedHexValues = String(string[match.range])
            var characters = [Character]()
            let regex = try! NSRegularExpression(pattern: "\\\\u([0-9a-fA-F]{1,8})")
            for hexMatch in regex.matches(in: groupedHexValues, range: groupedHexValues.fullNSRange) {
                // The match range is relative to groupedHexValues, not to string
                let hexString = groupedHexValues[Range(hexMatch.range(at: 1), in: groupedHexValues)!]
                if let hexValue = UInt32(hexString, radix: 16),
                   let scalar = UnicodeScalar(hexValue) {
                    characters.append(Character(scalar))
                }
            }
            string.replaceSubrange(Range(match.range, in: string)!, with: characters)
        }
        return string
    }
}
It basically looks for any \u<1-8 digit hex> values and converts them into scalars. Should be fairly straightforward... 🧐 I've tried to test it a fair bit, but I'm not sure it catches every edge case.
My playground testing code was simply:
let string = "Be careful\\u00e2\\u0080\\u0094-\\u1F496\\u65\\u301it's breeding season"
let expected = "Be careful\u{00e2}\u{0080}\u{0094}-\u{1f496}\u{65}\u{301}it's breeding season"
string.convertingUnicodeCharacters() == expected // true 🎉
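Note that this conversion only turns the escapes into scalars: for the question's test string, the result still contains U+00E2 U+0080 U+0094 rather than the dash, so the Latin-1/UTF-8 step from the question would still have to follow. A minimal sketch, assuming a literal that stands in for the unescaped result (the variable names are illustrative):

```swift
import Foundation

// Stand-in for the unescaped form of the question's test string.
let unescaped = "Be careful\u{00e2}\u{0080}\u{0094}it's breeding season"

// Encode as ISO Latin-1, then decode the bytes as UTF-8.
let finalText = unescaped.data(using: .isoLatin1)
    .flatMap { String(data: $0, encoding: .utf8) }
print(finalText ?? "conversion failed")

// A string with a scalar above U+00FF, such as U+1F496 from the playground
// test, cannot be encoded as Latin-1, so the conversion yields nil.
let withEmoji = unescaped + "\u{1F496}"
let emojiBytes = withEmoji.data(using: .isoLatin1)
print(emojiBytes == nil)
```

So the round trip applies to strings like the one in the question, while mixed input like the playground test string would need the garbled runs handled separately.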
Source: https://stackoverflow.com/questions/52387450/using-swift-how-do-you-re-encode-then-decode-a-string-like-this-short-script-in