What characters are in whitespaceAndNewlineCharacterSet()?

忘掉有多难 2021-01-29 03:24

I'm parsing some nasty files - you know, ones that mix comma, space and tab delimiters in a single line, and were then run through a text editor that word-wraps at column 65 with

2 Answers
  • 2021-01-29 03:49

    The ~ means "through"; thus, U+000A, U+000B, U+000C, and U+000D.

    The phrase "General Category Z*" is shorthand for "any character whose General Category property is one of the three categories that start with Z." Thus, various forms of space (0020, 00A0, 1680, 2000 thru 200A, 202F, 205F, 3000), plus the line separator (2028) and the paragraph separator (2029).
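
    As a rough way to check this list, the sketch below tests each code point named above against the set. It assumes the Swift 3+ spelling `CharacterSet.whitespacesAndNewlines`; the `listed` array is only a transcription of the code points in this answer, not something taken from the documentation.

    ```swift
    import Foundation

    // Code points named in this answer: U+000A...U+000D plus General Category Z*.
    // (Assumption: this array is a hand transcription of the list above.)
    let listed: [ClosedRange<UInt32>] = [
        0x000A...0x000D, 0x0020...0x0020, 0x00A0...0x00A0, 0x1680...0x1680,
        0x2000...0x200A, 0x2028...0x2029, 0x202F...0x202F, 0x205F...0x205F,
        0x3000...0x3000,
    ]

    let set = CharacterSet.whitespacesAndNewlines

    for range in listed {
        for value in range {
            // Unicode.Scalar(UInt32) is failable; none of these values should fail.
            guard let scalar = Unicode.Scalar(value) else { continue }
            let hex = String(value, radix: 16, uppercase: true)
            let status = set.contains(scalar) ? "member" : "not a member"
            print("U+\(hex): \(status)")
        }
    }
    ```

    Printing the membership result rather than asserting it lets you see whether your Foundation version agrees with the list.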

  • 2021-01-29 03:52

    NSCharacterSet is an opaque class that does not expose its content easily. You have to see it more as a "membership" rule service than a list of characters.

    This may be a somewhat brute-force approach, but you can get the list of members in an NSCharacterSet by going through all 16-bit values (the Basic Multilingual Plane) and checking each one for membership in the set:

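     import Foundation

     // Swift 3+ spelling of NSCharacterSet.whitespaceAndNewlineCharacterSet()
     let charSet = CharacterSet.whitespacesAndNewlines
     for i in 0..<65536 {
         // UnicodeScalar(UInt16) is nil for surrogate code units, so skip those
         guard let scalar = UnicodeScalar(UInt16(i)) else { continue }
         if charSet.contains(scalar) {
             print("\(i): \(Character(scalar))")
         }
     }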
    

    The printed characters can look surprising for sets of non-displayable characters like this one, so rely on the numeric values, but it can probably answer your question.
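
    For the parsing problem in the question, the set is usually used as a membership rule rather than enumerated. A minimal sketch, assuming commas should be treated as delimiters alongside everything in the set (the `fields(in:)` helper and the sample line are hypothetical, not from the question):

    ```swift
    import Foundation

    // Hypothetical helper: split a line on commas plus any character in
    // whitespacesAndNewlines, dropping the empty pieces between adjacent delimiters.
    func fields(in line: String) -> [String] {
        let delimiters = CharacterSet.whitespacesAndNewlines
            .union(CharacterSet(charactersIn: ","))
        return line.components(separatedBy: delimiters).filter { !$0.isEmpty }
    }

    print(fields(in: "alpha, beta\tgamma  delta"))
    ```

    Filtering out empty strings matters because `components(separatedBy:)` returns an empty piece between every pair of adjacent delimiters.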
