Difference between unicode.isdigit() and unicode.isnumeric()

后端 未结 5 1892
孤城傲影
孤城傲影 2020-12-31 00:37

What is the difference between methods unicode.isdigit() and unicode.isnumeric()?

相关标签:
5条回答
  • 2020-12-31 01:11

    The code snippet provided by @Martijn Pieters doesn't work on the latest Python version i.e. 3.7 at the time of writing this answer.

    Here is the updated code snippet.

    import unicodedata
    
    count = 0
    for codepoint in range(2**16):
        ch = chr(codepoint)
        if ch.isnumeric() and not ch.isdigit():
            print(u'{:04x}: {} ({})'.format(codepoint, ch, unicodedata.name(ch, 'UNNAMED')))
            count = count + 1
    print(f'Total Number of Numeric and Non-Digit Unicode Characters = {count}')
    

    Output:

    ...
    f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
    f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
    f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)
    Total Number of Numeric and Non-Digit Unicode Characters = 335
    

    NOTE: I am using f-strings for formatting. It's a really cool new way to format string and introduced in Python 3.6 under PEP-498. It's also called Literal String Interpolation. You can read more about it here or check out Official Documentation too.

    0 讨论(0)
  • 2020-12-31 01:25
    unicode.isnumeric()
    

    Return True if there are only numeric characters in S, False otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH.

    str.isdigit()
    

    Return true if all characters in the string are digits and there is at least one character, false otherwise.

    For 8-bit strings, this method is locale-dependent.

    0 讨论(0)
  • 2020-12-31 01:26

    The Python 3 documentation is a little clearer than the Python 2 docs:

    str.isdigit()
    [...] Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

    str.isnumeric()
    Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

    So isnumeric() tests additionally for Numeric_Type=Numeric. Quoting from a historic proposal for official numeric type definitions:

    Numeric_Type=Decimal
    Characters used in a positional decimal systems, which standard base-10 radix systems with contiguous digits 0..9, and are most-significant-digit first (backingstore order). These are coextensive by definition with General_Category=Decimal_Number.

    Numeric_Type=Digit Variants of positional decimal characters (Numeric_Type=Decimal) or sequences thereof. These include super/subscripts, enclosed, or decorated by the addition of characters such as parentheses, dots, or commas.

    Numeric_Type=Numeric Characters with numeric value, but that are neither Decimal nor Digit.

    So any character that is numeric, but not decimal or a variation thereof. Think fractions, roman numerals, glyphs that combine digits, and any numbering system that is not decimal-based.

    That includes:

    >>> import unicodedata
    >>> for codepoint in range(2**16):
    ...     chr = unichr(codepoint)
    ...     if chr.isnumeric() and not chr.isdigit():
    ...         print u'{:04x}: {} ({})'.format(codepoint, chr, unicodedata.name(chr, 'UNNAMED'))
    ... 
    00bc: ¼ (VULGAR FRACTION ONE QUARTER)
    00bd: ½ (VULGAR FRACTION ONE HALF)
    00be: ¾ (VULGAR FRACTION THREE QUARTERS)
    09f4: ৴ (BENGALI CURRENCY NUMERATOR ONE)
    09f5: ৵ (BENGALI CURRENCY NUMERATOR TWO)
    09f6: ৶ (BENGALI CURRENCY NUMERATOR THREE)
    09f7: ৷ (BENGALI CURRENCY NUMERATOR FOUR)
    09f8: ৸ (BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR)
    09f9: ৹ (BENGALI CURRENCY DENOMINATOR SIXTEEN)
    0bf0: ௰ (TAMIL NUMBER TEN)
    0bf1: ௱ (TAMIL NUMBER ONE HUNDRED)
    0bf2: ௲ (TAMIL NUMBER ONE THOUSAND)
    0c78: ౸ (TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR)
    0c79: ౹ (TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF FOUR)
    0c7a: ౺ (TELUGU FRACTION DIGIT TWO FOR ODD POWERS OF FOUR)
    0c7b: ౻ (TELUGU FRACTION DIGIT THREE FOR ODD POWERS OF FOUR)
    0c7c: ౼ (TELUGU FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR)
    0c7d: ౽ (TELUGU FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR)
    0c7e: ౾ (TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR)
    0d70: ൰ (MALAYALAM NUMBER TEN)
    0d71: ൱ (MALAYALAM NUMBER ONE HUNDRED)
    0d72: ൲ (MALAYALAM NUMBER ONE THOUSAND)
    0d73: ൳ (MALAYALAM FRACTION ONE QUARTER)
    0d74: ൴ (MALAYALAM FRACTION ONE HALF)
    0d75: ൵ (MALAYALAM FRACTION THREE QUARTERS)
    0f2a: ༪ (TIBETAN DIGIT HALF ONE)
    0f2b: ༫ (TIBETAN DIGIT HALF TWO)
    0f2c: ༬ (TIBETAN DIGIT HALF THREE)
    0f2d: ༭ (TIBETAN DIGIT HALF FOUR)
    0f2e: ༮ (TIBETAN DIGIT HALF FIVE)
    0f2f: ༯ (TIBETAN DIGIT HALF SIX)
    0f30: ༰ (TIBETAN DIGIT HALF SEVEN)
    0f31: ༱ (TIBETAN DIGIT HALF EIGHT)
    0f32: ༲ (TIBETAN DIGIT HALF NINE)
    0f33: ༳ (TIBETAN DIGIT HALF ZERO)
    1372: ፲ (ETHIOPIC NUMBER TEN)
    1373: ፳ (ETHIOPIC NUMBER TWENTY)
    1374: ፴ (ETHIOPIC NUMBER THIRTY)
    1375: ፵ (ETHIOPIC NUMBER FORTY)
    1376: ፶ (ETHIOPIC NUMBER FIFTY)
    1377: ፷ (ETHIOPIC NUMBER SIXTY)
    1378: ፸ (ETHIOPIC NUMBER SEVENTY)
    1379: ፹ (ETHIOPIC NUMBER EIGHTY)
    137a: ፺ (ETHIOPIC NUMBER NINETY)
    137b: ፻ (ETHIOPIC NUMBER HUNDRED)
    137c: ፼ (ETHIOPIC NUMBER TEN THOUSAND)
    16ee: ᛮ (RUNIC ARLAUG SYMBOL)
    16ef: ᛯ (RUNIC TVIMADUR SYMBOL)
    16f0: ᛰ (RUNIC BELGTHOR SYMBOL)
    17f0: ៰ (KHMER SYMBOL LEK ATTAK SON)
    17f1: ៱ (KHMER SYMBOL LEK ATTAK MUOY)
    17f2: ៲ (KHMER SYMBOL LEK ATTAK PII)
    17f3: ៳ (KHMER SYMBOL LEK ATTAK BEI)
    17f4: ៴ (KHMER SYMBOL LEK ATTAK BUON)
    17f5: ៵ (KHMER SYMBOL LEK ATTAK PRAM)
    17f6: ៶ (KHMER SYMBOL LEK ATTAK PRAM-MUOY)
    17f7: ៷ (KHMER SYMBOL LEK ATTAK PRAM-PII)
    17f8: ៸ (KHMER SYMBOL LEK ATTAK PRAM-BEI)
    17f9: ៹ (KHMER SYMBOL LEK ATTAK PRAM-BUON)
    2150: ⅐ (VULGAR FRACTION ONE SEVENTH)
    2151: ⅑ (VULGAR FRACTION ONE NINTH)
    2152: ⅒ (VULGAR FRACTION ONE TENTH)
    2153: ⅓ (VULGAR FRACTION ONE THIRD)
    2154: ⅔ (VULGAR FRACTION TWO THIRDS)
    2155: ⅕ (VULGAR FRACTION ONE FIFTH)
    2156: ⅖ (VULGAR FRACTION TWO FIFTHS)
    2157: ⅗ (VULGAR FRACTION THREE FIFTHS)
    2158: ⅘ (VULGAR FRACTION FOUR FIFTHS)
    2159: ⅙ (VULGAR FRACTION ONE SIXTH)
    215a: ⅚ (VULGAR FRACTION FIVE SIXTHS)
    215b: ⅛ (VULGAR FRACTION ONE EIGHTH)
    215c: ⅜ (VULGAR FRACTION THREE EIGHTHS)
    215d: ⅝ (VULGAR FRACTION FIVE EIGHTHS)
    215e: ⅞ (VULGAR FRACTION SEVEN EIGHTHS)
    215f: ⅟ (FRACTION NUMERATOR ONE)
    2160: Ⅰ (ROMAN NUMERAL ONE)
    2161: Ⅱ (ROMAN NUMERAL TWO)
    2162: Ⅲ (ROMAN NUMERAL THREE)
    2163: Ⅳ (ROMAN NUMERAL FOUR)
    2164: Ⅴ (ROMAN NUMERAL FIVE)
    2165: Ⅵ (ROMAN NUMERAL SIX)
    2166: Ⅶ (ROMAN NUMERAL SEVEN)
    2167: Ⅷ (ROMAN NUMERAL EIGHT)
    2168: Ⅸ (ROMAN NUMERAL NINE)
    2169: Ⅹ (ROMAN NUMERAL TEN)
    216a: Ⅺ (ROMAN NUMERAL ELEVEN)
    216b: Ⅻ (ROMAN NUMERAL TWELVE)
    216c: Ⅼ (ROMAN NUMERAL FIFTY)
    216d: Ⅽ (ROMAN NUMERAL ONE HUNDRED)
    216e: Ⅾ (ROMAN NUMERAL FIVE HUNDRED)
    216f: Ⅿ (ROMAN NUMERAL ONE THOUSAND)
    2170: ⅰ (SMALL ROMAN NUMERAL ONE)
    2171: ⅱ (SMALL ROMAN NUMERAL TWO)
    2172: ⅲ (SMALL ROMAN NUMERAL THREE)
    2173: ⅳ (SMALL ROMAN NUMERAL FOUR)
    2174: ⅴ (SMALL ROMAN NUMERAL FIVE)
    2175: ⅵ (SMALL ROMAN NUMERAL SIX)
    2176: ⅶ (SMALL ROMAN NUMERAL SEVEN)
    2177: ⅷ (SMALL ROMAN NUMERAL EIGHT)
    2178: ⅸ (SMALL ROMAN NUMERAL NINE)
    2179: ⅹ (SMALL ROMAN NUMERAL TEN)
    217a: ⅺ (SMALL ROMAN NUMERAL ELEVEN)
    217b: ⅻ (SMALL ROMAN NUMERAL TWELVE)
    217c: ⅼ (SMALL ROMAN NUMERAL FIFTY)
    217d: ⅽ (SMALL ROMAN NUMERAL ONE HUNDRED)
    217e: ⅾ (SMALL ROMAN NUMERAL FIVE HUNDRED)
    217f: ⅿ (SMALL ROMAN NUMERAL ONE THOUSAND)
    2180: ↀ (ROMAN NUMERAL ONE THOUSAND C D)
    2181: ↁ (ROMAN NUMERAL FIVE THOUSAND)
    2182: ↂ (ROMAN NUMERAL TEN THOUSAND)
    2185: ↅ (ROMAN NUMERAL SIX LATE FORM)
    2186: ↆ (ROMAN NUMERAL FIFTY EARLY FORM)
    2187: ↇ (ROMAN NUMERAL FIFTY THOUSAND)
    2188: ↈ (ROMAN NUMERAL ONE HUNDRED THOUSAND)
    2189: ↉ (VULGAR FRACTION ZERO THIRDS)
    2469: ⑩ (CIRCLED NUMBER TEN)
    246a: ⑪ (CIRCLED NUMBER ELEVEN)
    246b: ⑫ (CIRCLED NUMBER TWELVE)
    246c: ⑬ (CIRCLED NUMBER THIRTEEN)
    246d: ⑭ (CIRCLED NUMBER FOURTEEN)
    246e: ⑮ (CIRCLED NUMBER FIFTEEN)
    246f: ⑯ (CIRCLED NUMBER SIXTEEN)
    2470: ⑰ (CIRCLED NUMBER SEVENTEEN)
    2471: ⑱ (CIRCLED NUMBER EIGHTEEN)
    2472: ⑲ (CIRCLED NUMBER NINETEEN)
    2473: ⑳ (CIRCLED NUMBER TWENTY)
    247d: ⑽ (PARENTHESIZED NUMBER TEN)
    247e: ⑾ (PARENTHESIZED NUMBER ELEVEN)
    247f: ⑿ (PARENTHESIZED NUMBER TWELVE)
    2480: ⒀ (PARENTHESIZED NUMBER THIRTEEN)
    2481: ⒁ (PARENTHESIZED NUMBER FOURTEEN)
    2482: ⒂ (PARENTHESIZED NUMBER FIFTEEN)
    2483: ⒃ (PARENTHESIZED NUMBER SIXTEEN)
    2484: ⒄ (PARENTHESIZED NUMBER SEVENTEEN)
    2485: ⒅ (PARENTHESIZED NUMBER EIGHTEEN)
    2486: ⒆ (PARENTHESIZED NUMBER NINETEEN)
    2487: ⒇ (PARENTHESIZED NUMBER TWENTY)
    2491: ⒑ (NUMBER TEN FULL STOP)
    2492: ⒒ (NUMBER ELEVEN FULL STOP)
    2493: ⒓ (NUMBER TWELVE FULL STOP)
    2494: ⒔ (NUMBER THIRTEEN FULL STOP)
    2495: ⒕ (NUMBER FOURTEEN FULL STOP)
    2496: ⒖ (NUMBER FIFTEEN FULL STOP)
    2497: ⒗ (NUMBER SIXTEEN FULL STOP)
    2498: ⒘ (NUMBER SEVENTEEN FULL STOP)
    2499: ⒙ (NUMBER EIGHTEEN FULL STOP)
    249a: ⒚ (NUMBER NINETEEN FULL STOP)
    249b: ⒛ (NUMBER TWENTY FULL STOP)
    24eb: ⓫ (NEGATIVE CIRCLED NUMBER ELEVEN)
    24ec: ⓬ (NEGATIVE CIRCLED NUMBER TWELVE)
    24ed: ⓭ (NEGATIVE CIRCLED NUMBER THIRTEEN)
    24ee: ⓮ (NEGATIVE CIRCLED NUMBER FOURTEEN)
    24ef: ⓯ (NEGATIVE CIRCLED NUMBER FIFTEEN)
    24f0: ⓰ (NEGATIVE CIRCLED NUMBER SIXTEEN)
    24f1: ⓱ (NEGATIVE CIRCLED NUMBER SEVENTEEN)
    24f2: ⓲ (NEGATIVE CIRCLED NUMBER EIGHTEEN)
    24f3: ⓳ (NEGATIVE CIRCLED NUMBER NINETEEN)
    24f4: ⓴ (NEGATIVE CIRCLED NUMBER TWENTY)
    24fe: ⓾ (DOUBLE CIRCLED NUMBER TEN)
    277f: ❿ (DINGBAT NEGATIVE CIRCLED NUMBER TEN)
    2789: ➉ (DINGBAT CIRCLED SANS-SERIF NUMBER TEN)
    2793: ➓ (DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN)
    2cfd: ⳽ (COPTIC FRACTION ONE HALF)
    3007: 〇 (IDEOGRAPHIC NUMBER ZERO)
    3021: 〡 (HANGZHOU NUMERAL ONE)
    3022: 〢 (HANGZHOU NUMERAL TWO)
    3023: 〣 (HANGZHOU NUMERAL THREE)
    3024: 〤 (HANGZHOU NUMERAL FOUR)
    3025: 〥 (HANGZHOU NUMERAL FIVE)
    3026: 〦 (HANGZHOU NUMERAL SIX)
    3027: 〧 (HANGZHOU NUMERAL SEVEN)
    3028: 〨 (HANGZHOU NUMERAL EIGHT)
    3029: 〩 (HANGZHOU NUMERAL NINE)
    3038: 〸 (HANGZHOU NUMERAL TEN)
    3039: 〹 (HANGZHOU NUMERAL TWENTY)
    303a: 〺 (HANGZHOU NUMERAL THIRTY)
    3192: ㆒ (IDEOGRAPHIC ANNOTATION ONE MARK)
    3193: ㆓ (IDEOGRAPHIC ANNOTATION TWO MARK)
    3194: ㆔ (IDEOGRAPHIC ANNOTATION THREE MARK)
    3195: ㆕ (IDEOGRAPHIC ANNOTATION FOUR MARK)
    3220: ㈠ (PARENTHESIZED IDEOGRAPH ONE)
    3221: ㈡ (PARENTHESIZED IDEOGRAPH TWO)
    3222: ㈢ (PARENTHESIZED IDEOGRAPH THREE)
    3223: ㈣ (PARENTHESIZED IDEOGRAPH FOUR)
    3224: ㈤ (PARENTHESIZED IDEOGRAPH FIVE)
    3225: ㈥ (PARENTHESIZED IDEOGRAPH SIX)
    3226: ㈦ (PARENTHESIZED IDEOGRAPH SEVEN)
    3227: ㈧ (PARENTHESIZED IDEOGRAPH EIGHT)
    3228: ㈨ (PARENTHESIZED IDEOGRAPH NINE)
    3229: ㈩ (PARENTHESIZED IDEOGRAPH TEN)
    3251: ㉑ (CIRCLED NUMBER TWENTY ONE)
    3252: ㉒ (CIRCLED NUMBER TWENTY TWO)
    3253: ㉓ (CIRCLED NUMBER TWENTY THREE)
    3254: ㉔ (CIRCLED NUMBER TWENTY FOUR)
    3255: ㉕ (CIRCLED NUMBER TWENTY FIVE)
    3256: ㉖ (CIRCLED NUMBER TWENTY SIX)
    3257: ㉗ (CIRCLED NUMBER TWENTY SEVEN)
    3258: ㉘ (CIRCLED NUMBER TWENTY EIGHT)
    3259: ㉙ (CIRCLED NUMBER TWENTY NINE)
    325a: ㉚ (CIRCLED NUMBER THIRTY)
    325b: ㉛ (CIRCLED NUMBER THIRTY ONE)
    325c: ㉜ (CIRCLED NUMBER THIRTY TWO)
    325d: ㉝ (CIRCLED NUMBER THIRTY THREE)
    325e: ㉞ (CIRCLED NUMBER THIRTY FOUR)
    325f: ㉟ (CIRCLED NUMBER THIRTY FIVE)
    3280: ㊀ (CIRCLED IDEOGRAPH ONE)
    3281: ㊁ (CIRCLED IDEOGRAPH TWO)
    3282: ㊂ (CIRCLED IDEOGRAPH THREE)
    3283: ㊃ (CIRCLED IDEOGRAPH FOUR)
    3284: ㊄ (CIRCLED IDEOGRAPH FIVE)
    3285: ㊅ (CIRCLED IDEOGRAPH SIX)
    3286: ㊆ (CIRCLED IDEOGRAPH SEVEN)
    3287: ㊇ (CIRCLED IDEOGRAPH EIGHT)
    3288: ㊈ (CIRCLED IDEOGRAPH NINE)
    3289: ㊉ (CIRCLED IDEOGRAPH TEN)
    32b1: ㊱ (CIRCLED NUMBER THIRTY SIX)
    32b2: ㊲ (CIRCLED NUMBER THIRTY SEVEN)
    32b3: ㊳ (CIRCLED NUMBER THIRTY EIGHT)
    32b4: ㊴ (CIRCLED NUMBER THIRTY NINE)
    32b5: ㊵ (CIRCLED NUMBER FORTY)
    32b6: ㊶ (CIRCLED NUMBER FORTY ONE)
    32b7: ㊷ (CIRCLED NUMBER FORTY TWO)
    32b8: ㊸ (CIRCLED NUMBER FORTY THREE)
    32b9: ㊹ (CIRCLED NUMBER FORTY FOUR)
    32ba: ㊺ (CIRCLED NUMBER FORTY FIVE)
    32bb: ㊻ (CIRCLED NUMBER FORTY SIX)
    32bc: ㊼ (CIRCLED NUMBER FORTY SEVEN)
    32bd: ㊽ (CIRCLED NUMBER FORTY EIGHT)
    32be: ㊾ (CIRCLED NUMBER FORTY NINE)
    32bf: ㊿ (CIRCLED NUMBER FIFTY)
    3405: 㐅 (CJK UNIFIED IDEOGRAPH-3405)
    3483: 㒃 (CJK UNIFIED IDEOGRAPH-3483)
    382a: 㠪 (CJK UNIFIED IDEOGRAPH-382A)
    3b4d: 㭍 (CJK UNIFIED IDEOGRAPH-3B4D)
    4e00: 一 (CJK UNIFIED IDEOGRAPH-4E00)
    4e03: 七 (CJK UNIFIED IDEOGRAPH-4E03)
    4e07: 万 (CJK UNIFIED IDEOGRAPH-4E07)
    4e09: 三 (CJK UNIFIED IDEOGRAPH-4E09)
    4e5d: 九 (CJK UNIFIED IDEOGRAPH-4E5D)
    4e8c: 二 (CJK UNIFIED IDEOGRAPH-4E8C)
    4e94: 五 (CJK UNIFIED IDEOGRAPH-4E94)
    4e96: 亖 (CJK UNIFIED IDEOGRAPH-4E96)
    4ebf: 亿 (CJK UNIFIED IDEOGRAPH-4EBF)
    4ec0: 什 (CJK UNIFIED IDEOGRAPH-4EC0)
    4edf: 仟 (CJK UNIFIED IDEOGRAPH-4EDF)
    4ee8: 仨 (CJK UNIFIED IDEOGRAPH-4EE8)
    4f0d: 伍 (CJK UNIFIED IDEOGRAPH-4F0D)
    4f70: 佰 (CJK UNIFIED IDEOGRAPH-4F70)
    5104: 億 (CJK UNIFIED IDEOGRAPH-5104)
    5146: 兆 (CJK UNIFIED IDEOGRAPH-5146)
    5169: 兩 (CJK UNIFIED IDEOGRAPH-5169)
    516b: 八 (CJK UNIFIED IDEOGRAPH-516B)
    516d: 六 (CJK UNIFIED IDEOGRAPH-516D)
    5341: 十 (CJK UNIFIED IDEOGRAPH-5341)
    5343: 千 (CJK UNIFIED IDEOGRAPH-5343)
    5344: 卄 (CJK UNIFIED IDEOGRAPH-5344)
    5345: 卅 (CJK UNIFIED IDEOGRAPH-5345)
    534c: 卌 (CJK UNIFIED IDEOGRAPH-534C)
    53c1: 叁 (CJK UNIFIED IDEOGRAPH-53C1)
    53c2: 参 (CJK UNIFIED IDEOGRAPH-53C2)
    53c3: 參 (CJK UNIFIED IDEOGRAPH-53C3)
    53c4: 叄 (CJK UNIFIED IDEOGRAPH-53C4)
    56db: 四 (CJK UNIFIED IDEOGRAPH-56DB)
    58f1: 壱 (CJK UNIFIED IDEOGRAPH-58F1)
    58f9: 壹 (CJK UNIFIED IDEOGRAPH-58F9)
    5e7a: 幺 (CJK UNIFIED IDEOGRAPH-5E7A)
    5efe: 廾 (CJK UNIFIED IDEOGRAPH-5EFE)
    5eff: 廿 (CJK UNIFIED IDEOGRAPH-5EFF)
    5f0c: 弌 (CJK UNIFIED IDEOGRAPH-5F0C)
    5f0d: 弍 (CJK UNIFIED IDEOGRAPH-5F0D)
    5f0e: 弎 (CJK UNIFIED IDEOGRAPH-5F0E)
    5f10: 弐 (CJK UNIFIED IDEOGRAPH-5F10)
    62fe: 拾 (CJK UNIFIED IDEOGRAPH-62FE)
    634c: 捌 (CJK UNIFIED IDEOGRAPH-634C)
    67d2: 柒 (CJK UNIFIED IDEOGRAPH-67D2)
    6f06: 漆 (CJK UNIFIED IDEOGRAPH-6F06)
    7396: 玖 (CJK UNIFIED IDEOGRAPH-7396)
    767e: 百 (CJK UNIFIED IDEOGRAPH-767E)
    8086: 肆 (CJK UNIFIED IDEOGRAPH-8086)
    842c: 萬 (CJK UNIFIED IDEOGRAPH-842C)
    8cae: 貮 (CJK UNIFIED IDEOGRAPH-8CAE)
    8cb3: 貳 (CJK UNIFIED IDEOGRAPH-8CB3)
    8d30: 贰 (CJK UNIFIED IDEOGRAPH-8D30)
    9621: 阡 (CJK UNIFIED IDEOGRAPH-9621)
    9646: 陆 (CJK UNIFIED IDEOGRAPH-9646)
    964c: 陌 (CJK UNIFIED IDEOGRAPH-964C)
    9678: 陸 (CJK UNIFIED IDEOGRAPH-9678)
    96f6: 零 (CJK UNIFIED IDEOGRAPH-96F6)
    a6e6: ꛦ (BAMUM LETTER MO)
    a6e7: ꛧ (BAMUM LETTER MBAA)
    a6e8: ꛨ (BAMUM LETTER TET)
    a6e9: ꛩ (BAMUM LETTER KPA)
    a6ea: ꛪ (BAMUM LETTER TEN)
    a6eb: ꛫ (BAMUM LETTER NTUU)
    a6ec: ꛬ (BAMUM LETTER SAMBA)
    a6ed: ꛭ (BAMUM LETTER FAAMAE)
    a6ee: ꛮ (BAMUM LETTER KOVUU)
    a6ef: ꛯ (BAMUM LETTER KOGHOM)
    a830: ꠰ (NORTH INDIC FRACTION ONE QUARTER)
    a831: ꠱ (NORTH INDIC FRACTION ONE HALF)
    a832: ꠲ (NORTH INDIC FRACTION THREE QUARTERS)
    a833: ꠳ (NORTH INDIC FRACTION ONE SIXTEENTH)
    a834: ꠴ (NORTH INDIC FRACTION ONE EIGHTH)
    a835: ꠵ (NORTH INDIC FRACTION THREE SIXTEENTHS)
    f96b: 參 (CJK COMPATIBILITY IDEOGRAPH-F96B)
    f973: 拾 (CJK COMPATIBILITY IDEOGRAPH-F973)
    f978: 兩 (CJK COMPATIBILITY IDEOGRAPH-F978)
    f9b2: 零 (CJK COMPATIBILITY IDEOGRAPH-F9B2)
    f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
    f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
    f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)
    

    However, the distinction between Numeric_Type=Digit and Numeric_Type=Numeric is no longer considered useful, and Numeric_Type=Digit is no longer used for new characters since Unicode 6.3.0. Quoting Unicode Standard Annex #44:

    Starting with Unicode 6.3.0, no newly encoded numeric characters will be given Numeric_Type=Digit, nor will existing characters with Numeric_Type=Numeric be changed to Numeric_Type=Digit. The distinction between those two types is not considered useful.

    Thus,

    0 讨论(0)
  • 2020-12-31 01:27

    From python inbuilt docs,

    >>> unicode.isdigit.__doc__
    'S.isdigit() -> bool\n\nReturn True if all characters in S are digits\nand there is at least one character in S, False otherwise.'
    >>> unicode.isnumeric.__doc__
    'S.isnumeric() -> bool\n\nReturn True if there are only numeric characters in S,\nFalse otherwise.'
    
    0 讨论(0)
  • 2020-12-31 01:36

    From the manual

    The method isnumeric() checks whether the string consists of only numeric characters. This method is present only on unicode objects.

    Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

    0 讨论(0)
提交回复
热议问题