Do LATIN CAPITAL LETTER I (U+0049) and ROMAN NUMERAL ONE (U+2160) have Unicode compatibility equivalence?

清酒与你 2021-02-06 03:42

Unicode defines two kinds of equivalence: canonical equivalence and compatibility equivalence. The example in Unicode Standard Annex #15 for compatibility equivalence is SUP
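To make the distinction concrete, here is a small Python sketch (not part of the original question) using the standard-library unicodedata module. It assumes that comparing NFD forms is a practical test for canonical equivalence and comparing NFKD forms is a practical test for compatibility equivalence. The helper names are hypothetical, and the example pairs (precomposed é vs e plus combining acute, SUPERSCRIPT TWO vs DIGIT TWO) are chosen for illustration, not taken from the annex.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence: identical after canonical decomposition (NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a: str, b: str) -> bool:
    # Compatibility equivalence: identical after compatibility decomposition (NFKD).
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

# U+00E9 (precomposed e-acute) vs U+0065 U+0301 (e + combining acute accent)
print(canonically_equivalent("\u00e9", "e\u0301"))    # True
print(compatibility_equivalent("\u00e9", "e\u0301"))  # True

# U+00B2 SUPERSCRIPT TWO vs U+0032 DIGIT TWO
print(canonically_equivalent("\u00b2", "2"))          # False
print(compatibility_equivalent("\u00b2", "2"))        # True
```

Canonical equivalence implies compatibility equivalence, but not the other way round, which is why the second pair differs.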

3 Answers
  •  庸人自扰
    2021-02-06 04:20

    ᴇᴅɪᴛ: Added exactly what the original question is looking for at the bottom. This is really cool.


    The answer to your question about ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ and ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ɪ is YES. Here’s a quick way to check:

    $ perl -Mcharnames=:full -MUnicode::Normalize -le 'print
       NFKD "\N{ROMAN NUMERAL ONE}"  eq  NFKD "\N{LATIN CAPITAL LETTER I}"'
    1
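    For readers without Perl, roughly the same check can be sketched in Python with the standard-library unicodedata module (an addition, not part of the original answer). It also compares NFD forms, which shows the equivalence is compatibility-only: U+2160 has a <compat> decomposition to U+0049 but no canonical one. Results come from whatever Unicode version your Python build bundles.

```python
import unicodedata

ROMAN_NUMERAL_ONE = "\N{ROMAN NUMERAL ONE}"    # U+2160
LATIN_CAPITAL_I = "\N{LATIN CAPITAL LETTER I}"  # U+0049

# Compatibility decomposition maps U+2160 to U+0049, so the NFKD forms match.
nfkd_equal = (unicodedata.normalize("NFKD", ROMAN_NUMERAL_ONE)
              == unicodedata.normalize("NFKD", LATIN_CAPITAL_I))

# U+2160 has no canonical decomposition, so NFD leaves it unchanged.
nfd_equal = (unicodedata.normalize("NFD", ROMAN_NUMERAL_ONE)
             == unicodedata.normalize("NFD", LATIN_CAPITAL_I))

print(nfkd_equal)                                    # True
print(nfd_equal)                                     # False
print(unicodedata.decomposition(ROMAN_NUMERAL_ONE))  # <compat> 0049
```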
    

    However, the answer to your question as to whether characters that are visually indistinguishable have compatibility equivalence is most definitely NO!

    For example, ᴄʜᴇʀᴏᴋᴇᴇ ʟᴇᴛᴛᴇʀ ɢᴏ (Ꭺ) looks like ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀ (A), but is certainly not NFKD equivalent to it. The same goes for ɢʀᴇᴇᴋ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀʟᴘʜᴀ (Α) and ᴄʏʀɪʟʟɪᴄ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀ (А): neither is NFKD equivalent to A. There are effectively uncountably many (well, I can’t count them :) such lookalikes. The only code points that are NFKD-equivalent to ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀ, for example, are:

    U+00041  A  GC=Lu SC=Latin        LATIN CAPITAL LETTER A
    U+01D2C  ᴬ  GC=Lm SC=Latin        MODIFIER LETTER CAPITAL A
    U+024B6  Ⓐ  GC=So SC=Common       CIRCLED LATIN CAPITAL LETTER A
    U+0FF21  Ａ GC=Lu SC=Latin        FULLWIDTH LATIN CAPITAL LETTER A
    U+1D400
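    A brute-force way to reproduce a list like the one above, sketched in Python (not the answerer's own tool): scan every code point and keep those whose NFKD form is exactly "A". The exact set depends on the Unicode version of your Python build, and unicodedata does not expose the Script property, so the sketch prints only the General Category and name. It also checks that the Cherokee, Greek, and Cyrillic lookalikes are not in the set. The function name nfkd_equivalents is hypothetical.

```python
import sys
import unicodedata

def nfkd_equivalents(target: str) -> list[str]:
    """Scan every code point and keep those whose NFKD form equals NFKD(target)."""
    wanted = unicodedata.normalize("NFKD", target)
    matches = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points; they are not characters
        ch = chr(cp)
        if unicodedata.normalize("NFKD", ch) == wanted:
            matches.append(ch)
    return matches

equivalents_of_a = nfkd_equivalents("A")
for ch in equivalents_of_a:
    # Name and General Category come from the Unicode database bundled with Python.
    print(f"U+{ord(ch):05X}  GC={unicodedata.category(ch)}  {unicodedata.name(ch, '<unnamed>')}")

# Visually similar letters from other scripts are NOT compatibility equivalents.
lookalikes = [
    "\N{CHEROKEE LETTER GO}",          # U+13AA
    "\N{GREEK CAPITAL LETTER ALPHA}",  # U+0391
    "\N{CYRILLIC CAPITAL LETTER A}",   # U+0410
]
for ch in lookalikes:
    print(unicodedata.name(ch), ch in equivalents_of_a)  # prints False for each
```

Scanning the whole code space is slow compared with reading the decomposition data directly, but it needs nothing beyond the standard library and makes no assumptions about which blocks to search.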
