Do LATIN CAPITAL LETTER I (U+0049) and ROMAN NUMERAL ONE (U+2160) have Unicode compatibility equivalence?

清酒与你 2021-02-06 03:42

Unicode defines two kinds of equivalence: canonical equivalence and compatibility equivalence. The example in Unicode Standard Annex #15 for compatibility equivalence is SUP

3 Answers
  • 2021-02-06 04:20

    ᴇᴅɪᴛ: Added exactly what the original question is looking for at the bottom. This is really cool.


    The answer to your question about ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ and ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ɪ is YES. Here’s a quick way to check:

    $ perl -Mcharnames=:full -MUnicode::Normalize -le 'print
       NFKD "\N{ROMAN NUMERAL ONE}"  eq  NFKD "\N{LATIN CAPITAL LETTER I}"'
    1
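
    For readers without Perl at hand, the same check can be sketched with Python's standard `unicodedata` module. The helper name `compat_equivalent` is made up for this illustration; only `unicodedata.normalize` comes from the library.

```python
import unicodedata

roman_one = "\N{ROMAN NUMERAL ONE}"      # U+2160
latin_i = "\N{LATIN CAPITAL LETTER I}"   # U+0049


def compat_equivalent(a, b):
    """Return True if a and b have the same NFKD form (hypothetical helper)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


print(compat_equivalent(roman_one, latin_i))  # True
```

    The two strings are still different code points; they only compare equal after compatibility decomposition.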
    

    However, the answer to your question as to whether characters that are visually indistinguishable have compatibility equivalence is most definitely NO!

    For example, ᴄʜᴇʀᴏᴋᴇᴇ ʟᴇᴛᴛᴇʀ ɢᴏ (Ꭺ) looks like ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀ (A), but is certainly not NFKD equivalent. Similarly with ɢʀᴇᴇᴋ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀʟᴘʜᴀ (Α) and ᴄʏʀɪʟʟɪᴄ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀ (А) not being NFKD equivalent. There are effectively uncountably many (well, I can’t count them :) such issues. The only code points that are NFKD-equiv to ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴀ, for example, are:

    U+00041 ‭ A  GC=Lu SC=Latin        LATIN CAPITAL LETTER A
    U+01D2C ‭ ᴬ  GC=Lm SC=Latin        MODIFIER LETTER CAPITAL A
    U+024B6 ‭ Ⓐ  GC=So SC=Common       CIRCLED LATIN CAPITAL LETTER A
    U+0FF21 ‭ A GC=Lu SC=Latin        FULLWIDTH LATIN CAPITAL LETTER A
    U+1D400 ‭                                                                     
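
    A list of this kind can be reproduced, at least roughly, by brute force: normalize every code point and keep those whose NFKD form is a plain "A". The sketch below uses Python's `unicodedata`; the function name `nfkd_equivalents` is an assumption of this example, and the exact result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def nfkd_equivalents(target):
    """Collect every single code point whose NFKD form equals that of target."""
    wanted = unicodedata.normalize("NFKD", target)
    found = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, which are not characters
        ch = chr(cp)
        if unicodedata.normalize("NFKD", ch) == wanted:
            found.append(ch)
    return found


equivalents = nfkd_equivalents("A")
for ch in equivalents:
    print(f"U+{ord(ch):05X}  {unicodedata.name(ch, '<unnamed>')}")
```

    The Cherokee, Greek, and Cyrillic lookalikes mentioned above do not show up in the output, because none of them decomposes to U+0041.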
  • 2021-02-06 04:26

    Yes. Look in UnicodeData.txt:

    2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;
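
    The sixth semicolon-separated field of that record is the decomposition: a `<compat>` tag followed by the target code point 0049. A minimal sketch of pulling it apart in Python follows; `parse_decomposition` is a hypothetical helper written for this answer, while `unicodedata.decomposition` is the standard-library lookup for the same field.

```python
import unicodedata

RECORD = "2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;"


def parse_decomposition(record):
    """Return (code point, decomposition kind, mapped string) for one record."""
    fields = record.split(";")
    code_point = int(fields[0], 16)
    raw = fields[5]  # decomposition type and mapping
    if raw.startswith("<"):
        # A tag such as <compat> marks a compatibility decomposition.
        tag, _, rest = raw.partition("> ")
        kind = tag[1:]
    else:
        # An untagged, non-empty mapping is a canonical decomposition.
        kind, rest = ("canonical" if raw else None), raw
    mapping = "".join(chr(int(h, 16)) for h in rest.split())
    return code_point, kind, mapping


print(parse_decomposition(RECORD))
print(unicodedata.decomposition("\N{ROMAN NUMERAL ONE}"))
```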
    
  • 2021-02-06 04:29

    The answer by @dan04 is the correct answer to the explicit question, but the indirect question “if characters that are visually indistinguishable have compatibility equivalence” has a more complicated answer.

    As a rule, canonically equivalent characters or character sequences are supposed to look similar. They are, roughly speaking, different presentations of the same intuitive character as encoded characters. This however depends on several practical considerations, and the renderings might in fact be different.

    On the other hand, characters can lack any equivalence even though their renderings (glyphs) are identical in every known font. For example, any normal font that contains the capital Latin letter A, the capital Greek letter alpha, and the capital Cyrillic letter A has identical glyphs for them, but they are still completely distinct characters, with no equivalence mapping between them.
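
    This is easy to confirm for the three letters just mentioned. The sketch below (Python, standard `unicodedata`; the names `FORMS`, `LOOKALIKES`, and `equivalent_in_any_form` are invented for the example) compares each pair under all four normalization forms.

```python
import itertools
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

LOOKALIKES = [
    "\N{LATIN CAPITAL LETTER A}",
    "\N{GREEK CAPITAL LETTER ALPHA}",
    "\N{CYRILLIC CAPITAL LETTER A}",
]


def equivalent_in_any_form(a, b):
    """True if a and b become identical under at least one normalization form."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


for a, b in itertools.combinations(LOOKALIKES, 2):
    print(unicodedata.name(a), "/", unicodedata.name(b), "->",
          equivalent_in_any_form(a, b))
```

    Every pair reports False: identical glyphs, but no normalization form maps one letter to another.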

    Compatibility equivalent characters may differ in presentation, and they often do, partly because their difference is often stylistic. But they need not differ.
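
    To see which of the two kinds of equivalence holds for a given pair, one can compare the NFD forms first (canonical) and fall back to the NFKD forms (compatibility). This is only a sketch: the function name `equivalence` is hypothetical, and it reports the stronger relation when both hold, since canonically equivalent strings are also compatibility equivalent.

```python
import unicodedata


def equivalence(a, b):
    """Classify the relation between a and b as canonical, compatibility, or none."""
    if unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b):
        return "canonical"
    if unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b):
        return "compatibility"
    return "none"


# ANGSTROM SIGN decomposes canonically to A WITH RING ABOVE.
print(equivalence("\N{ANGSTROM SIGN}",
                  "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}"))  # canonical

# ROMAN NUMERAL ONE only has a <compat> mapping to I.
print(equivalence("\N{ROMAN NUMERAL ONE}",
                  "\N{LATIN CAPITAL LETTER I}"))  # compatibility
```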
