Question
I'm not sure how best to phrase this, but I'm looking for Unicode characters that each look like more than one Latin letter.
So far I have found these in Word:
- Ǳ
- ǲ
- ǳ
- Ǌ
- ǈ
- Ǉ
- ǋ
- ǌ
Any others?
Answer 1:
Apologies for the formatting: wide characters are hard to line up in a monospace font. A picture would look better, but then you couldn't copy the characters or zoom in without losing quality.
Digraphs
+-------------+------------+------------------------+----------------------------+
| Two glyphs  | Digraph    | Unicode code points    | HTML                       |
+-------------+------------+------------------------+----------------------------+
| DZ, Dz, dz  | Ǳ, ǲ, ǳ    | U+01F1, U+01F2, U+01F3 | &#x01F1; &#x01F2; &#x01F3; |
| DŽ, Dž, dž  | Ǆ, ǅ, ǆ    | U+01C4, U+01C5, U+01C6 | &#x01C4; &#x01C5; &#x01C6; |
| IJ, ij      | Ĳ, ĳ       | U+0132, U+0133         | &#x0132; &#x0133;          |
| LJ, Lj, lj  | Ǉ, ǈ, ǉ    | U+01C7, U+01C8, U+01C9 | &#x01C7; &#x01C8; &#x01C9; |
| NJ, Nj, nj  | Ǌ, ǋ, ǌ    | U+01CA, U+01CB, U+01CC | &#x01CA; &#x01CB; &#x01CC; |
+-------------+------------+------------------------+----------------------------+
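For reference, a hexadecimal HTML numeric character reference follows directly from the code point. Here is a minimal Python sketch, using the dz digraphs U+01F1 to U+01F3 from the table; the helper name `html_reference` is made up for this example:

```python
import html

def html_reference(ch):
    """Hexadecimal numeric character reference for one character.
    (Hypothetical helper, not part of any library.)"""
    return f"&#x{ord(ch):04X};"

# The three dz digraphs, built from their code points U+01F1..U+01F3.
digraphs = [chr(cp) for cp in range(0x01F1, 0x01F4)]
refs = [html_reference(ch) for ch in digraphs]
print(refs)

# html.unescape turns each reference back into the original character.
round_trip = [html.unescape(ref) for ref in refs]
print(round_trip == digraphs)
```

Building the characters from code points keeps the example safe from editors or pipelines that silently replace a digraph with its two-letter spelling.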
Ligatures
+-------------------+----------+----------------+-------------------+
| Non-ligature      | Ligature | Unicode        | HTML              |
+-------------------+----------+----------------+-------------------+
| AA, aa            | Ꜳ, ꜳ     | U+A732, U+A733 | &#xA732; &#xA733; |
| AE, ae            | Æ, æ     | U+00C6, U+00E6 | &#x00C6; &#x00E6; |
| AO, ao            | Ꜵ, ꜵ     | U+A734, U+A735 | &#xA734; &#xA735; |
| AU, au            | Ꜷ, ꜷ     | U+A736, U+A737 | &#xA736; &#xA737; |
| AV, av            | Ꜹ, ꜹ     | U+A738, U+A739 | &#xA738; &#xA739; |
| AV, av (with bar) | Ꜻ, ꜻ     | U+A73A, U+A73B | &#xA73A; &#xA73B; |
| AY, ay            | Ꜽ, ꜽ     | U+A73C, U+A73D | &#xA73C; &#xA73D; |
| et                | 🙰        | U+1F670        | &#x1F670;         |
| ff                | ﬀ        | U+FB00         | &#xFB00;          |
| ffi               | ﬃ        | U+FB03         | &#xFB03;          |
| ffl               | ﬄ        | U+FB04         | &#xFB04;          |
| fi                | ﬁ        | U+FB01         | &#xFB01;          |
| fl                | ﬂ        | U+FB02         | &#xFB02;          |
| OE, oe            | Œ, œ     | U+0152, U+0153 | &#x0152; &#x0153; |
| OO, oo            | Ꝏ, ꝏ     | U+A74E, U+A74F | &#xA74E; &#xA74F; |
| ſs, ſz            | ẞ, ß     | U+1E9E, U+00DF | &#x1E9E; &#x00DF; |
| st                | ﬆ        | U+FB06         | &#xFB06;          |
| ſt                | ﬅ        | U+FB05         | &#xFB05;          |
| TZ, tz            | Ꜩ, ꜩ     | U+A728, U+A729 | &#xA728; &#xA729; |
| ue                | ᵫ        | U+1D6B         | &#x1D6B;          |
| VY, vy            | Ꝡ, ꝡ     | U+A760, U+A761 | &#xA760; &#xA761; |
+-------------------+----------+----------------+-------------------+
A few other ligatures are used for phonetic transcription but look like Latin characters:
+---------------+----------+----------------+-------------------+
| Non-ligature  | Ligature | Unicode        | HTML              |
+---------------+----------+----------------+-------------------+
| db            | ȸ        | U+0238         | &#x0238;          |
| dz            | ʣ        | U+02A3         | &#x02A3;          |
| IJ, ij        | Ĳ, ĳ     | U+0132, U+0133 | &#x0132; &#x0133; |
| ls            | ʪ        | U+02AA         | &#x02AA;          |
| lz            | ʫ        | U+02AB         | &#x02AB;          |
| qp            | ȹ        | U+0239         | &#x0239;          |
| ts            | ʦ        | U+02A6         | &#x02A6;          |
| ui            | ꭐ        | U+AB50         | &#xAB50;          |
| turned ui     | ꭑ        | U+AB51         | &#xAB51;          |
+---------------+----------+----------------+-------------------+
https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode#Digraphs_and_ligatures
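To hunt for further candidates programmatically, one option is to scan every code point and keep those whose compatibility decomposition (NFKD) consists of two or more plain ASCII letters. This is only a sketch using Python's standard `unicodedata` module; the function name `multi_letter_chars` is made up here, and the results depend on the Unicode version bundled with your Python. It will miss characters that have no decomposition at all (Æ, Œ, ß, ʣ and the Latin Extended-D ligatures, for example) and ones like ǆ whose decomposition contains a combining mark.

```python
import sys
import unicodedata

def multi_letter_chars(min_letters=2):
    """Yield (char, expansion) for code points whose NFKD form is made
    of at least `min_letters` ASCII letters and nothing else.
    (Hypothetical helper written for this answer.)"""
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, they are not characters
        ch = chr(cp)
        if not unicodedata.decomposition(ch):
            continue  # no decomposition mapping, nothing to expand
        expansion = unicodedata.normalize("NFKD", ch)
        if (len(expansion) >= min_letters
                and expansion.isascii()
                and expansion.isalpha()):
            yield ch, expansion

found = dict(multi_letter_chars())
print(len(found), "candidates")
# Show the longest expansions first.
for ch, expansion in sorted(found.items(), key=lambda kv: -len(kv[1]))[:10]:
    print(f"U+{ord(ch):04X} {ch} -> {expansion}")
```

The scan takes a second or so, since it normalizes every character that has a decomposition mapping.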
Edit:
Besides ℻ and ℡, there are more letterlike symbols, like the ones the OP mentioned in the comments:
℀ ℁ ⅍ ℅ ℆ ℔ ℠ ™
Longer sequences come mainly from the CJK Compatibility block:
U+338x ㎀ ㎁ ㎂ ㎃ ㎄ ㎅ ㎆ ㎇ ㎈ ㎉ ㎊ ㎋ ㎌ ㎍ ㎎ ㎏
U+339x ㎐ ㎑ ㎒ ㎓ ㎔ ㎕ ㎖ ㎗ ㎘ ㎙ ㎚ ㎛ ㎜ ㎝ ㎞ ㎟
U+33Ax ㎠ ㎡ ㎢ ㎣ ㎤ ㎥ ㎦ ㎧ ㎨ ㎩ ㎪ ㎫ ㎬ ㎭ ㎮ ㎯
U+33Bx ㎰ ㎱ ㎲ ㎳ ㎴ ㎵ ㎶ ㎷ ㎸ ㎹ ㎺ ㎻ ㎼ ㎽ ㎾ ㎿
U+33Cx ㏀ ㏁ ㏂ ㏃ ㏄ ㏅ ㏆ ㏇ ㏈ ㏉ ㏊ ㏋ ㏌ ㏍ ㏎ ㏏
U+33Dx ㏐ ㏑ ㏒ ㏓ ㏔ ㏕ ㏖ ㏗ ㏘ ㏙ ㏚ ㏛ ㏜ ㏝ ㏞ ㏟
Among the three-letter-like symbols are ㎈ ㎑ ㎒ ㎓ ㎔ ㏒ ㏕ ㏖ ㏙ ㎪ ㎫ ㎬ ㎭ ㏆ ㏿ ㍱... The ones with the most characters are probably ㎉ and ㎯.
Unicode even has code points for Roman numerals, which include another four-letter-like character: Ⅷ
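The guess about the longest ones can be checked by expanding each character with NFKC and sorting by length. A small Python sketch, assuming the rows U+3380 to U+33DF listed above as the range (the function name `rank_by_expansion` is made up for illustration):

```python
import unicodedata

def rank_by_expansion(start=0x3380, end=0x33DF):
    """Return (char, expansion) pairs for code points in [start, end],
    longest NFKC expansion first. (Hypothetical helper.)"""
    pairs = [(chr(cp), unicodedata.normalize("NFKC", chr(cp)))
             for cp in range(start, end + 1)]
    # sorted() is stable, so equally long expansions keep code point order.
    return sorted(pairs, key=lambda pair: len(pair[1]), reverse=True)

ranked = rank_by_expansion()
for ch, expansion in ranked[:5]:
    print(f"U+{ord(ch):04X} {ch} -> {expansion!r} ({len(expansion)} characters)")
```

In that range ㎯ (U+33AF) should come out on top: it expands to six characters, because the slash becomes U+2215 DIVISION SLASH and the superscript two becomes a plain digit. ㎉ (U+3389) expands to the four letters "kcal".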
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+215x ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟
U+216x Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ
U+217x ⅰ ⅱ ⅲ ⅳ ⅴ ⅵ ⅶ ⅷ ⅸ ⅹ ⅺ ⅻ ⅼ ⅽ ⅾ ⅿ
U+218x ↀ ↁ ↂ Ↄ ↄ ↅ ↆ ↇ ↈ ↉ ↊ ↋
If ordinary digits count, there are also code points that combine several digits, like ⒆ ⒇ ⓳ ⓴ in Enclosed Alphanumerics
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+246x ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯
U+247x ⑰ ⑱ ⑲ ⑳ ⑴ ⑵ ⑶ ⑷ ⑸ ⑹ ⑺ ⑻ ⑼ ⑽ ⑾ ⑿
U+248x ⒀ ⒁ ⒂ ⒃ ⒄ ⒅ ⒆ ⒇ ⒈ ⒉ ⒊ ⒋ ⒌ ⒍ ⒎ ⒏
U+249x ⒐ ⒑ ⒒ ⒓ ⒔ ⒕ ⒖ ⒗ ⒘ ⒙ ⒚ ⒛ ⒜ ⒝ ⒞ ⒟
U+24Ax ⒠ ⒡ ⒢ ⒣ ⒤ ⒥ ⒦ ⒧ ⒨ ⒩ ⒪ ⒫ ⒬ ⒭ ⒮ ⒯
U+24Bx ⒰ ⒱ ⒲ ⒳ ⒴ ⒵ Ⓐ Ⓑ Ⓒ Ⓓ Ⓔ Ⓕ Ⓖ Ⓗ Ⓘ Ⓙ
U+24Cx Ⓚ Ⓛ Ⓜ Ⓝ Ⓞ Ⓟ Ⓠ Ⓡ Ⓢ Ⓣ Ⓤ Ⓥ Ⓦ Ⓧ Ⓨ Ⓩ
U+24Dx ⓐ ⓑ ⓒ ⓓ ⓔ ⓕ ⓖ ⓗ ⓘ ⓙ ⓚ ⓛ ⓜ ⓝ ⓞ ⓟ
U+24Ex ⓠ ⓡ ⓢ ⓣ ⓤ ⓥ ⓦ ⓧ ⓨ ⓩ ⓪ ⓫ ⓬ ⓭ ⓮ ⓯
U+24Fx ⓰ ⓱ ⓲ ⓳ ⓴ ⓵ ⓶ ⓷ ⓸ ⓹ ⓺ ⓻ ⓼ ⓽ ⓾ ⓿
and in Enclosed Alphanumeric Supplement
🅫, 🅪, 🆋, 🆌, 🆍, 🄭, 🄮, 🅊, 🅋, 🅌, 🅍, 🅎, 🅏
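Not all of these multi-digit symbols can be found through decomposition: as far as I know, the negative circled numbers such as ⓳ and ⓴ have no compatibility mapping. The Unicode database still records a numeric value for them, so a sketch along these lines can pick them up (the helper name `multi_digit_symbols` is an assumption of this example, not a library function):

```python
import unicodedata

def multi_digit_symbols(start, end):
    """Return {char: value} for code points in [start, end] whose numeric
    value is a whole number of at least 10, i.e. written with several
    digits. (Hypothetical helper.)"""
    result = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        value = unicodedata.numeric(ch, None)  # None if not numeric
        if value is not None and value >= 10 and value == int(value):
            result[ch] = int(value)
    return result

# Enclosed Alphanumerics block, U+2460..U+24FF.
enclosed = multi_digit_symbols(0x2460, 0x24FF)
for ch, value in enclosed.items():
    print(f"U+{ord(ch):04X} {ch} = {value}")
```

The same lookup works for the Roman numerals: `unicodedata.numeric("\u2167")` gives 8.0 for Ⅷ, although that value is below the threshold used here.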
A few more:
Currency symbol group
₧ ₨ ₶ ₯ ₠ ₢ ₷
Miscellaneous technical group
⎂ ⏨
Control pictures (you'll probably need to zoom in to see them)
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+240x ␀ ␁ ␂ ␃ ␄ ␅ ␆ ␇ ␈ ␉ ␊ ␋ ␌ ␍ ␎ ␏
U+241x ␐ ␑ ␒ ␓ ␔ ␕ ␖ ␗ ␘ ␙ ␚ ␛ ␜ ␝ ␞ ␟
U+242x ␠ ␡ ␢ ␣ ␤ ␥ ␦
Alchemical Symbols
🜀 🜅 🜆 🜇 🜈 🝪 🝫 🝬 🝛 🝜 🝝
Musical Symbols
𝄶 𝄷 𝄸 𝄹 𝄉 𝄊 𝄫
And there are the emojis 🔟 💤🆔🚾🆖🆗🔢🔡🔠 💯🆘🆎🆑™🔙🔚🔜🔝🔛📆🗓🔞
Vertical bars may be read as an uppercase i or a lowercase L (like your 〷 example, which is actually the IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL), and we have:
- Vai syllable see: ꔖ U+A516
- Large triple vertical bar operator: ⫼ U+2AFC
- Counting rod tens digit three: 𝍫 U+1D36B
- Suzhou numerals 〢 〣
- Chinese river 川
- ║ BOX DRAWINGS DOUBLE VERTICAL...
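For lookalikes like these, which have no decomposition to search on, the quickest way to find out what a character really is is to look up its code point and official name. A minimal Python sketch (the `describe` helper and its `<unnamed>` placeholder are made up for this example):

```python
import unicodedata

def describe(ch):
    """Format one character as 'U+XXXX NAME'. Code points without a
    name get the placeholder '<unnamed>' (invented for this sketch)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

# The vertical-bar lookalikes from the list, written as escapes so the
# source stays unambiguous even if a font renders them identically.
for ch in "\u3037\ua516\u2afc\U0001d36b\u3022\u3023\u5ddd\u2551":
    print(ch, describe(ch))
```

Pasting an unknown character into `describe` is usually enough to tell a real letter from a symbol that merely resembles one.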
Source: https://stackoverflow.com/questions/49079499/unicode-letters-with-more-than-1-alphabetic-latin-character