What are the most common non-BMP Unicode characters in actual use? [closed]

Anonymous (unverified), submitted on 2019-12-03 08:41:19

Question:

In your experience, which Unicode characters, code points, or ranges outside the BMP (Basic Multilingual Plane) are the most common so far? These are the ones that require 4 bytes in UTF-8 or a surrogate pair in UTF-16.
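To make the "4 bytes in UTF-8 or surrogates in UTF-16" point concrete, here is a minimal Python sketch. The helper names `encoding_sizes` and `surrogate_pair` are made up for illustration and are not part of any library.

```python
def encoding_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    # "utf-16-le" writes no byte order mark, so every code unit is exactly 2 bytes.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


def surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogates for a non-BMP character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("BMP characters are not encoded with surrogates")
    offset = cp - 0x10000
    # Top 10 bits go into the high surrogate, bottom 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    gothic_ahsa = "\U00010330"  # GOTHIC LETTER AHSA, outside the BMP
    print(encoding_sizes("A"))          # BMP, ASCII
    print(encoding_sizes(gothic_ahsa))  # non-BMP
    print([hex(u) for u in surrogate_pair(gothic_ahsa)])
```

Any character for which `ord(ch) > 0xFFFF` falls into the category the question is asking about.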

I would've expected the answer to be Chinese and Japanese characters used in names but not included in the most widespread CJK multibyte character sets, but on the project I do most work on, the English Wiktionary, we have found that the Gothic alphabet is far more common so far.

UPDATE

I've written a couple of software tools to scan entire Wikipedias for non-BMP characters and found, to my surprise, that even in the Japanese Wikipedia the Gothic alphabet is the most common. This is also true in the Chinese Wikipedia, but it also had many Chinese characters being used up to 50 or 70 times, including "?", "?", and "?".
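The tools themselves are not shown here, but a scan of this kind can be sketched in a few lines of Python. This is only an illustration, not the original implementation: `count_non_bmp` and `format_report` are hypothetical names, and it assumes Python 3.3 or later, where iterating over a `str` yields whole code points rather than UTF-16 code units.

```python
import unicodedata
from collections import Counter


def count_non_bmp(text):
    """Count every character whose code point lies above U+FFFF."""
    return Counter(ch for ch in text if ord(ch) > 0xFFFF)


def format_report(counts):
    """Return tab-separated report lines, most frequent character first."""
    lines = []
    for ch, n in counts.most_common():
        # Code points unknown to this Python's Unicode database get a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append("%d\tU+%04X\t%s" % (n, ord(ch), name))
    return lines


if __name__ == "__main__":
    sample = "abc \U00010330\U00010330 \U0001F602 \u4e2d"
    for line in format_report(count_non_bmp(sample)):
        print(line)
```

For a real dump you would feed the text through in chunks and call `Counter.update` on each chunk instead of loading everything into memory at once.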

Answer 1:

Emoji are now the most common non-BMP characters by far. 😂, otherwise known as U+1F602 FACE WITH TEARS OF JOY, is the most common one on Twitter's public stream. It occurs more frequently than the tilde!



Answer 2:

Excellent question!

The answer is the mathematical letters. This past December I did a scan of the entire PubMed Open Access corpus, and came up with these figures for astral characters in it.

The first number in the figures below is how many copies of each given code point I found in the entire corpus. First, though, to give you a notion of the relative frequencies, here are the top ten trans-ASCII code points in that corpus:

And here now are the trans-BMP code points, in order of descending frequency:

I really wish I knew what they were using U+100002 to do. :(
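One thing that can at least be checked mechanically: U+100002 lies in plane 16 (Supplementary Private Use Area-B), so it has no standard meaning, and whatever it stood for was presumably defined by the font or tool that produced the text. A small Python sketch of that check follows; `plane` and `is_private_use` are illustrative helper names, not library functions.

```python
import unicodedata


def plane(cp):
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 0x10000 code points, so the plane is the bits above 16.
    return cp >> 16


def is_private_use(cp):
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


if __name__ == "__main__":
    for cp in (0x41, 0x1F602, 0x100002):
        print("U+%04X plane=%d private_use=%s" % (cp, plane(cp), is_private_use(cp)))
```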

If those aren't showing up in your browser, you should install George Douros’s Symbola font. It also has all the fun Unicode 6.0.0 code points in it.



Answer 3:

For me, it is the Mathematical Alphanumeric Symbols, which are used for math typesetting with OpenType fonts such as Cambria Math.


