List of unicode character names

筅森魡賤 提交于 2021-02-16 16:30:19

问题


In Python I can print a unicode character by name (e.g. print(u'\N{snowman}')). Is there a way I get get a list of all valid names?


回答1:


Every codepoint has a name, so you are effectively asking for the Unicode standard list of codepoint names (as well as the *list of name aliases, supported by Python 3.3 and up).

Each Python version supports a specific version of the Unicode standard; the unicodedata.unidata_version attribute tells you which one for a given Python runtime. The above links lead to the latest published Unicode version, replace UCD/latest in the URLs with the value of unicodedata.unidata_version for your Python version.

Per codepoint, the unicodedata.name() function can tell you the official name, and unicodedata.lookup() gives you the inverse (name to codepoint).




回答2:


If you want a list of all unicode character names, consider downloading the Unicode Character Database.

It is included in the base repositories of many linux distributions (ex. "unicode-ucd" on RHEL).

The package includes NamesList.txt, which contains the exhaustive list of unicode character names.

Caution: NamesList.txt need some times to be downloaded (size > 1.5 MB).

Example:

21FE    RIGHTWARDS OPEN-HEADED ARROW
21FF    LEFT RIGHT OPEN-HEADED ARROW
@@  2200    Mathematical Operators  22FF
@@+
@       Miscellaneous mathematical symbols
2200    FOR ALL
    = universal quantifier
2201    COMPLEMENT
    x (latin letter stretched c - 0297)
2202    PARTIAL DIFFERENTIAL
2203    THERE EXISTS
    = existential quantifier
2204    THERE DOES NOT EXIST
    : 2203 0338
2205    EMPTY SET
    = null set
    * used in linguistics to indicate a null morpheme or phonological "zero"
    x (latin capital letter o with stroke - 00D8)
    x (diameter sign - 2300)
    ~ 2205 FE00 zero with long diagonal stroke overlay form



回答3:


Yes there is a way. Going through all existing code points and calling unicodedata.name() on each of them. Like this:

names = []
for c in range(0, 0x10FFFF + 1):
    try:
        names.append(unicodedata.name(c))
    except KeyError:
        pass
# Do something with names



回答4:


For a given codepoint, you can use unicodedata.name. To get them all, you can work through all the billions to see which have such names.




回答5:


If you want to insert a unicode character by name, but don't know the name. Here is how you get an easy overview of unicode character names.

On Windows

  1. Open "Character Map" (search for charmap.exe and run it).
  2. Select any common Microsoft font (these tend to have a wide variety of unicode characters defined).
  3. Click on any character on the map to get its Unicode Character Name.

On Mac it's called "Character Palette" and found under System Preferences, "International -> Input" or "Language & Text -> Input Sources" by ticking the box next to "Character Palette".




回答6:


Just print them all:

import unicodedata 

for i in range(0x110000): 
    character = chr(i) 
    name = unicodedata.name(character, "") 
    if len(name) > 0: 
        print(f"{i:6} | 0x{i:04X} | {character} | {name}") 



回答7:


my one liner, just for my own reference ;p

import unicodedata
names = [unicodedata.name(chr(c)) for c in range(0, 0x10FFFF+1) if unicodedata.name(chr(c), None)]


来源:https://stackoverflow.com/questions/30302766/list-of-unicode-character-names

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!