Question
In Python I can print a Unicode character by name (e.g. print(u'\N{snowman}')). Is there a way I can get a list of all valid names?
Answer 1:
Codepoint names are defined by the Unicode standard, so you are effectively asking for the Unicode standard list of codepoint names (as well as the list of name aliases, supported by Python 3.3 and up).
Each Python version supports a specific version of the Unicode standard; the unicodedata.unidata_version attribute tells you which one for a given Python runtime. The above links lead to the latest published Unicode version; replace UCD/latest in the URLs with the value of unicodedata.unidata_version for your Python version.
Per codepoint, the unicodedata.name() function can tell you the official name, and unicodedata.lookup() gives you the inverse (name to codepoint).
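As a small illustration of the three unicodedata members mentioned in this answer, here is a minimal sketch. The variable name snowman and the "<no name>" default string are only illustrative choices, and the version printed first depends on your Python runtime.

```python
import unicodedata

# Unicode version bundled with this Python runtime (varies by Python version).
print(unicodedata.unidata_version)

# Name -> character.
snowman = unicodedata.lookup("SNOWMAN")
print(snowman, hex(ord(snowman)))

# Character -> official name.
print(unicodedata.name(snowman))

# name() raises ValueError for a character without a name unless a default
# is passed as the second argument; control characters such as U+0000 have none.
print(unicodedata.name("\x00", "<no name>"))
```

Passing a default to unicodedata.name() is often more convenient than catching the exception when scanning many code points.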
Answer 2:
If you want a list of all Unicode character names, consider downloading the Unicode Character Database.
It is included in the base repositories of many Linux distributions (e.g. "unicode-ucd" on RHEL).
The package includes NamesList.txt, which contains the exhaustive list of Unicode character names.
Caution: NamesList.txt takes some time to download (size > 1.5 MB).
Example:
21FE RIGHTWARDS OPEN-HEADED ARROW
21FF LEFT RIGHT OPEN-HEADED ARROW
@@ 2200 Mathematical Operators 22FF
@@+
@ Miscellaneous mathematical symbols
2200 FOR ALL
= universal quantifier
2201 COMPLEMENT
x (latin letter stretched c - 0297)
2202 PARTIAL DIFFERENTIAL
2203 THERE EXISTS
= existential quantifier
2204 THERE DOES NOT EXIST
: 2203 0338
2205 EMPTY SET
= null set
* used in linguistics to indicate a null morpheme or phonological "zero"
x (latin capital letter o with stroke - 00D8)
x (diameter sign - 2300)
~ 2205 FE00 zero with long diagonal stroke overlay form
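To pull only the code point and name pairs out of a listing like the one above, a rough sketch could look like this. It assumes that character entries are the only lines beginning with a 4 to 6 digit uppercase hex number; parse_names_list and ENTRY are hypothetical names made up for this example, not part of any library.

```python
import re

# A character entry starts at column 0 with a 4-6 digit hex code point,
# followed by whitespace and the name. Annotation lines (=, x, *, :, ~)
# and block headers (@, @@) do not match this pattern.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\s+(\S.*)$")

def parse_names_list(lines):
    """Return a dict mapping code point (int) to character name (str)."""
    names = {}
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            names[int(match.group(1), 16)] = match.group(2)
    return names

# A few lines taken from the excerpt above.
sample = """21FF LEFT RIGHT OPEN-HEADED ARROW
@@ 2200 Mathematical Operators 22FF
2200 FOR ALL
= universal quantifier
2205 EMPTY SET
x (diameter sign - 2300)
""".splitlines()

parsed = parse_names_list(sample)
for code_point, name in parsed.items():
    print(f"U+{code_point:04X} {name}")
```

For the real file you would pass an open file object instead of the sample lines. Some entries there carry placeholder labels such as <control> rather than real names, so you may want to filter those out.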
Answer 3:
Yes, there is a way: go through all existing code points and call unicodedata.name() on each of them, like this:
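import unicodedata

names = []
for c in range(0, 0x10FFFF + 1):
    try:
        # unicodedata.name() expects a one-character string, not an int
        names.append(unicodedata.name(chr(c)))
    except ValueError:
        # Raised for code points that have no name
        pass
# Do something with names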
Answer 4:
For a given codepoint, you can use unicodedata.name. To get them all, you can work through all 1,114,112 possible code points (U+0000 to U+10FFFF) to see which have such names.
Answer 5:
If you want to insert a Unicode character by name but don't know the name, here is how to get an easy overview of Unicode character names.
On Windows
- Open "Character Map" (search for charmap.exe and run it).
- Select any common Microsoft font (these tend to have a wide variety of Unicode characters defined).
- Click on any character on the map to get its Unicode Character Name.
On Mac, it's called "Character Palette" and is enabled under System Preferences, "International -> Input" or "Language & Text -> Input Sources", by ticking the box next to "Character Palette".
Answer 6:
Just print them all:
import unicodedata

for i in range(0x110000):
    character = chr(i)
    # The empty-string default avoids a ValueError for unnamed code points
    name = unicodedata.name(character, "")
    if len(name) > 0:
        print(f"{i:6} | 0x{i:04X} | {character} | {name}")
Answer 7:
My one-liner, just for my own reference ;p
import unicodedata
names = [unicodedata.name(chr(c)) for c in range(0, 0x10FFFF+1) if unicodedata.name(chr(c), None)]
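Once such a list exists, a common next step is searching it for a name you only half remember. The following is a minimal sketch of that idea; find_names is a hypothetical helper written for this example, and the exact matches depend on the Unicode version of your Python runtime.

```python
import unicodedata

def find_names(fragment):
    """Return (character, name) pairs whose name contains the given fragment."""
    fragment = fragment.upper()  # official names are uppercase
    matches = []
    for code_point in range(0x110000):
        character = chr(code_point)
        # The None default skips code points without a name.
        name = unicodedata.name(character, None)
        if name is not None and fragment in name:
            matches.append((character, name))
    return matches

results = find_names("snowman")
for character, name in results:
    print(f"U+{ord(character):04X} {character} {name}")
```

Scanning the whole range on every call is slow for repeated lookups; building the list of names once and filtering it would be the obvious refinement.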
Source: https://stackoverflow.com/questions/30302766/list-of-unicode-character-names