Visually-identical characters in Unicode

Question

I want to find visually identical characters for a specific character in Unicode. I know how to find the canonical or compatibility decomposition of a character, but those do not give me what I want. I want characters that are visually identical (not merely similar); the only difference allowed is their size.

For example, I want (s, S), or (S, S) where the two code points are different. I do not want (ß, β) or (e, é).

Any suggestions? Thanks.

Answer 1:

For a particular character, you could start from annotations in the code charts in the Unicode standard. The annotations often refer to other characters for various reasons, including similarity or identity of shape. But the annotations are not meant to cover everything.

You could also draw your character at http://shapecatcher.com/ and ask it to recognize it. You often get a long list of visually similar alternatives.

As @TedHopp writes in his comment, visual identity is font-dependent. For example, “s” and “S” need not be identical in shape; in most fonts, they are not – the basic form is the same, but there are various differences in stroke width variation, curvature, serifs, etc. However, some characters can be expected to be visually identical in any font that contains them, such as Latin capital A, Greek capital alpha Α, and Cyrillic capital А.
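To make the A / Α / А case concrete, here is a small sketch using Python's standard `unicodedata` module. The helper names `describe` and `nfkc` are made up for this illustration. It shows that the three letters are separate code points with separate names, and that compatibility normalization does not unify them, although it does fold a fullwidth Ｓ (U+FF33) to a plain S.

```python
import unicodedata

def describe(chars):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in chars]

def nfkc(text):
    """Apply compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# Latin A, Greek Alpha, Cyrillic A: usually the same glyph, three code points.
for code_point, name in describe("\u0041\u0391\u0410"):
    print(code_point, name)

# Normalization leaves Greek Alpha and Cyrillic A untouched ...
print(nfkc("\u0391") == "A", nfkc("\u0410") == "A")
# ... but maps FULLWIDTH LATIN CAPITAL LETTER S to the ordinary S.
print(nfkc("\uFF33") == "S")
```

This is why decompositions alone cannot answer the question: cross-script lookalikes are not related by any normalization form.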

You did not specify the purpose of the study, but you might be doing something that the Unicode Consortium has already carried out to some extent. See UTR #36, Unicode Security Considerations, which also contains references to related work, including UTS #39, Unicode Security Mechanisms. UTS #39 comes with the data file confusables.txt, "Recommended confusable mapping for IDN" (i.e., it targets a particular context, but it may be of interest for other purposes as well).
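As a rough sketch of how confusables.txt could be used for this, the code below groups characters by the target they map to. It assumes the file's line layout is `source ; target ; type # comment` with hexadecimal code points. The `SAMPLE` text is a hand-written stand-in in that layout, not a copy of the real file, and `parse_confusables` and `lookalikes` are hypothetical helper names. Check the layout against the actual file before relying on it.

```python
from collections import defaultdict

# Hand-written sample in the assumed "source ; target ; type # comment" layout.
SAMPLE = """\
# illustrative lines only, not the real data file
0391 ; 0041 ; MA # GREEK CAPITAL LETTER ALPHA -> LATIN CAPITAL LETTER A
0410 ; 0041 ; MA # CYRILLIC CAPITAL LETTER A -> LATIN CAPITAL LETTER A
"""

def hex_to_text(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())

def parse_confusables(text):
    """Map each target string to the set of sources confusable with it."""
    groups = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        groups[hex_to_text(fields[1])].add(hex_to_text(fields[0]))
    return groups

def lookalikes(ch, groups):
    """Return the other members of the confusable group containing ch."""
    for target, sources in groups.items():
        if ch == target or ch in sources:
            return ({target} | sources) - {ch}
    return set()

groups = parse_confusables(SAMPLE)
print(sorted(f"U+{ord(c):04X}" for c in lookalikes("A", groups)))
```

Note that the data is about confusability, which is broader than strict visual identity, so the result would still need filtering for the stricter requirement in the question.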

Source: https://stackoverflow.com/questions/13260890/visually-identical-characters-in-unicode

Tags

unicode

similarity