What are the unicode ranges for Hindi accented characters?

Posted by 房东的猫 on 2019-12-28 06:49:15

Question


I'm trying to gather a Unicode list of all the 'o'-like shapes in the Hindi character set. In fact, a list of any characters (in any language) that use separate characters to indicate an accent would be even better.

I intend to use this Unicode list in a RegExp.

I've been trying to edit a list of character ranges by outputting them in an Input TextField, but editing this text causes weird issues (the keyboard cursor isn't placed on the correct character, selections suddenly disappear or warp incorrectly... in other words... HINDI HELL!)

I've tried this with Notepad++ too, but although it was more responsive, it eventually crapped out on me just as the Flash Player text field did. This seems to occur especially while removing the [] block (nulls?) characters. Some of them trigger odd behaviors.

Anyways, all I want is a list of the accents. A few examples are in the image below (but I would need ALL accents):

Thanks!


Answer 1:


You can find PDFs containing lists of Unicode ranges, grouped by script, here: http://unicode.org/charts/

For Hindi, you probably want Devanagari or Devanagari Extended.




Answer 2:


Here is the character class for Devanagari combining marks:

[\u0901\u0902\u0903\u093c\u093e\u093f\u0940\u0941\u0942\u0943
 \u0944\u0945\u0946\u0947\u0948\u0949\u094a\u094b\u094c\u094d
 \u0951\u0952\u0953\u0954\u0962\u0963]

This is only the basic Devanagari block (not Devanagari Extended).
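As a rough illustration of how such a class might be used in a regular expression, here is a minimal Python sketch. It assumes the same code points as the class above, collapsed into ranges and written with four hex digits per `\u` escape (which Python requires). The names `DEVANAGARI_MARKS`, `find_marks`, and `strip_marks` are made up for this example.

```python
import re

# Same code points as the character class above, collapsed into ranges.
# Basic Devanagari block only; Devanagari Extended is not covered.
DEVANAGARI_MARKS = re.compile(
    "[\u0901-\u0903\u093c\u093e-\u094d\u0951-\u0954\u0962\u0963]"
)


def find_marks(text):
    """Return every combining mark from the class found in text, in order."""
    return DEVANAGARI_MARKS.findall(text)


def strip_marks(text):
    """Return text with every combining mark from the class removed."""
    return DEVANAGARI_MARKS.sub("", text)
```

Writing the sample text with escapes (for example `strip_marks("\u0939\u093f\u0902\u0926\u0940")`) avoids having to edit the combining characters directly in a text editor.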




Answer 3:


If you want the complete set (for all languages), you can do it programmatically. Start from the Unicode data file at ftp://ftp.unicode.org/Public/6.1.0/ucd/UnicodeData.txt, described by TR-44 (http://unicode.org/reports/tr44/#Property_Definitions).

You can use the Canonical_Combining_Class field (see http://unicode.org/reports/tr44/#Canonical_Combining_Class_Values) to filter the exact characters you want. I can't be more precise, because "accent" is a bit vague :-) You might also have to look at General_Category to get the filter right (and exclude certain marks, symbols, or punctuation).

A script doing this would definitely be better than trying to mess with text editors. One of the characteristics of combining characters is that they combine :-) So you might get all kinds of puzzling results (like this: http://www.siao2.com/2006/02/17/533929.aspx :-)
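A script along those lines could look like the sketch below. It is only one possible approach and swaps in a different data source: instead of parsing UnicodeData.txt, it reads General_Category from Python's built-in `unicodedata` module, which reflects whatever Unicode version ships with the interpreter (not necessarily 6.1.0). Treating "accent" as "any character whose General_Category starts with M" is an assumption, and the function names are made up for this example. The output uses Python-style `\u`/`\U` escapes, which may need adapting for other regex flavors.

```python
import unicodedata


def combining_marks(start=0x0900, end=0x097F):
    """Return code points in [start, end] whose General_Category is a
    Mark (Mn, Mc, or Me). The default range is the Devanagari block."""
    return [
        cp
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)).startswith("M")
    ]


def escape(cp):
    """Format one code point as a \\uXXXX (or \\UXXXXXXXX) escape."""
    return f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"


def to_char_class(code_points):
    """Collapse a sorted list of code points into a regex character
    class, merging consecutive code points into ranges."""
    parts = []
    i = 0
    while i < len(code_points):
        j = i
        # Extend j while the next code point is contiguous.
        while j + 1 < len(code_points) and code_points[j + 1] == code_points[j] + 1:
            j += 1
        lo, hi = code_points[i], code_points[j]
        parts.append(escape(lo) if lo == hi else f"{escape(lo)}-{escape(hi)}")
        i = j + 1
    return "[" + "".join(parts) + "]"
```

Calling `print(to_char_class(combining_marks()))` prints a class for the Devanagari block, and passing `0` and `0x10FFFF` as the range covers all scripts. If General_Category alone is too broad, `unicodedata.combining(ch)` returns the Canonical_Combining_Class value and can serve as an extra filter. Note that many Devanagari vowel signs have class 0, so filtering on that field alone would drop them.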



Source: https://stackoverflow.com/questions/9523814/what-are-the-unicode-ranges-for-hindi-accented-characters
