How to find Chinese or Japanese characters in a string in Python?

北海茫月 2021-01-31 05:16

For example, given:

str = 'sdf344asfasf天地方益3権sdfsdf'

Wrap the Chinese and Japanese characters in parentheses:

strAfterConvert = 'sdfasf


        
4 Answers
  •  梦如初夏
    2021-01-31 05:57

    If you can't use the third-party regex module, which exposes Unicode properties such as IsKatakana and IsHan (as shown in @一二三's answer), you can use the character ranges from @EvenLisle's answer with the standard library's re module:

    >>> import re
    >>> print(re.sub(u"([\u3300-\u33ff\ufe30-\ufe4f\uf900-\ufaff\U0002f800-\U0002fa1f\u30a0-\u30ff\u2e80-\u2eff\u4e00-\u9fff\u3400-\u4dbf\U00020000-\U0002a6df\U0002a700-\U0002b73f\U0002b740-\U0002b81f\U0002b820-\U0002ceaf]+)", r"(\1)", u'sdf344asfasf天地方益3権sdfsdf'))
    sdf344asfasf(天地方益)3(権)sdfsdf
    

    Beware of known issues.
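
    For reuse outside the interactive prompt, the same substitution could be wrapped in a function. The following is only a sketch for Python 3: the names CJK_RANGES, CJK_RUN and wrap_cjk are made up for illustration, and the ranges are the ones from the one-liner above, listed one per line with their Unicode block names.

```python
import re

# Same ranges as the one-liner above, one per line for readability.
# CJK_RANGES, CJK_RUN and wrap_cjk are illustrative names, not part of any library.
CJK_RANGES = (
    "\u2e80-\u2eff"          # CJK Radicals Supplement
    "\u30a0-\u30ff"          # Katakana
    "\u3300-\u33ff"          # CJK Compatibility
    "\u3400-\u4dbf"          # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"          # CJK Unified Ideographs
    "\uf900-\ufaff"          # CJK Compatibility Ideographs
    "\ufe30-\ufe4f"          # CJK Compatibility Forms
    "\U00020000-\U0002a6df"  # CJK Unified Ideographs Extension B
    "\U0002a700-\U0002b73f"  # CJK Unified Ideographs Extension C
    "\U0002b740-\U0002b81f"  # CJK Unified Ideographs Extension D
    "\U0002b820-\U0002ceaf"  # CJK Unified Ideographs Extension E
    "\U0002f800-\U0002fa1f"  # CJK Compatibility Ideographs Supplement
)

# One or more consecutive characters from any of the ranges above.
CJK_RUN = re.compile("[" + CJK_RANGES + "]+")


def wrap_cjk(text):
    """Return text with each run of matching characters wrapped in parentheses."""
    return CJK_RUN.sub(lambda m: "(" + m.group(0) + ")", text)


print(wrap_cjk("sdf344asfasf天地方益3権sdfsdf"))
```

    Note that these ranges do not include hiragana (U+3040 to U+309F), so a string such as 'ひらがな' would pass through unchanged; add "\u3040-\u309f" to the class if hiragana should be wrapped too.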

    You could also check Unicode category:

    >>> import unicodedata
    >>> unicodedata.category(u'天')
    'Lo'
    >>> unicodedata.category(u's')
    'Ll'
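
    The category alone does not single out Chinese or Japanese text: 'Lo' ("Letter, other") also covers letters from many other scripts, such as Hebrew or Arabic. One possible refinement, sketched below under that caveat, is to look at the character name returned by unicodedata.name instead. The prefix list and the helper names is_cjk_or_kana and wrap_by_name are assumptions made for this illustration, not something the answers above propose.

```python
import itertools
import unicodedata

# Name prefixes treated as "Chinese or Japanese" in this sketch (an assumption;
# extend or trim the tuple to match what you actually want to wrap).
NAME_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
    "HIRAGANA",
    "KATAKANA",
)


def is_cjk_or_kana(ch):
    """Return True if the Unicode name of ch starts with one of NAME_PREFIXES."""
    # The second argument is the default returned for unnamed characters.
    name = unicodedata.name(ch, "")
    return name.startswith(NAME_PREFIXES)


def wrap_by_name(text):
    """Wrap each consecutive run of matching characters in parentheses."""
    pieces = []
    for matched, group in itertools.groupby(text, key=is_cjk_or_kana):
        run = "".join(group)
        pieces.append("(" + run + ")" if matched else run)
    return "".join(pieces)


print(wrap_by_name("sdf344asfasf天地方益3権sdfsdf"))
```

    This is slower than a compiled regular expression because it looks up every character individually, but it avoids hard-coding code point ranges.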
    
