Such as:
str = 'sdf344asfasf天地方益3権sdfsdf'
Add () to Chinese and Japanese characters:
strAfterConvert = 'sdfasf
If you can't use the regex module, which provides access to the IsKatakana and IsHan properties as shown in @一二三's answer, you could use the character ranges from @EvenLisle's answer with the stdlib's re module:
>>> import re
>>> print(re.sub(u"([\u3300-\u33ff\ufe30-\ufe4f\uf900-\ufaff\U0002f800-\U0002fa1f\u30a0-\u30ff\u2e80-\u2eff\u4e00-\u9fff\u3400-\u4dbf\U00020000-\U0002a6df\U0002a700-\U0002b73f\U0002b740-\U0002b81f\U0002b820-\U0002ceaf]+)", r"(\1)", u'sdf344asfasf天地方益3権sdfsdf'))
sdf344asfasf(天地方益)3(権)sdfsdf
Beware of known issues.
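If the substitution is needed in more than one place, it may be easier to read with the pattern compiled once and each range labelled. The following is only a sketch assuming Python 3; the names CJK_RUN and wrap_cjk are made up for illustration, and the ranges are copied unchanged from the one-liner above (note that they include Katakana but not Hiragana, U+3040-U+309F).

```python
import re

# Same ranges as the one-liner above, split up so each block can be labelled.
# CJK_RUN and wrap_cjk are illustrative names, not part of any library.
CJK_RUN = re.compile(
    "["
    "\u3300-\u33ff"          # CJK Compatibility
    "\ufe30-\ufe4f"          # CJK Compatibility Forms
    "\uf900-\ufaff"          # CJK Compatibility Ideographs
    "\U0002f800-\U0002fa1f"  # CJK Compatibility Ideographs Supplement
    "\u30a0-\u30ff"          # Katakana
    "\u2e80-\u2eff"          # CJK Radicals Supplement
    "\u4e00-\u9fff"          # CJK Unified Ideographs
    "\u3400-\u4dbf"          # CJK Unified Ideographs Extension A
    "\U00020000-\U0002a6df"  # Extension B
    "\U0002a700-\U0002b73f"  # Extension C
    "\U0002b740-\U0002b81f"  # Extension D
    "\U0002b820-\U0002ceaf"  # Extension E
    "]+"
)


def wrap_cjk(text, left="(", right=")"):
    """Wrap every run of characters matched by CJK_RUN in left/right."""
    return CJK_RUN.sub(lambda m: left + m.group(0) + right, text)


print(wrap_cjk("sdf344asfasf天地方益3権sdfsdf"))
```

Using a function as the replacement avoids having to escape the delimiters if they ever contain backslashes or group references.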
You could also check the Unicode category of each character ('Lo' means "Letter, other"; 'Ll' means "Letter, lowercase"):
>>> import unicodedata
>>> unicodedata.category(u'天')
'Lo'
>>> unicodedata.category(u's')
'Ll'
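The category check can be turned into the same kind of wrapping, roughly as sketched below. This assumes Python 3, and wrap_other_letters is a made-up name. Keep in mind that 'Lo' is much broader than Chinese and Japanese: letters from Hebrew, Arabic, Thai and many other scripts are also 'Lo', so this only works if the rest of the input is known to be Latin text, digits and punctuation.

```python
import unicodedata
from itertools import groupby


def wrap_other_letters(text):
    """Wrap each run of category-'Lo' characters in parentheses.

    Caveat: 'Lo' is not specific to CJK; it also covers letters
    from many other scripts.
    """
    parts = []
    # groupby splits the string into consecutive runs that share the same key.
    for is_lo, run in groupby(text, key=lambda ch: unicodedata.category(ch) == "Lo"):
        chunk = "".join(run)
        parts.append("(" + chunk + ")" if is_lo else chunk)
    return "".join(parts)


print(wrap_other_letters("sdf344asfasf天地方益3権sdfsdf"))
```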