Python regex uppercase unicode word

臣服心动 2021-01-05 23:29

I need to find abbreviations in text written in many languages. My current regex is:

import regex as re
pattern = re.compile(r'(?:[\w]\.)+', re.UNICODE | re.MULTILINE |


        
1 Answer
  • 2021-01-05 23:56

    You need to use a Unicode character property to match uppercase letters in any script. The standard-library re module does not support character properties, but the third-party regex module does.

    >>> import regex
    >>> regex.findall(r'\p{Lu}', 'ÜìÑ')
    ['Ü', 'Ñ']
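
The same property can be applied to the abbreviation pattern from the question by swapping `\w` for `\p{Lu}`. What follows is only a minimal sketch: the function names `uppercase_letters` and `find_abbreviations` are hypothetical, and the `unicodedata` fallback is an assumption added for environments where the third-party `regex` package is not installed. It is not part of the original answer.

```python
import unicodedata

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # fall back to the standard library below


def uppercase_letters(text):
    """Return every character in Unicode general category Lu (uppercase letter)."""
    if regex is not None:
        return regex.findall(r'\p{Lu}', text)
    # Stdlib fallback: check each character's general category directly.
    return [ch for ch in text if unicodedata.category(ch) == 'Lu']


def find_abbreviations(text):
    """Return maximal runs of '<uppercase letter>.' pairs, e.g. 'U.S.A.'."""
    if regex is not None:
        return regex.findall(r'(?:\p{Lu}\.)+', text)
    # Stdlib fallback: scan left to right, consuming two characters per pair,
    # which mirrors how the greedy pattern above consumes the text.
    results = []
    i, n = 0, len(text)
    while i < n:
        j = i
        while (j + 1 < n
               and unicodedata.category(text[j]) == 'Lu'
               and text[j + 1] == '.'):
            j += 2
        if j > i:
            results.append(text[i:j])
            i = j
        else:
            i += 1
    return results


print(uppercase_letters('ÜìÑ'))                    # ['Ü', 'Ñ']
print(find_abbreviations('Ü.Ñ. and e.g. U.S.A.'))  # ['Ü.Ñ.', 'U.S.A.']
```

Note that this only treats runs of single uppercase letters followed by a dot as abbreviations; lowercase forms such as "e.g." are skipped on purpose, so the pattern may need widening depending on what counts as an abbreviation in the target languages.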