I need to find abbreviations in text written in many languages. My current regex is:

    import regex as re

    pattern = re.compile(r'(?:[\w]\.)+', re.UNICODE | re.MULTILINE | ...)
You need to use a Unicode character property to match them. The standard-library re module does not support character properties, but the third-party regex module does. For example, \p{Lu} matches any uppercase letter:
    >>> import regex
    >>> regex.findall(r'\p{Lu}', 'ÜìÑ')
    ['Ü', 'Ñ']
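Applied to the abbreviation problem, a minimal sketch might look like the following. It assumes Python 3 and the third-party regex package (pip install regex). The pattern, the leading \b anchor, and the helper name find_abbreviations are illustrative choices, not taken from the question.

```python
import regex  # third-party package: pip install regex

# Illustrative pattern: one or more "uppercase letter + period" units.
# \p{Lu} matches an uppercase letter in any script (Latin, Cyrillic, Greek, ...);
# use \p{L} instead to allow lowercase letters too (e.g. German "z.B.").
# The leading \b stops a match from starting mid-word, so the "l." at the
# end of a word like "Beispiel." is not reported.
ABBREV = regex.compile(r'\b(?:\p{Lu}\.)+')


def find_abbreviations(text):
    """Return every run of uppercase-letter/period pairs found in text."""
    return ABBREV.findall(text)


samples = [
    "Ü.Ñ. and U.S.A. are both matched.",
    "Die С.Ш.А. und Η.Π.Α.",
    "Mr. Smith left.",
]
for s in samples:
    print(find_abbreviations(s))
```

With these samples it prints ['Ü.Ñ.', 'U.S.A.'], then ['С.Ш.А.', 'Η.Π.Α.'], then []. A single capital followed by a full stop ("Plan B.") is still matched; requiring at least two units, e.g. (?:\p{Lu}\.){2,}, is one way to tighten that if it matters.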