Python regex matching Unicode properties

我寻月下人不归 2020-11-22 14:49

Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary l

6 Answers

情歌与酒 (OP)

2020-11-22 15:12
Speaking of homegrown solutions, some time ago I wrote a small program to do just that: it converts a Unicode category written as \p{...} into a range of values extracted from the Unicode specification (v.5.0.0). Only categories are supported (e.g. L, Zs), and it is restricted to the BMP. I'm posting it here in case someone finds it useful (although Oniguruma really does seem to be a better option).

Example usage:
```
>>> from unicode_hack import regex
>>> pattern = regex(r'^\\p{Lu}(\\p{L}|\\p{N}|_)*')
>>> print pattern.match(u'ÁñÇ_1+2').group(0)
ÁñÇ_1
>>>
```
Here's the source. There is also a JavaScript version, using the same data.
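
For readers without access to the linked source, the idea can be sketched in a few lines. This is not the `unicode_hack` module itself but a minimal illustration under stated assumptions: it uses Python 3 and the standard `unicodedata` module (whose Unicode version depends on the interpreter, so it is not v.5.0.0), and the names `category_ranges` and `expand_properties` are hypothetical helpers invented for this example. Like the original, it handles only general categories and only the BMP.

```python
import functools
import re
import unicodedata

BMP_LIMIT = 0x10000  # code points U+0000..U+FFFF only, as in the answer above


@functools.lru_cache(maxsize=None)
def category_ranges(prefix):
    """Return (first, last) code point pairs in the BMP whose general
    category starts with `prefix` (e.g. 'L', 'Lu', 'Zs')."""
    ranges = []
    start = None
    for cp in range(BMP_LIMIT + 1):
        # The extra iteration at cp == BMP_LIMIT closes a range that is still open.
        inside = cp < BMP_LIMIT and unicodedata.category(chr(cp)).startswith(prefix)
        if inside and start is None:
            start = cp
        elif not inside and start is not None:
            ranges.append((start, cp - 1))
            start = None
    return tuple(ranges)


_PROPERTY = re.compile(r'\\p\{(\w+)\}')


def expand_properties(pattern):
    """Rewrite every \\p{Xx} in `pattern` as an explicit character class.

    Hypothetical helper: it does not handle \\p{...} that already sits
    inside a [...] class, nor negated \\P{...}."""
    def replace(match):
        ranges = category_ranges(match.group(1))
        if not ranges:
            raise ValueError('unknown or empty category: ' + match.group(1))
        body = ''.join(
            '\\u%04x' % lo if lo == hi else '\\u%04x-\\u%04x' % (lo, hi)
            for lo, hi in ranges
        )
        return '[' + body + ']'
    return _PROPERTY.sub(replace, pattern)


if __name__ == '__main__':
    pattern = re.compile(expand_properties(r'^\p{Lu}(\p{L}|\p{N}|_)*'))
    print(pattern.match('\u00c1\u00f1\u00c7_1+2').group(0))  # prints ÁñÇ_1
```

Scanning the BMP once per category and caching the result keeps the sketch short; a precomputed table, as the answer describes, avoids that startup cost.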