character-properties

Match any unicode letter?

♀尐吖头ヾ submitted on 2019-11-26 12:46:14
Question: In .NET you can use \p{L} to match any letter. How can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.

Answer 1: Python's re module doesn't support Unicode properties yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too. Since \w also matches digits, you then need to subtract those from your character class, along with the underscore: [^\W\d_] will match any
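The character class from the answer can be tried out with a minimal sketch like the one below. It assumes Python 3, where str patterns are Unicode-aware by default and the re.UNICODE flag is therefore implied; the helper name find_letter_runs and the sample text are made up for illustration.

```python
import re

# [^\W\d_] reads as "a word character that is neither a digit nor an
# underscore", which leaves only letters. In Python 3, \w is already
# Unicode-aware for str patterns, so no flag is passed here.
LETTERS = re.compile(r"[^\W\d_]+")


def find_letter_runs(text):
    """Return every maximal run of letters found in text."""
    return LETTERS.findall(text)


if __name__ == "__main__":
    print(find_letter_runs("\u00e9l\u00e8ve GO\u00c4_432"))
```

On Python 2, the same pattern would need re.UNICODE and unicode strings on both sides, as the answer describes.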

Regex for names with special characters (Unicode)

╄→гoц情女王★ submitted on 2019-11-26 11:03:51
Question: Okay, I have read about regex all day now, and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z], leaving out characters that I need to accept too. I basically need a regex that checks that the name is at least two words, and that it does not contain numbers or special characters like !"#¤%&/()=..., however the words can contain characters like æ, é, Â and so on... An example of an accepted
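One way to read the requirement above (at least two words, letters only, accented letters allowed) is sketched below. The question does not say which language it targets, so Python 3 is used purely for illustration, and the exact rules (single spaces between words, no hyphens or apostrophes) are assumptions rather than anything the asker specified. The names NAME and is_valid_name are hypothetical.

```python
import re

# Assumed rule set: two or more words, each consisting only of Unicode
# letters, separated by exactly one space. Hyphens and apostrophes are
# deliberately NOT accepted in this sketch.
NAME = re.compile(r"[^\W\d_]+(?: [^\W\d_]+)+")


def is_valid_name(name):
    """Return True if name is two or more letter-only words."""
    return NAME.fullmatch(name) is not None
```

Real names often contain hyphens, apostrophes, or more than one space, so a production validator would probably need a looser pattern than this one.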

matching unicode characters in python regular expressions

半城伤御伤魂 submitted on 2019-11-26 09:39:32
Question: I have read through the other questions on Stack Overflow, but I am still no closer. Sorry if this has already been answered, but I couldn't get anything proposed there to work.

>>> import re
>>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/xmas/xmas1.jpg')
>>> print m.groupdict()
{'tag': 'xmas', 'filename': 'xmas1.jpg'}

All is well. Then I try something with Norwegian characters in it (or something more Unicode-like):

>>> m = re.match(r'^/by_tag/(?P<tag
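The session above is Python 2 (note the print statement). For comparison, here is a small sketch of the same pattern under Python 3, where str patterns make \w Unicode-aware without extra flags. The helper name parse_path and the Norwegian sample path used with it are hypothetical; the truncated second example from the question is not reconstructed here.

```python
import re

# Same pattern as in the question, compiled once.
PATH = re.compile(r"^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$")


def parse_path(path):
    """Return the named groups as a dict, or None if the path does not match."""
    m = PATH.match(path)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_path("/by_tag/xmas/xmas1.jpg"))
    print(parse_path("/by_tag/p\u00e5ske/\u00f8l.jpg"))
```

On Python 2, getting the same behaviour would likely require passing re.UNICODE and using unicode objects for both the pattern and the input.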

How to match Cyrillic characters with a regular expression

半世苍凉 submitted on 2019-11-26 02:57:40
Question: How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to match the alphabetic characters, no numbers or special characters. Right now I have [A-Za-z]

Answer 1: It depends on your regex flavor. If it supports Unicode character classes (like .NET, for instance), \p{L} matches a letter character (in any character set).

Answer 2: If your regex flavor supports Unicode blocks ( [\p{IsCyrillic}] ), you can match Russian (Cyrillic) characters with: [\p{IsCyrillic}] or
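Python's built-in re module has no \p{IsCyrillic}, so the sketch below swaps in an explicit code point range for the basic Cyrillic block (U+0400 to U+04FF). This is a substitute technique, not the one the answers name, and the block also contains a few non-letter characters (for example the combining marks around U+0483), so it is only an approximation. The name cyrillic_words is made up for illustration.

```python
import re

# Explicit range standing in for \p{IsCyrillic}: the basic Cyrillic block.
# Assumption: the supplementary Cyrillic blocks are not needed.
CYRILLIC = re.compile(r"[\u0400-\u04FF]+")


def cyrillic_words(text):
    """Return every run of characters from the basic Cyrillic block."""
    return CYRILLIC.findall(text)


if __name__ == "__main__":
    print(cyrillic_words("Привет, monde 123 мир"))
```

French is written in Latin script, so a Cyrillic-only class will not cover it. To accept letters from both, a script-independent class such as [^\W\d_] in Python 3 is probably the simpler choice.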

Python regex matching Unicode properties

眉间皱痕 submitted on 2019-11-26 00:44:19
Question: Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary lower-case letter, or \p{Zs} for any space separator. I don't see support for this in either the 2.x or 3.x lines of Python (with due regrets). Is anybody aware of a good strategy to get a similar effect? Homegrown solutions are welcome.

Answer 1: Have you tried Ponyguruma, a Python binding to the Oniguruma regular expression engine? In
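As one possible homegrown approach of the kind the question invites, the sketch below builds a character class from the standard unicodedata module by scanning every code point once. It assumes Python 3; the function name category_class is hypothetical, and this is not the Ponyguruma approach from the answer.

```python
import re
import sys
import unicodedata


def _esc(cp):
    # \UXXXXXXXX escapes are understood by re in str patterns and avoid
    # any trouble with characters that are special inside [...].
    return f"\\U{cp:08x}"


def category_class(*categories):
    """Build a regex character class for the given general categories,
    e.g. category_class("Ll") as a stand-in for Perl's \\p{Ll}."""
    parts = []
    start = prev = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) in categories:
            if start is None:
                start = cp
            prev = cp
        elif start is not None:
            # A run of matching code points just ended; emit it as a range.
            parts.append(_esc(start) if start == prev else f"{_esc(start)}-{_esc(prev)}")
            start = None
    if start is not None:
        parts.append(_esc(start) if start == prev else f"{_esc(start)}-{_esc(prev)}")
    return "[" + "".join(parts) + "]"


LOWER = re.compile(category_class("Ll") + "+")   # roughly \p{Ll}+
SPACE = re.compile(category_class("Zs"))         # roughly \p{Zs}
```

The scan walks over a million code points, so the class is best built once and cached. The third-party regex package on PyPI supports \p{...} directly and may be a better fit when an extra dependency is acceptable.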

Unicode equivalents for \w and \b in Java regular expressions?

China☆狼群 submitted on 2019-11-26 00:06:22
Question: Many modern regex implementations interpret the \w character class shorthand as "any letter, digit, or connecting punctuation" (usually: underscore). That way, a regex like \w+ matches words like hello, élève, GOÄ_432 or gefräßig. Unfortunately, Java doesn't. In Java, \w is limited to [A-Za-z0-9_]. This makes matching words like those mentioned above difficult, among other problems. It also appears that the \b word separator matches in places where it shouldn't. What would be the

Matching Unicode letter characters in PCRE/PHP

微笑、不失礼 submitted on 2019-11-25 23:59:14
Question: I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern:

// unicode letters, apostrophe, hyphen, space
$namePattern = "/^([\\p{L}'\\- ])+$/";

This is eventually passed to a call to preg_match(). As far as I can tell, this works with your vanilla ASCII alphabet, but seems to trip up on spicier characters like Ă or 张. Is there something wrong with the pattern itself? Perhaps I'm expecting \p{L} to do more work