character-class

List of metacharacters for MySQL regexp square brackets

陌路散爱 submitted on 2019-11-28 09:31:56
Question: Strangely, I can't seem to find anywhere a list of the characters that I can't safely use as literals within MySQL regular expression square brackets without escaping them or resorting to a [:character_class:] construct. (Also, the answer probably needs to be MySQL-specific, because MySQL regular expressions seem to be lacking compared to those in Perl/PHP/JavaScript etc.)
Answer 1: Almost all metacharacters (including the dot . , the + , * and ? quantifiers, the end-of-string anchor $ , etc.) have

Matching (e.g.) a Unicode letter with Java regexps

你。 submitted on 2019-11-28 09:10:20
There are many questions and answers here on StackOverflow that assume a "letter" can be matched in a regexp by [a-zA-Z] . However, with Unicode there are many more characters that most people would regard as a letter (all the Greek letters, Cyrillic, and many more). Unicode defines many blocks, each of which may have "letters". The Java definition defines POSIX classes for things like alpha characters, but that is specified to only work with US-ASCII. The predefined character classes define words to consist of [a-zA-Z_0-9] , which also excludes many letters. So how do you properly match against
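The claims in this question can be checked directly against java.util.regex. Below is a minimal sketch (the class and method names are hypothetical) that tries [a-zA-Z], the POSIX class \p{Alpha}, \w and the Unicode category \p{L} on a Greek letter. The Pattern.UNICODE_CHARACTER_CLASS flag is shown as one documented way to widen the predefined and POSIX classes; it is not necessarily what the answers to this question proposed.

```java
import java.util.regex.Pattern;

class UnicodeLetterDemo {
    // True if the whole input matches the regex with default (ASCII-oriented) classes.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Same check, but with Unicode-aware predefined and POSIX classes.
    static boolean matchesUnicode(String regex, String input) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(input).matches();
    }

    public static void main(String[] args) {
        String alpha = "\u03b1"; // GREEK SMALL LETTER ALPHA
        System.out.println("[a-zA-Z]          : " + matches("[a-zA-Z]", alpha));
        System.out.println("\\p{Alpha}         : " + matches("\\p{Alpha}", alpha));
        System.out.println("\\w                : " + matches("\\w", alpha));
        System.out.println("\\p{L}             : " + matches("\\p{L}", alpha));
        System.out.println("\\w (Unicode flag) : " + matchesUnicode("\\w", alpha));
    }
}
```

Only the \p{L} check and the flagged \w check should report true for the Greek letter, which is consistent with the US-ASCII restriction described in the question.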

How can I exclude some characters from a class?

我是研究僧i submitted on 2019-11-27 13:29:11
Say I want to match a "word" character ( \w ), but exclude "_", or match a whitespace character ( \s ), but exclude "\t". How can I do this?

Use a negated class including \W or \S.

    /[^\W_]/   # anything that's not a non-word character and not _
    /[^\S\t]/  # anything that's not a non-space character and not \t

Source: https://stackoverflow.com/questions/3548949/how-can-i-exclude-some-characters-from-a-class
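The double negation in the answer can be tried out in Java, whose regex flavor accepts the same two classes. This is a minimal sketch; the class and method names are made up for illustration, and the patterns are the ones from the answer with Java string escaping applied.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Word character except underscore: "not (a non-word character or '_')".
    static final Pattern WORD_NO_UNDERSCORE = Pattern.compile("[^\\W_]");
    // Whitespace except tab: "not (a non-space character or '\t')".
    static final Pattern SPACE_NO_TAB = Pattern.compile("[^\\S\\t]");

    static boolean isWordNoUnderscore(char c) {
        return WORD_NO_UNDERSCORE.matcher(String.valueOf(c)).matches();
    }

    static boolean isSpaceNoTab(char c) {
        return SPACE_NO_TAB.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println("'a'  word, no underscore: " + isWordNoUnderscore('a'));
        System.out.println("'_'  word, no underscore: " + isWordNoUnderscore('_'));
        System.out.println("' '  space, no tab:       " + isSpaceNoTab(' '));
        System.out.println("'\\t' space, no tab:       " + isSpaceNoTab('\t'));
    }
}
```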

Character class subtraction, converting from Java syntax to RegexBuddy

拟墨画扇 submitted on 2019-11-27 09:15:34
Which regular expression engine does Java use? In a tool like RegexBuddy, if I use [a-z&&[^bc]] , the expression is valid in Java but RegexBuddy does not understand it. In fact it reports:

    Match a single character present in the list below [a-z&&[^bc]
        A character in the range between a and z : a-z
        One of the characters &[^bc : &&[^bc
    Match the character ] literally : ]

But I want to match a character between a and z intersected with a character that is not b or c.

polygenelubricants
Like most regex flavors, java.util.regex.Pattern has its own specific features with syntax that may not be
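For reference, here is a minimal sketch of how Java itself treats the expression from the question (class and method names are hypothetical). The second pattern, a negative lookahead in front of [a-z], is a commonly used substitute in flavors that lack the && operator. It is included as an assumption for comparison and is not taken from the truncated answer above.

```java
import java.util.regex.Pattern;

class IntersectionDemo {
    // Java-specific syntax: && intersects [a-z] with "anything except b or c".
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[^bc]]");
    // Assumed alternative for flavors without &&: reject b/c first, then match [a-z].
    static final Pattern LOOKAHEAD = Pattern.compile("(?![bc])[a-z]");

    static boolean viaIntersection(char c) {
        return INTERSECTION.matcher(String.valueOf(c)).matches();
    }

    static boolean viaLookahead(char c) {
        return LOOKAHEAD.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', 'b', 'c', 'd', 'A'}) {
            System.out.println(c + ": intersection=" + viaIntersection(c)
                    + ", lookahead=" + viaLookahead(c));
        }
    }
}
```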

Exclude characters from a character class

喜欢而已 submitted on 2019-11-27 05:16:27
Is there a simple way to match all characters in a class except a certain set of them? For example, in a language where I can use \w to match the set of all Unicode word characters, is there a way to exclude just one character, like an underscore "_", from that match? The only idea that came to mind was to use negative lookahead/lookbehind around each character, but that seems more complex than necessary when I effectively just want to match a character against a positive match AND a negative match. For example, if & were an AND operator I could do this: ^(\w&[^_])+$

Martin Ender
It really depends on your

Why is a character class faster than alternation?

故事扮演 submitted on 2019-11-27 05:05:47
It seems that using a character class is faster than alternation in an example like: [abc] vs (a|b|c) . I have heard this recommended, and a simple test using Time::HiRes confirmed it (the alternation was ~10 times slower). Using (?:a|b|c) instead, in case the capturing parentheses make a difference, does not change the result. But I cannot understand why. I think it is because of backtracking, but the way I see it there are 3 character comparisons at each position, so I am not sure how backtracking comes into play for the alternation. Is it a result of how alternation is implemented? This is
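The asker's measurement used Perl's Time::HiRes. The sketch below is a rough Java analogue, not the asker's script: it scans the same input with a character class and with a non-capturing alternation and prints the elapsed time for each. All names are hypothetical, the input string is invented, and a naive System.nanoTime loop is sensitive to JIT warm-up, so the numbers are only indicative and may not reproduce the ratio quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassVsAlternationDemo {
    static final Pattern CLASS = Pattern.compile("[abc]");
    static final Pattern ALTERNATION = Pattern.compile("(?:a|b|c)");

    // Counts non-overlapping matches of the pattern in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Elapsed nanoseconds for `rounds` full scans of the input.
    static long time(Pattern p, String input, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            countMatches(p, input);
        }
        return System.nanoTime() - start;
    }

    // Builds an invented test input by repeating a short fragment.
    static String sampleInput(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("xyzabcxyz");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = sampleInput(10_000);
        time(CLASS, input, 20);        // warm-up, result discarded
        time(ALTERNATION, input, 20);  // warm-up, result discarded
        System.out.println("class:       " + time(CLASS, input, 200) + " ns");
        System.out.println("alternation: " + time(ALTERNATION, input, 200) + " ns");
    }
}
```

Both patterns find exactly the same matches, so any timing difference comes from how the engine executes them and not from what they match.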

Matching (e.g.) a Unicode letter with Java regexps

时光毁灭记忆、已成空白 submitted on 2019-11-27 02:41:27
Question: There are many questions and answers here on StackOverflow that assume a "letter" can be matched in a regexp by [a-zA-Z] . However, with Unicode there are many more characters that most people would regard as a letter (all the Greek letters, Cyrillic, and many more). Unicode defines many blocks, each of which may have "letters". The Java definition defines POSIX classes for things like alpha characters, but that is specified to only work with US-ASCII. The predefined character classes define

Regular expression \p{L} and \p{N}

徘徊边缘 submitted on 2019-11-26 21:38:11
I am new to regular expressions and have been given the following regular expression: (\p{L}|\p{N}|_|-|\.)* I know what * means, that | means "or", and that \ escapes. But I don't know what \p{L} and \p{N} mean. I have searched Google for it, without result... Can someone help me?

\p{L} matches a single code point in the category "letter". \p{N} matches any kind of numeric character in any script. Source: regular-expressions.info. If you're going to work with regular expressions a lot, I'd suggest bookmarking that site; it's very useful.

Tim Pietzcker

These are Unicode property shortcuts (
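To make the behavior of the expression in the question concrete, here is a minimal sketch that runs it in Java, assuming Java's regex flavor (the question does not say which engine is used). Class and method names are hypothetical. The second pattern rewrites the alternation as a single character class; that rewrite is an assumption added for comparison and does not come from the answers.

```java
import java.util.regex.Pattern;

class UnicodePropertyDemo {
    // The expression from the question, unchanged apart from Java string escaping.
    static final Pattern ORIGINAL = Pattern.compile("(\\p{L}|\\p{N}|_|-|\\.)*");
    // Assumed equivalent: the same alternatives folded into one character class.
    static final Pattern AS_CLASS = Pattern.compile("[\\p{L}\\p{N}_.\\-]*");

    static boolean original(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean asClass(String s) {
        return AS_CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "file-name_1.0",          // ASCII letters, digits, '-', '_', '.'
            "\u03b1\u03b2\u0663",     // Greek alpha, beta, ARABIC-INDIC DIGIT THREE
            "a b"                     // contains a space, which no alternative allows
        };
        for (String s : samples) {
            System.out.println(original(s) + " / " + asClass(s));
        }
    }
}
```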

How can I exclude some characters from a class?

杀马特。学长 韩版系。学妹 submitted on 2019-11-26 16:12:34
Question: Say I want to match a "word" character ( \w ), but exclude "_", or match a whitespace character ( \s ), but exclude "\t". How can I do this?

Answer 1: Use a negated class including \W or \S.

    /[^\W_]/   # anything that's not a non-word character and not _
    /[^\S\t]/  # anything that's not a non-space character and not \t

Source: https://stackoverflow.com/questions/3548949/how-can-i-exclude-some-characters-from-a-class

Why is a character class faster than alternation?

旧街凉风 submitted on 2019-11-26 09:54:35
Question: It seems that using a character class is faster than alternation in an example like: [abc] vs (a|b|c) . I have heard this recommended, and a simple test using Time::HiRes confirmed it (the alternation was ~10 times slower). Using (?:a|b|c) instead, in case the capturing parentheses make a difference, does not change the result. But I cannot understand why. I think it is because of backtracking, but the way I see it there are 3 character comparisons at each position, so I am not sure how backtracking