character-class | 易学教程

List of metacharacters for MySQL regexp square brackets

Question: Strangely, I can't seem to find anywhere a list of the characters that I can't safely use as literals within MySQL regular expression square brackets without escaping them or resorting to a [:character_class:] construct. (Also, the answer probably needs to be MySQL-specific, because MySQL regular expressions seem to be lacking compared to those in Perl/PHP/JavaScript etc.) Answer 1: Almost all metacharacters (including the dot . , the + , * and ? quantifiers, the end-of-string anchor $ , etc.) have

Matching (e.g.) a Unicode letter with Java regexps

There are many questions and answers here on StackOverflow that assume a "letter" can be matched in a regexp by [a-zA-Z] . However, with Unicode there are many more characters that most people would regard as letters (all the Greek letters, Cyrillic, and many more). Unicode defines many blocks, each of which may contain "letters". The Java definition provides POSIX classes for things like alpha characters, but these are specified to work only with US-ASCII. The predefined character classes define word characters as [a-zA-Z_0-9] , which also excludes many letters. So how do you properly match against
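
The question is about Java, and its answer is cut off here. As a language-neutral illustration of the gap between [a-zA-Z] and "any Unicode letter", here is a minimal Python sketch. It assumes Python 3's standard `re` module, which has no `\p{L}` but treats `\w` and `\d` as Unicode-aware for `str` patterns. The helper name `letter_runs` is hypothetical. The class is an approximation: it can also accept some numeric characters that are not decimal digits.

```python
import re

# [^\W\d_] reads as: a word character that is neither a decimal digit
# nor an underscore. For str patterns, \w and \d are Unicode-aware.
LETTERS = re.compile(r"[^\W\d_]+")
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")


def letter_runs(text):
    """Return maximal runs of letter-like characters found in text."""
    return LETTERS.findall(text)


sample = "abc αβγ Жук x_1 42"
print(ASCII_LETTERS.findall(sample))  # ASCII-only class misses Greek and Cyrillic
print(letter_runs(sample))            # Unicode-aware class keeps them
```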

How can I exclude some characters from a class?

Question: Say I want to match a "word" character ( \w ) but exclude "_", or match a whitespace character ( \s ) but exclude "\t". How can I do this?

Answer 1: Use a negated class including \W or \S.

    /[^\W_]/   # anything that's not a non-word character and not _
    /[^\S\t]/  # anything that's not a non-space character and not \t

Source: https://stackoverflow.com/questions/3548949/how-can-i-exclude-some-characters-from-a-class
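
The double-negation trick carries over to other engines that support \W and \S inside a class. A minimal sketch in Python follows. The constant names are illustrative and the sample strings are made up.

```python
import re

WORD_NO_UNDERSCORE = re.compile(r"[^\W_]")  # \w minus "_"
SPACE_NO_TAB = re.compile(r"[^\S\t]")       # \s minus the tab character

# "_" and "-" are skipped; letters and digits are kept.
print(WORD_NO_UNDERSCORE.findall("a_b-1"))

# The space and the newline match; the tab does not.
print(SPACE_NO_TAB.findall("a b\tc\n"))
```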

Character class subtraction, converting from Java syntax to RegexBuddy

Which regular expression engine does Java use? In a tool like RegexBuddy, if I use [a-z&&[^bc]] , the expression is valid in Java but RegexBuddy does not understand it. In fact it reports:

    Match a single character present in the list below : [a-z&&[^bc]
        A character in the range between a and z : a-z
        One of the characters &[^bc : &&[^bc
    Match the character ] literally : ]

But I want to match a character between a and z intersected with a character that is not b or c.

polygenelubricants: Like most regex flavors, java.util.regex.Pattern has its own specific features with syntax that may not be
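
For engines without Java's && intersection operator, the same set can usually be expressed another way. The sketch below uses Python, whose standard `re` module has no && operator. It shows two equivalents of [a-z&&[^bc]] : a negative lookahead in front of the class, and the ranges written out by hand. The constant names and the sample string are illustrative only.

```python
import re

# Java: [a-z&&[^bc]]  (a-z intersected with "not b and not c")
BY_LOOKAHEAD = re.compile(r"(?![bc])[a-z]")  # refuse b or c, then match a-z
BY_RANGES = re.compile(r"[ad-z]")            # the same set spelled out

sample = "abcdxyz BC"
print(BY_LOOKAHEAD.findall(sample))
print(BY_RANGES.findall(sample))
```

The lookahead form scales better when the excluded set is large or the base class is a shorthand such as \w. The explicit ranges are easier to read for small cases like this one.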

Exclude characters from a character class

Is there a simple way to match all characters in a class except a certain set of them? For example, in a language where I can use \w to match the set of all Unicode word characters, is there a way to exclude just one character, such as an underscore "_", from that match? The only idea that came to mind was to use a negative lookahead/lookbehind around each character, but that seems more complex than necessary when I effectively just want to match a character against a positive match AND a negative match. For example, if & were an AND operator I could do this: ^(\w&[^_])+$

Martin Ender: It really depends on your

Why is a character class faster than alternation?

It seems that using a character class is faster than alternation in an example like: [abc] vs (a|b|c) . I have heard this recommended, and with a simple test using Time::HiRes I verified it (the alternation was ~10 times slower). Using (?:a|b|c) instead, in case the capturing parentheses make a difference, does not change the result. But I cannot understand why. I think it is because of backtracking, but the way I see it, at each position there are 3 character comparisons, so I am not sure how backtracking comes into play for the alternation. Is it a result of the nature of the implementation of alternation? This is
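
The measurement in the question was made in Perl with Time::HiRes. A comparable harness can be sketched in Python with the standard `timeit` module. This is only a sketch under stated assumptions: the input text, the repeat count and the helper name `time_search` are made up, and the numbers depend heavily on the engine. Some engines rewrite an alternation of single characters into a character set before matching, and CPython's parser appears to be one of them, so the two timings may come out nearly equal here.

```python
import re
import timeit

CLASS = re.compile(r"[abc]")
ALTERNATION = re.compile(r"(?:a|b|c)")

# No match until the very end, so every position has to be tried.
text = "x" * 10_000 + "c"


def time_search(pattern, repeats=200):
    """Total seconds spent running pattern.search(text) `repeats` times."""
    return timeit.timeit(lambda: pattern.search(text), number=repeats)


print(f"class:       {time_search(CLASS):.4f}s")
print(f"alternation: {time_search(ALTERNATION):.4f}s")
```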

Regular expression \\p{L} and \\p{N}

I am new to regular expressions and have been given the following regular expression: (\p{L}|\p{N}|_|-|\.)* I know what * means, that | means "or", and that \ escapes. But I don't know what \p{L} and \p{N} mean. I have searched Google for it, without result. Can someone help me?

\p{L} matches a single code point in the category "letter". \p{N} matches any kind of numeric character in any script. Source: regular-expressions.info. If you're going to work with regular expressions a lot, I'd suggest bookmarking that site; it's very useful.

Tim Pietzcker: These are Unicode property shortcuts (
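
To make the categories behind \p{L} and \p{N} concrete, here is a small Python sketch built on the standard `unicodedata` module. Python's built-in `re` does not accept \p{...}, so the check is done per character instead. The function names are hypothetical, and `matches_original_pattern` assumes the pattern is meant to cover the whole string, which the original regex does not state.

```python
import unicodedata


def is_letter(ch):
    # Letter categories start with "L": Lu, Ll, Lt, Lm, Lo.
    return unicodedata.category(ch).startswith("L")


def is_number(ch):
    # Number categories start with "N": Nd, Nl, No.
    return unicodedata.category(ch).startswith("N")


def matches_original_pattern(text):
    # Whole-string analogue of (\p{L}|\p{N}|_|-|\.)* : every character
    # must be a letter, a number, "_", "-" or ".".
    return all(is_letter(c) or is_number(c) or c in "_-." for c in text)


print(is_letter("é"), is_letter("5"))    # letter vs. digit
print(is_number("٣"), is_number("½"))    # Arabic-Indic digit, vulgar fraction
print(matches_original_pattern("naïve_file-2.txt"))
print(matches_original_pattern("a b"))   # the space is not allowed
```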
