Exclude characters from a character class

两盒软妹~` submitted on 2019-11-26 06:47:06

Question


Is there a simple way to match all characters in a class except a certain set of them? For example, in a language where I can use \w to match the set of all Unicode word characters, is there a way to exclude just one character, such as an underscore "_", from that match?

The only idea that came to mind was to use a negative lookahead/lookbehind around each character, but that seems more complex than necessary when I effectively just want to match a character against a positive match AND a negative match. For example, if & were an AND operator, I could do this:

^(\w&[^_])+$

Answer 1:


It really depends on your regex flavor.

.NET

... provides only one simple character class set operation: subtraction. This is enough for your example, so you can simply use

[\w-[_]]

If a - is followed by a nested character class, it's subtracted. Simple as that...

Java

... provides a much richer set of character class set operations. In particular you can get the intersection of two sets like [[abc]&&[cde]] (which would give c in this case). Intersection and negation together give you subtraction:

[\w&&[^_]]

Perl

... supports set operations on extended character classes as an experimental feature (available since Perl 5.18). In particular, you can directly subtract arbitrary character classes:

(?[ \w - [_] ])

All other flavors

... (that support lookaheads) allow you to mimic the subtraction by using a negative lookahead:

(?!_)\w

This first checks that the next character is not a _ and then matches any \w (which can't be _ due to the negative lookahead).
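
As a minimal sketch of the lookahead approach, here is how it might look with Python's standard re module, which has no character class subtraction of its own. The helper names is_word_without_underscore and find_runs are hypothetical and chosen only for illustration.

```python
import re

# (?!_) fails at any position where the next character is an underscore;
# \w then consumes exactly one word character. The non-capturing group is
# repeated so the lookahead is re-checked before every character.
WORD_NO_UNDERSCORE = re.compile(r'(?:(?!_)\w)+')


def is_word_without_underscore(text):
    """Return True if text is non-empty and consists only of \\w characters other than '_'."""
    return WORD_NO_UNDERSCORE.fullmatch(text) is not None


def find_runs(text):
    """Return every maximal run of word characters that contains no underscore."""
    return WORD_NO_UNDERSCORE.findall(text)


if __name__ == "__main__":
    print(is_word_without_underscore("héllo123"))  # True
    print(is_word_without_underscore("foo_bar"))   # False
    print(find_runs("foo_bar baz"))                # ['foo', 'bar', 'baz']
```

The lookahead has to sit inside the repeated group: writing (?!_)\w+ would only check the first character.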

Note that each of these approaches is completely general in that you can subtract two arbitrarily complex character classes.




Answer 2:


You can use the negation of the \w class (that is, \W) and put it, together with the underscore, inside a negated character class:

^([^\W_]+)$
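
The double negation works because [^\W_] matches any character that is neither a non-word character nor an underscore, which leaves exactly the word characters other than "_". A small sketch with Python's standard re module, assuming Python 3 string patterns (where \w is Unicode-aware); the function name has_only_word_chars_except_underscore is hypothetical:

```python
import re

# [^\W_] = "not (non-word character or underscore)" = \w minus "_".
# fullmatch anchors the pattern at both ends of the string.
NO_UNDERSCORE = re.compile(r'[^\W_]+')


def has_only_word_chars_except_underscore(text):
    """Return True if text is non-empty and every character is in \\w but is not '_'."""
    return NO_UNDERSCORE.fullmatch(text) is not None


if __name__ == "__main__":
    print(has_only_word_chars_except_underscore("abc123"))  # True
    print(has_only_word_chars_except_underscore("naïve"))   # True
    print(has_only_word_chars_except_underscore("a_b"))     # False
    print(has_only_word_chars_except_underscore("a b"))     # False
```

This form needs no lookaround or set-operation support, so it should carry over to most flavors that have \w and \W.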



Answer 3:


A negative lookahead is the correct way to go insofar as I understand your question:

^((?!_)\w)+$



Answer 4:


Try using subtraction, written here as an intersection with a negated class:

[\w&&[^_]]+

Note: This will work in Java, but might not in some other regex engines.




Answer 5:


This can be done in Python with the regex module. Something like:

import regex as re
pattern = re.compile(r'[\W_--[ ]]+')
cleanString = pattern.sub('', rawString)

You'd typically install the regex module with pip:

pip install regex

EDIT:

The regex module has two behaviours, version 0 and version 1. Set subtraction (as above) is a version 1 behaviour. The PyPI docs claim version 1 is the default behaviour, but you may find this is not the case. You can check with

import regex
if regex.DEFAULT_VERSION == regex.VERSION1:
  print("version 1")

To set it to version 1:

regex.DEFAULT_VERSION = regex.VERSION1

or to use version 1 in a single expression:

pattern = re.compile(r'(?V1)[\W_--[ ]]+')


Source: https://stackoverflow.com/questions/17327765/exclude-characters-from-a-character-class
