Using regex to match non-word characters BUT NOT smiley faces

生来就可爱ヽ(ⅴ<●) submitted on 2020-01-13 17:04:40

Question


I have a Java program which is supposed to remove all non-letter characters from a string, except when they are a smiley face such as =) or =] or :P

It's very easy to match the opposite with [a-zA-Z ]|=\)|=\]|:P but I cannot figure out how to negate this expression. Since I am using the String.replaceAll() function it must be in the negated form.

I believe part of the issue may come from the fact that smileys are generally 2 characters long, and I am only matching 1 character at a time?

Interestingly, replaceAll("(?![Tt])[Oo]","") removes every occurrence of the letter O, even in the word "to." Does this mean my replaceAll function does not understand regex lookahead? It doesn't throw any errors...

I ended up using

replaceAll("(?<![=:;])[\\]\\[\\(\\)\\/]","")
.replaceAll("[=:;](?![\\]\\[\\(\\)o0OpPxX\\/])","")
.replaceAll("[^a-zA-Z=:;\\(\\)\\[\\]\\/ ]","")

which is extremely messy but works perfectly. "The... quick! (brown) fox jump's over the[] lazy dog. :] =O ;X" becomes "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG :] =O ;X" (the upper-casing is not done by these replaceAll calls).
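
For reference, the three calls can be wrapped in a small runnable sketch. The class name ThreePassCleanup, the clean method and the final toUpperCase call are assumptions added for illustration; only the three patterns come from the post.

```java
import java.util.Locale;

final class ThreePassCleanup {
    // Hypothetical wrapper around the three replaceAll calls shown above.
    static String clean(String s) {
        return s
            // 1. Drop mouth characters that do not directly follow an eye.
            .replaceAll("(?<![=:;])[\\]\\[\\(\\)\\/]", "")
            // 2. Drop eye characters that are not followed by a mouth.
            .replaceAll("[=:;](?![\\]\\[\\(\\)o0OpPxX\\/])", "")
            // 3. Drop everything that is not a letter, a space or a smiley character.
            .replaceAll("[^a-zA-Z=:;\\(\\)\\[\\]\\/ ]", "");
    }

    public static void main(String[] args) {
        String input = "The... quick! (brown) fox jump's over the[] lazy dog. :] =O ;X";
        // Upper-casing is a separate step, assumed here to reproduce the quoted result.
        System.out.println(clean(input).toUpperCase(Locale.ROOT));
        // THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG :] =O ;X
    }
}
```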

Edit: Ignore that fix, see the accepted answer below.


Answer 1:


It should be pretty easy to do this using a negative lookahead. Basically the match will fail at any position where the regex inside the (?!...) group matches. You should follow the negative lookahead with a single wildcard (.) to consume a character if the lookahead did not match (meaning that the next character is a non-letter character that is not part of a smiley face).
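
The same rule explains the replaceAll("(?![Tt])[Oo]","") result from the question: a lookahead inspects the text starting at the current position, which is the O itself, so "not followed by T" is always true there. Checking the preceding character takes a lookbehind instead. A minimal sketch of the difference, with a made-up class name and helper methods:

```java
final class LookaroundDemo {
    // Negative lookahead: asserts that the character AT the current position
    // (the one [Oo] is about to consume) is not T/t. An O never is, so the
    // assertion always passes and every O is removed.
    static String stripWithLookahead(String s) {
        return s.replaceAll("(?![Tt])[Oo]", "");
    }

    // Negative lookbehind: asserts that the character BEFORE the current
    // position is not T/t, so the O in "to" is kept.
    static String stripWithLookbehind(String s) {
        return s.replaceAll("(?<![Tt])[Oo]", "");
    }

    public static void main(String[] args) {
        String input = "to boot";
        System.out.println(stripWithLookahead(input));  // t bt
        System.out.println(stripWithLookbehind(input)); // to bt
    }
}
```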

Edit: Clearly I hadn't tested my original regex very thoroughly; you also need a negative lookbehind following the . to make sure that the character you consumed was not the second character of a smiley:

(?![a-zA-Z ]|=\)|=\]|:P).(?<!=\)|=\]|:P)
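
As a rough illustration of why the lookbehind matters, the sketch below runs the pattern through String.replaceAll() with and without it. The class name, constant names and sample input are invented for the example; note that each backslash has to be doubled inside a Java string literal.

```java
final class SmileyStripper {
    // Lookahead only: skips letters, spaces and the START of a smiley,
    // but still consumes the second character of "=)" and "=]".
    static final String LOOKAHEAD_ONLY = "(?![a-zA-Z ]|=\\)|=\\]|:P).";

    // Same pattern plus the negative lookbehind, which rejects a match
    // whose consumed character completes a smiley.
    static final String WITH_LOOKBEHIND =
        "(?![a-zA-Z ]|=\\)|=\\]|:P).(?<!=\\)|=\\]|:P)";

    static String strip(String regex, String s) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "Hi! =) ok. :P (x)";
        System.out.println(strip(LOOKAHEAD_ONLY, input));  // Hi = ok :P x
        System.out.println(strip(WITH_LOOKBEHIND, input)); // Hi =) ok :P x
    }
}
```

In the first result :P survives only because P is a letter; the mouth of =) is lost, which is the case the lookbehind fixes.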

Note that you might be able to shorten the regex by using character classes for the eyes and the mouth, for example:

[:=][\(\)\[\]]
  ^    ^-----mouth
  |--eyes
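
One possible way to plug those character classes into the full pattern is sketched below. The class and constant names are hypothetical, and the classes are exactly the ones above, so a smiley like :P would only be protected if P were added to the mouth class.

```java
final class SmileyClasses {
    static final String EYES = "[:=]";
    static final String MOUTH = "[()\\[\\]]";
    static final String SMILEY = EYES + MOUTH;

    // Not a letter or space, not the start of a smiley (lookahead),
    // and not the second character of a smiley (lookbehind).
    static final String NON_LETTER_NON_SMILEY =
        "(?![a-zA-Z ]|" + SMILEY + ").(?<!" + SMILEY + ")";

    static String strip(String s) {
        return s.replaceAll(NON_LETTER_NON_SMILEY, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Well... :) that's [odd] =]"));
        // Well :) thats odd =]
    }
}
```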


Source: https://stackoverflow.com/questions/7465593/using-regex-to-match-non-word-characters-but-not-smiley-faces
