I have a Java program which is supposed to remove all non-letter characters from a string, except when they are a smiley face such as =) or =] or :P
It's very easy to match the opposite with [a-zA-Z ]|=\)|=\]|:P
but I cannot figure out how to negate this expression. Since I am using the String.replaceAll() function it must be in the negated form.
I believe part of the issue may be that smileys are generally 2 characters long, and I am only matching 1 character at a time.
Interestingly, replaceAll("(?![Tt])[Oo]","")
removes every occurrence of the letter O, even in the word "to." Does this mean my replaceAll function does not understand regex lookahead? It doesn't throw any errors...
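For what it's worth, the lookahead here is being applied; it just inspects the wrong character. (?![Tt]) looks at the character about to be matched, which is the O itself, so the assertion always succeeds. Checking the character before the O needs a lookbehind, (?<![Tt]). A minimal sketch of the difference follows; the class and method names are made up for illustration.

```java
class LookaroundDemo {
    // Lookahead: at the position of the O, asserts that the NEXT character
    // is not T/t. That next character is the O itself, so this always passes.
    static String withLookahead(String s) {
        return s.replaceAll("(?![Tt])[Oo]", "");
    }

    // Lookbehind: asserts that the character BEFORE the O is not T/t.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<![Tt])[Oo]", "");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("to go TO Oh"));  // prints: t g T h
        System.out.println(withLookbehind("to go TO Oh")); // prints: to g TO h
    }
}
```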
I ended up using
replaceAll("(?<![=:;])[\\]\\[\\(\\)\\/]","")
.replaceAll("[=:;](?![\\]\\[\\(\\)o0OpPxX\\/])","")
.replaceAll("[^a-zA-Z=:;\\(\\)\\[\\]\\/ ]","")
which is extremely messy but works perfectly.
The... quick! (brown) fox jump's over the[] lazy dog. :] =O ;X
becomes
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG :] =O ;X
Edit: Ignore that fix, see the accepted answer below.
It should be pretty easy to do this using a negative lookahead. Basically, the match will fail at any position where the regex inside the (?!...) group matches. You should follow the negative lookahead with a single wildcard (.) to consume a character if the lookahead did not match (meaning that the next character is a non-letter character that is not part of a smiley face).
Edit: Clearly I hadn't tested my original regex very thoroughly. You also need a negative lookbehind following the . to make sure that the character you consumed was not the second character in a smiley:
(?![a-zA-Z ]|=\)|=\]|:P).(?<!=\)|=\]|:P)
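As a rough sketch of how this regex could be dropped into String.replaceAll(), assuming only the three smileys from the question (=), =] and :P) need to survive: the class name, constant name and sample input below are illustrative, and every backslash is doubled because the pattern sits inside a Java string literal.

```java
class SmileyStripper {
    // The regex from the answer, escaped for a Java string literal:
    //   (?![a-zA-Z ]|=\)|=\]|:P)  next char is not a letter/space and does not start a smiley
    //   .                         consume that single character
    //   (?<!=\)|=\]|:P)           the consumed char did not complete a smiley
    static final String NON_LETTER_NON_SMILEY =
            "(?![a-zA-Z ]|=\\)|=\\]|:P).(?<!=\\)|=\\]|:P)";

    static String strip(String input) {
        return input.replaceAll(NON_LETTER_NON_SMILEY, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, world! =) :P =] (test)"));
        // prints: Hello world =) :P =] test
    }
}
```

One thing to keep in mind: without the DOTALL flag, . does not match line terminators in Java, so newlines in the input are left in place.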
Note that you might be able to shorten the regex by using character classes for the eyes and the mouth, for example:
[:=][\(\)\[\]]
 ^      ^-----mouth
 |--eyes
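A sketch of the same lookahead / wildcard / lookbehind structure built from the eyes-and-mouth classes in the note above. This is an assumption about how the two ideas might be combined, not a tested replacement: these classes also accept pairs such as :( and =[ and, as written, no longer cover :P. The names are hypothetical.

```java
class SmileyClassStripper {
    // Eyes and mouth classes from the note above; widen them as needed.
    // Inside a Java character class, [ and ] must be escaped; ( and ) need not be.
    static final String SMILEY = "[:=][()\\[\\]]";

    // Same shape as the full regex: negative lookahead, one character,
    // then a negative lookbehind so the second half of a smiley is kept.
    static final String NON_LETTER_NON_SMILEY =
            "(?![a-zA-Z ]|" + SMILEY + ").(?<!" + SMILEY + ")";

    static String strip(String input) {
        return input.replaceAll(NON_LETTER_NON_SMILEY, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hi! :) =( :[ (ok)"));
        // prints: Hi :) =( :[ ok
    }
}
```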
Source: https://stackoverflow.com/questions/7465593/using-regex-to-match-non-word-characters-but-not-smiley-faces