I am trying to replace two or more occurrences of <br/> tags together (like <br/><br/><br/>) with two <br/><br/>
You can do that by changing your regex a little:
Pattern brTagPattern = Pattern.compile("<\\s*br\\s*/\\s*>\\s*<\\s*br\\s*/\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
This will ignore any whitespace between the two <br/> tags. If you want to match only when there are exactly two or three whitespace characters between them, you can use:
Pattern brTagPattern = Pattern.compile("<\\s*br\\s*/\\s*>(\\s){2,3}<\\s*br\\s*/\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
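To see how the two patterns above behave, here is a minimal Java sketch. The class name BrPairPatterns, the constant names, and the normalize helper are made up for illustration; only the regexes come from the answer. Pattern.DOTALL is left out because neither regex contains a dot, so the flag would have no effect.

```java
import java.util.regex.Pattern;

class BrPairPatterns {
    // Two <br/> tags with any amount of whitespace (including none) between them.
    static final Pattern ANY_GAP = Pattern.compile(
            "<\\s*br\\s*/\\s*>\\s*<\\s*br\\s*/\\s*>", Pattern.CASE_INSENSITIVE);

    // Two <br/> tags with exactly two or three whitespace characters between them.
    static final Pattern GAP_2_TO_3 = Pattern.compile(
            "<\\s*br\\s*/\\s*>(\\s){2,3}<\\s*br\\s*/\\s*>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: replaces every match of p with a canonical pair.
    static String normalize(Pattern p, String html) {
        return p.matcher(html).replaceAll("<br/><br/>");
    }

    public static void main(String[] args) {
        System.out.println(normalize(ANY_GAP, "a<BR /> \n <br/>b"));  // a<br/><br/>b
        System.out.println(normalize(GAP_2_TO_3, "a<br/>  <br/>b"));  // a<br/><br/>b
        System.out.println(normalize(GAP_2_TO_3, "a<br/> <br/>b"));   // unchanged, only one space
        System.out.println(normalize(ANY_GAP, "<br/><br/><br/>"));    // unchanged, third tag stays
    }
}
```

Note that both patterns match exactly one pair of tags, so a run of three tags still comes out as three. Collapsing longer runs would need a quantified group such as (...){2,} around a single-tag pattern.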
Here's some Groovy code to test your Pattern:
import java.util.regex.*

Pattern brTagPattern = Pattern.compile( "(<\\s*br\\s*/\\s*>\\s*){2,}", Pattern.CASE_INSENSITIVE | Pattern.DOTALL )

def testData = [
    ['', ''],
    ['<br/>', '<br/>'],
    ['< br/> <br />', '<br/><br/>'],
    ['<br/> <br/><br/>', '<br/><br/>'],
    ['<br/> < br/ > <br/>', '<br/><br/>'],
    ['<br/> <br/> <br/>', '<br/><br/>'],
    ['<br/><br/><br/> <br/><br/>', '<br/><br/>'],
    ['<br/><br/><br/><b>w</b><br/>','<br/><br/><b>w</b><br/>'],
]

testData.each { inputStr, expected ->
    Matcher matcher = brTagPattern.matcher( inputStr )
    assert expected == matcher.replaceAll( '<br/><br/>' )
}
And everything seems to pass fine...
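For reference, the same pattern can be used from plain Java along these lines. This is only a sketch: the class name BrRunCollapser and the collapse method are invented for the example, and the regex is the one from the Groovy test above.

```java
import java.util.regex.Pattern;

class BrRunCollapser {
    // A run of two or more <br/> tags, each optionally followed by whitespace.
    static final Pattern BR_RUN = Pattern.compile(
            "(<\\s*br\\s*/\\s*>\\s*){2,}", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collapses every run of 2+ tags into exactly two.
    static String collapse(String html) {
        return BR_RUN.matcher(html).replaceAll("<br/><br/>");
    }

    public static void main(String[] args) {
        System.out.println(collapse("<br/> < br/ > <br/>"));           // <br/><br/>
        System.out.println(collapse("<br/><br/><br/><b>w</b><br/>"));  // <br/><br/><b>w</b><br/>
        System.out.println(collapse("<br/>"));                         // <br/> (a single tag is left alone)
        System.out.println(collapse("<br/><br/> text"));               // <br/><br/>text
    }
}
```

One thing to be aware of: because the group ends in \s*, whitespace after the last tag of a run is part of the match and is dropped by the replacement, as the last call shows. The pattern also requires the slash, so a bare <br> is not matched.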
Probably not the answer you want to hear, but it is general wisdom that you should not attempt to parse XML/HTML with regular expressions. So many things can go wrong -- it's a much better idea to use a parsing library specifically meant for such data, which will also completely bypass the issue you're having.
Take a look at JAXB if you are certain your HTML is well-formed XML. If the HTML is likely to be messy and non-compliant (like most real-world HTML), you should try something like TagSoup.
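To give an idea of what the parser route could look like, here is a rough sketch. It does not use JAXB or TagSoup; it swaps in the DOM parser that ships with the JDK (javax.xml.parsers), and it assumes the input is well-formed XML with a single root element. The class name BrDomCollapser and its methods are made up for the example. With messy real-world HTML you would need a tolerant parser in front of this.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BrDomCollapser {

    // Parses well-formed XML; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Removes the third and later <br> in every run of sibling <br> elements
    // that are separated only by whitespace text. Returns the same document.
    static Document collapseBrRuns(Document doc) {
        collapse(doc.getDocumentElement());
        return doc;
    }

    private static void collapse(Node parent) {
        List<Node> toRemove = new ArrayList<>();
        int run = 0;
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (isBr(child)) {
                run++;
                if (run > 2) {
                    toRemove.add(child);
                }
            } else if (!isBlankText(child)) {
                run = 0;          // anything other than whitespace ends the run
                collapse(child);  // look for runs inside nested elements
            }
        }
        // Remove after the loop so sibling traversal is not disturbed.
        for (Node n : toRemove) {
            parent.removeChild(n);
        }
    }

    private static boolean isBr(Node n) {
        return n.getNodeType() == Node.ELEMENT_NODE && "br".equalsIgnoreCase(n.getNodeName());
    }

    private static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty();
    }

    public static void main(String[] args) {
        Document doc = collapseBrRuns(parse("<p>a<br/> <br/><br/>b<br/></p>"));
        System.out.println(doc.getElementsByTagName("br").getLength());  // 3
    }
}
```

Unlike the regex versions, this leaves whitespace between the remaining tags untouched and does not care how each tag was written in the source, since the parser has already normalized that.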