tags
I've stumped myself trying to figure out how to remove carriage returns that occur between <p> tags. (Technically I need to replace them with spaces, not
[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)
The first part matches one or more of any kind of line separator (\n, \r\n, or \r). The rest is a lookahead that attempts to match everything up to the next closing </p> tag, but if it finds an opening <p> tag first, the match fails.
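As a rough illustration, here is how the regex could be applied in Python to replace those line breaks with spaces. Python is only an assumption here, since the question doesn't say which language or tool is in use; the function name and the sample string are made up for the example.

```python
import re

# The pattern from above: one or more line-break characters, followed by
# a lookahead that must reach a closing </p> without first running into
# another <p> or </p> tag.
PATTERN = re.compile(r"[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)")


def collapse_breaks_in_paragraphs(html: str) -> str:
    """Replace each run of line breaks inside a <p>...</p> with one space."""
    return PATTERN.sub(" ", html)


# Hypothetical sample: line breaks both inside and outside paragraphs.
sample = "<div>\n<p>first\r\nline <b>bold\n</b> end</p>\n<p>second</p>\n</div>"
print(collapse_breaks_in_paragraphs(sample))
```

Only the two breaks inside the first paragraph are replaced; the ones between the tags outside the paragraphs are left alone. Note that a \r\n pair is consumed as a single run, so it becomes one space rather than two.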
Note that this regex can be fooled very easily, for example by SGML comments, <script> elements, or plain old malformed HTML. Also, I'm assuming your regex flavor supports positive and negative lookaheads. That's a pretty safe assumption these days, but if the regex doesn't work for you, we'll need to know exactly which language or tool you're using.
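To make the "easily fooled" caveat concrete, here is a small sketch (again assuming Python, with made-up input strings) where a comment containing tag-like text causes the regex to act on the wrong line break or to miss the right one.

```python
import re

# Same pattern as in the answer.
PATTERN = re.compile(r"[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)")

# False positive: the line break is not inside any paragraph, but the
# comment contains "</p>", so the lookahead succeeds anyway.
outside = "<div>\n<!-- </p> --></div>"
fooled_outside = PATTERN.sub(" ", outside)

# False negative: the line break is inside a paragraph, but the "<p>"
# inside the comment makes the lookahead give up.
inside = "<p>one\n<!-- <p> -->two</p>"
fooled_inside = PATTERN.sub(" ", inside)

print(repr(fooled_outside))
print(repr(fooled_inside))
```

If the input can contain this kind of thing, a real HTML parser is likely a safer choice than a regex.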