I'm having some difficulty with a specific Regex I'm trying to use. I'm searching for every occurrence of a string (for my purposes, I'll say it's "mystring") i
Another regex that worked for me for this search:
(?![^<]*>)_mystring_
Source: https://stackoverflow.com/a/857819/1106878
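For anyone who wants to try the lookahead quickly, here is a minimal sketch in Python. It assumes the literal search term is "mystring" (without the underscores) and that the markup contains no stray < or > characters inside attribute values; the function name is made up for illustration.

```python
import re

# Match "mystring" only when the next angle bracket ahead of it is not a
# closing ">", i.e. when the match does not sit inside a tag.
OUTSIDE_TAGS = re.compile(r"(?![^<]*>)mystring")

def replace_outside_tags(html: str, replacement: str) -> str:
    """Replace "mystring" in text content, leaving tag contents untouched."""
    return OUTSIDE_TAGS.sub(replacement, html)

sample = '<a href="mystring.html">mystring here</a>'
result = replace_outside_tags(sample, "other")
print(result)
```

The occurrence inside the href is skipped because the lookahead can reach a > without crossing a <.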
Why use regex?
For xhtml, load it into XDocument / XmlDocument; for (non-x)html the Html Agility Pack would seem a more sensible choice...
Either way, that will parse the html into a DOM so you can iterate over the nodes and inspect them.
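To illustrate the parser route without the .NET libraries named above, here is a rough sketch using Python's standard html.parser module. Note this is an event-based parser rather than a DOM, so it is a stand-in for the idea, not the same API. The class and function names are hypothetical, and the sketch also rewrites text inside script and style elements, which you may not want.

```python
from html.parser import HTMLParser

class TextReplacer(HTMLParser):
    """Re-emit the markup, replacing a string only in text content."""

    def __init__(self, old, new):
        # Keep entity references as they are instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Emit the start tag exactly as written in the source.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text between tags reaches this handler.
        self.out.append(data.replace(self.old, self.new))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def replace_in_text(html, old, new):
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(replace_in_text('<p class="mystring">mystring &amp; more</p>', "mystring", "other"))
```

Attribute values never pass through handle_data, so they are left alone without any regex trickery.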
Setting aside that there are indeed other ways, and that I'm no real regex expert, one thing that popped into my head was:
So, using <[^>]*?(mystring)[^>]*>
you can find the tagged ones. Replace those with otherstring. Do your normal replace on the mystrings that are left. Then replace otherstring back to mystring.
Crude but effective....maybe.
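A sketch of that three-step swap in Python, in case it helps. The placeholder token is hypothetical and is assumed never to occur in the input. The tag regex drops the capture group and uses a callback so that every occurrence inside one tag gets hidden, not just the first.

```python
import re

# Hypothetical placeholder, assumed not to appear anywhere in the input.
PLACEHOLDER = "\x00HIDDEN\x00"
TAG_WITH_TARGET = re.compile(r"<[^>]*?mystring[^>]*>")

def replace_outside_tags(html: str, new: str) -> str:
    # Step 1: hide every "mystring" that sits inside a tag.
    hidden = TAG_WITH_TARGET.sub(
        lambda m: m.group(0).replace("mystring", PLACEHOLDER), html
    )
    # Step 2: ordinary replace on the occurrences that are left.
    replaced = hidden.replace("mystring", new)
    # Step 3: put the hidden occurrences back.
    return replaced.replace(PLACEHOLDER, "mystring")

print(replace_outside_tags('<div id="mystring">mystring</div>', "new"))
```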
When your regex processor doesn't support variable-length lookbehind, try this:
(<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)
Preserve capture groups 1 and 3 and replace capture group 2:
For example, in Eclipse, find:
(<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)
and replace with:
$1_newString_$3
(Other regex processors might use a different capture group syntax, such as \1)
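As an example of the backslash syntax, here is roughly how the same find/replace might look with Python's re module. This assumes "mystring" and "newString" are the literal strings. One thing to watch: a single pass only replaces non-overlapping matches, so text with several occurrences between the same pair of tags may need the substitution repeated.

```python
import re

# Group 1: preceding tag plus text, group 2: the target, group 3: text plus following tag.
PATTERN = re.compile(r"(<.+?>[^<>]*?)(mystring)([^<>]*?<.+?>)")

def replace_between_tags(html: str) -> str:
    # Keep groups 1 and 3, swap out group 2.
    return PATTERN.sub(r"\1newString\3", html)

print(replace_between_tags("<p>some mystring text</p>"))
print(replace_between_tags('<a href="mystring">link</a>'))
```

The second call leaves its input unchanged, since the only occurrence is inside the tag.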
This should do it:
(?<!<[^>]*)_mystring_
It uses a negative lookbehind to check that the matched string does not have a < before it without a corresponding >. It requires an engine that supports variable-length lookbehind, such as .NET.
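Not every engine accepts that pattern. Python's built-in re module, for one, only allows fixed-width lookbehind and rejects it at compile time. A small sketch that tries the lookbehind and falls back to a lookahead with the same intent (the literal "mystring" and the function name are assumptions for illustration):

```python
import re

LOOKBEHIND = r"(?<!<[^>]*)mystring"  # pattern from this answer
LOOKAHEAD = r"(?![^<]*>)mystring"    # alternative that needs no lookbehind

def compile_outside_tags():
    """Return a compiled pattern that matches "mystring" outside tags."""
    try:
        return re.compile(LOOKBEHIND)
    except re.error:
        # Raised by engines that only support fixed-width lookbehind.
        return re.compile(LOOKAHEAD)

pattern = compile_outside_tags()
print(pattern.sub("new", '<i class="mystring">mystring</i>'))
```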
A quick and dirty alternative is to use a regex replace function with a callback to encode the content of tags (everything between < and >), for example using base64. Then run your search, and finally run another callback to decode your tag contents.
This can also save a lot of head scratching when you need to exclude specific tags from a regex search - first obfuscate them and wrap them in a marker that won't match your search, then run your search, then deobfuscate whatever is in markers.
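Here is a minimal sketch of the encode, search, decode round trip in Python. The NUL-byte markers and the function names are my own choices for illustration and are assumed not to occur in the input. There is also a small chance that the base64 text itself contains the search string, so treat this as a starting point only.

```python
import base64
import re

TAG = re.compile(r"<[^>]*>")
# Encoded tags are wrapped in NUL bytes (assumed absent from the input).
MARKED = re.compile(r"\x00([A-Za-z0-9+/=]*)\x00")

def encode_tag(match):
    encoded = base64.b64encode(match.group(0).encode("utf-8")).decode("ascii")
    return "\x00" + encoded + "\x00"

def decode_tag(match):
    return base64.b64decode(match.group(1)).decode("utf-8")

def replace_outside_tags(html: str, old: str, new: str) -> str:
    obfuscated = TAG.sub(encode_tag, html)     # hide everything between < and >
    replaced = obfuscated.replace(old, new)    # run the plain search/replace
    return MARKED.sub(decode_tag, replaced)    # restore the original tags

print(replace_outside_tags('<b title="mystring">mystring</b>', "mystring", "new"))
```

To exclude only specific tags, narrow the TAG pattern to those tags and leave the rest of the flow as it is.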