Use regex to find specific string not in html tag

温柔的废话 2021-02-05 08:53

I'm having some difficulty with a specific Regex I'm trying to use. I'm searching for every occurrence of a string (for my purposes, I'll say it's "mystring") i

8 answers
  • 2021-02-05 09:04

    Another search regex that worked for me:

    (?![^<]*>)_mystring_
    

    Source: https://stackoverflow.com/a/857819/1106878
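
    A minimal sketch of how this lookahead behaves, using Python's `re` module. It assumes the literal search term is `mystring` (the underscores in the pattern above are treated here as placeholder markers, which is a guess), and the helper name `replace_outside_tags` is made up for illustration.

```python
import re

# Assumption: the literal search term is "mystring".
# The negative lookahead rejects any position from which a ">" can be
# reached without first crossing a "<", i.e. a position inside a tag.
OUTSIDE_TAGS = re.compile(r"(?![^<]*>)mystring")

def replace_outside_tags(html: str, new: str) -> str:
    # A callable replacement avoids backslash-template handling in `new`.
    return OUTSIDE_TAGS.sub(lambda m: new, html)

sample = '<a href="mystring.html" title="mystring">mystring here</a> and mystring'
print(replace_outside_tags(sample, "X"))
# -> <a href="mystring.html" title="mystring">X here</a> and X
```

    One limitation of this approach: a literal `>` in ordinary text after the term (as in `mystring > b`) makes the lookahead treat that occurrence as if it were inside a tag, so it is skipped.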

  • 2021-02-05 09:04

    Why use regex?

    For xhtml, load it into XDocument / XmlDocument; for (non-x)html the Html Agility Pack would seem a more sensible choice...

    Either way, that will parse the html into a DOM so you can iterate over the nodes and inspect them.
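
    To illustrate the parser-based idea, here is a rough sketch that swaps in Python's standard-library `html.parser` in place of XDocument or the Html Agility Pack. It is event-based and builds no DOM, but the principle is the same: only text nodes are touched. The class and function names are hypothetical, and the re-serialization is simplified (for instance, end tags come back lowercased).

```python
from html.parser import HTMLParser

class TextReplacer(HTMLParser):
    """Re-emits the markup unchanged and replaces `old` only in text nodes."""

    def __init__(self, old: str, new: str):
        # Keep entities such as &amp; as separate events so they round-trip.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw tag text, untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data.replace(self.old, self.new))  # text nodes only

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def replace_in_text(html: str, old: str, new: str) -> str:
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(replace_in_text('<p class="mystring">mystring &amp; <b>mystring</b><br/></p>',
                      "mystring", "X"))
# -> <p class="mystring">X &amp; <b>X</b><br/></p>
```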

  • 2021-02-05 09:06

    Setting aside that there are indeed other ways, and that I'm no real regex expert, one thing that popped into my head was:

    • find all the mystrings that ARE in tags first - because I can't write the expression to do the opposite :)
    • change those to something else
    • then replace all the other mystring (that are left not in tags) as you need
    • restore the original mystrings that were in tags

    So, using <[^>]*?(mystring)[^>]*> you can find the tagged ones. Replace those with otherstring. Do your normal replace on the mystrings that are left. Then replace otherstring back to mystring.

    Crude but effective....maybe.
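
    The mask, replace, restore steps might look roughly like this in Python. This is a sketch under assumptions: the placeholder value and the function name are invented for illustration, and a callback over each whole tag is used rather than the single capture group above, so that several occurrences inside one tag are all masked.

```python
import re

# Hypothetical stand-in for "otherstring"; it must not occur in the input.
PLACEHOLDER = "\x00MASKED\x00"

def replace_via_placeholder(html: str, old: str, new: str) -> str:
    # 1. Mask every occurrence of `old` that sits inside a tag.
    masked = re.sub(r"<[^>]*>",
                    lambda m: m.group(0).replace(old, PLACEHOLDER),
                    html)
    # 2. Do the normal replace on the occurrences that are left.
    replaced = masked.replace(old, new)
    # 3. Restore the original string inside the tags.
    return replaced.replace(PLACEHOLDER, old)

print(replace_via_placeholder(
    '<a href="mystring.html" title="mystring">mystring</a>', "mystring", "X"))
# -> <a href="mystring.html" title="mystring">X</a>
```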

  • 2021-02-05 09:07

    When your regex processor doesn't support variable length look behind, try this:

    (<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)
    

    Preserve capture groups 1 and 3 and replace capture group 2:

    For example, in Eclipse, find:

    (<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)
    

    and replace with:

    $1_newString_$3
    

    (Other regex processors might use a different capture group syntax, such as \1)
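
    For comparison, a small Python sketch of the same find and replace. It assumes the literal terms are `mystring` and `newString` (underscores dropped), and the function name is hypothetical. Python writes group references in the replacement as `\g<1>`, which stays unambiguous when followed by other characters.

```python
import re

# Groups 1 and 3 capture the surrounding tags and text; group 2 is the target.
PATTERN = re.compile(r"(<.+?>[^<>]*?)(mystring)([^<>]*?<.+?>)")

def replace_between_tags(html: str) -> str:
    # Keep groups 1 and 3, substitute group 2.
    return PATTERN.sub(r"\g<1>newString\g<3>", html)

print(replace_between_tags('<p class="mystring">say mystring now</p>'))
# -> <p class="mystring">say newString now</p>

print(replace_between_tags('<p>mystring and mystring</p>'))
# -> <p>newString and mystring</p>
```

    As the second call shows, a single pass replaces only the first occurrence in a text node, because the rest of that node is consumed by group 3. Text that is not enclosed by tags on both sides is not matched at all.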

  • 2021-02-05 09:09

    This should do it:

    (?<!<[^>]*)_mystring_
    

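    It uses a negative lookbehind to check that the matched string does not have a < before it without a corresponding >. The lookbehind is variable-length, so it only works in engines that support that (.NET does; many others do not).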
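
    Python's built-in `re` module rejects this pattern, because it requires fixed-width lookbehind. The same check can be emulated by hand, as in the sketch below: an occurrence counts as inside a tag when the nearest preceding `<` comes after the nearest preceding `>`. The function name is made up for illustration, and this is a substitute for the regex above, not the regex itself.

```python
import re

def replace_unless_after_open_tag(html: str, old: str, new: str) -> str:
    def repl(match: re.Match) -> str:
        start = match.start()
        # Equivalent of (?<!<[^>]*): is there a "<" before the match
        # with no ">" between it and the match?
        last_open = html.rfind("<", 0, start)
        last_close = html.rfind(">", 0, start)
        inside_tag = last_open > last_close
        return match.group(0) if inside_tag else new

    return re.sub(re.escape(old), repl, html)

print(replace_unless_after_open_tag(
    '<a href="mystring">mystring</a> mystring', "mystring", "X"))
# -> <a href="mystring">X</a> X
```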

  • 2021-02-05 09:18

    A quick and dirty alternative is to use a regex replace function with callback to encode the content of tags (everything between < and >), for example using base64, then run your search, then run another callback to decode your tag contents.

    This can also save a lot of head scratching when you need to exclude specific tags from a regex search - first obfuscate them and wrap them in a marker that won't match your search, then run your search, then deobfuscate whatever is in markers.
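
    A rough Python sketch of the encode, search, decode round trip. The function names are hypothetical. It keeps the angle brackets as the markers and base64-encodes only what is between them. It also assumes the search term does not happen to appear in the base64 text, which is worth checking for short terms.

```python
import base64
import re

TAG = re.compile(r"<([^>]*)>")

def encode_tags(html: str) -> str:
    # Replace the content of every tag with its base64 form.
    return TAG.sub(
        lambda m: "<" + base64.b64encode(m.group(1).encode("utf-8")).decode("ascii") + ">",
        html)

def decode_tags(html: str) -> str:
    # Reverse encode_tags: base64 never contains "<" or ">", so tags still match.
    return TAG.sub(
        lambda m: "<" + base64.b64decode(m.group(1)).decode("utf-8") + ">",
        html)

def replace_with_encoded_tags(html: str, old: str, new: str) -> str:
    # Assumes `new` contains no "<" or ">", which would confuse decode_tags.
    return decode_tags(encode_tags(html).replace(old, new))

print(replace_with_encoded_tags('<a href="mystring.html">mystring</a>', "mystring", "X"))
# -> <a href="mystring.html">X</a>
```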
