Regex - nested patterns - within outer pattern but exclude inner pattern

野的像风 2020-12-11 12:41

I have a file with the content below.

 <td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>

I want to match 'Replac…

5 Answers
  • 2020-12-11 12:56

    Something like <td>.*(?<!\$\{).*ReplaceMe(?!.*\}).*</td> should work, if your grep supports negative lookbehind (I don't remember whether it does).
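    The same exclusion can also be sketched with a lookahead alone. The Python sketch below assumes that ${ ... } placeholders are never nested and that braces only ever appear in placeholders: under that assumption a ReplaceMe sits inside a placeholder exactly when the next brace after it is a closing }. The names OUTSIDE_PLACEHOLDER and replace_outside are illustrative. The pattern uses only Perl-compatible syntax, so it should also be accepted by perl, or by GNU grep in its -P mode (grep's basic and extended syntaxes have no lookarounds).

```python
import re

# Assumption: ${ ... } placeholders are never nested and braces occur only in them.
# Then ReplaceMe is inside a placeholder exactly when the next brace that follows
# it is a closing "}", which the negative lookahead rules out.
OUTSIDE_PLACEHOLDER = re.compile(r"ReplaceMe(?![^{}]*\})")


def replace_outside(text: str, replacement: str = "REPLACED") -> str:
    """Replace every ReplaceMe that is not inside a ${ ... } placeholder."""
    return OUTSIDE_PLACEHOLDER.sub(replacement, text)


if __name__ == "__main__":
    print(replace_outside("<td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>"))
```

    Because it does not have to match the whole <td> ... </td> cell, this replaces every qualifying occurrence on a line; nested placeholders are outside its assumptions.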

  • 2020-12-11 12:56
    sed -i 's/<td>\sReplaceMe\s<\/td>/<td>Replaced<\/td>/gi' input.file
    

    worked for me.

    You may want to use -i.bak to back up the original file in case of a mistake.

    Alternatively,

    perl -pi -e 's/<td>\sReplaceMe\s<\/td>/<td>Replaced<\/td>/g' temp

    also works; again, use -pi.bak to keep a backup.
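    For comparison, here is a minimal Python sketch of the same in-place edit with a backup copy. It keeps the one-liners' pattern, so it only rewrites cells whose entire content is one whitespace character, ReplaceMe, one whitespace character, and it processes the file line by line as sed and perl -p do. The names replace_cells_in_file and demo, the file name input.file and the .bak default are illustrative assumptions.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Same pattern as the sed/perl one-liners: <td>, one whitespace character,
# ReplaceMe, one whitespace character, </td>. IGNORECASE mirrors sed's "i" flag.
CELL = re.compile(r"<td>\sReplaceMe\s</td>", re.IGNORECASE)


def replace_cells_in_file(path, backup_suffix=".bak"):
    """Edit `path` in place after saving a copy, roughly like sed -i.bak."""
    path = Path(path)
    shutil.copyfile(path, path.with_name(path.name + backup_suffix))
    lines = path.read_text().splitlines(keepends=True)
    # Process line by line, like sed and perl -p, so a match never spans a line break.
    path.write_text("".join(CELL.sub("<td>Replaced</td>", line) for line in lines))


def demo():
    """Run the edit on a throwaway file; return (edited text, backup text)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "input.file"
        target.write_text("<td> ReplaceMe </td>\n<td> ${ dontReplaceMe } ReplaceMe </td>\n")
        replace_cells_in_file(target)
        backup = target.with_name(target.name + ".bak")
        return target.read_text(), backup.read_text()


if __name__ == "__main__":
    edited, original = demo()
    print(edited, end="")
```

    The second sample line is left alone because its cell contains more than ReplaceMe, which is exactly the limitation of matching the whole cell.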

  • 2020-12-11 13:13

    Well, for such a simple case, you just need to check that the line does not match ${.*}:

    $ sed '/\${.*}/!s/ReplaceMe/REPLACED/' input
    <td> REPLACED </td>
    <td> ${ don't ReplaceMe } </td>
    

    The ! after the /\${.*}/ sed address negates it, so the substitution only runs on lines that do not match.
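    The same line-level logic, sketched in Python for illustration (the function name replace_unless_placeholder is hypothetical): a line containing ${ followed later by } is left alone, and on every other line only the first ReplaceMe is replaced, as with s/// without the g flag.

```python
import re

# Same test as the sed address /\${.*}/: "${" followed somewhere later by "}".
HAS_PLACEHOLDER = re.compile(r"\$\{.*\}")


def replace_unless_placeholder(text: str) -> str:
    out = []
    for line in text.splitlines(keepends=True):
        if not HAS_PLACEHOLDER.search(line):  # the "!" negation
            # s/ReplaceMe/REPLACED/ without "g": first occurrence only.
            line = line.replace("ReplaceMe", "REPLACED", 1)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = "<td> ReplaceMe </td>\n<td> ${ don't ReplaceMe } </td>\n"
    print(replace_unless_placeholder(sample), end="")
```

    Because the whole line is skipped, the question's own sample line, with ReplaceMe between two placeholders, is left untouched; that is why this only fits the simple case.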

    OTOH, if the real case is not that simple, I suspect the problem will grow a lot and regex will not be the best solution.

  • 2020-12-11 13:13

    Usually it is a bad idea to use regex when structured markup is involved. In some special cases it might be OK, but there are better tools for parsing HTML; once it is parsed, you can apply regex to the text nodes.
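    A minimal sketch of that approach with Python's standard-library html.parser: collect the text of each <td>, then run a regex on that text only. The class and function names are illustrative, and the regex uses the "match the placeholder too, then keep it unchanged" trick, assuming placeholders are not nested. The sketch only extracts and rewrites cell text; serializing the modified document back to HTML is left out.

```python
import re
from html.parser import HTMLParser

# Match either a whole ${ ... } placeholder (kept as-is) or the target word
# (replaced). Assumes placeholders are not nested.
TOKEN = re.compile(r"\$\{[^}]*\}|ReplaceMe")


def replace_outside_placeholders(text: str) -> str:
    return TOKEN.sub(
        lambda m: m.group(0) if m.group(0).startswith("${") else "REPLACED", text
    )


class CellTextCollector(HTMLParser):
    """Collects the text content of every <td> element (no nested tables)."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts, in document order
        self._buffer = None  # text chunks of the open <td>, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._buffer is not None:
            self.cells.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def rewritten_cells(html: str) -> list:
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    return [replace_outside_placeholders(cell) for cell in parser.cells]


if __name__ == "__main__":
    page = "<table><tr><td> ${ dontReplaceMe } ReplaceMe </td><td> ReplaceMe </td></tr></table>"
    print(rewritten_cells(page))
```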

  • 2020-12-11 13:17

    This is not possible.

    Regular expressions, in the formal sense, describe Chomsky Type-3 (regular) languages.
    Your sample, however, belongs to a Chomsky Type-2 (context-free) language.

    Pretty much as soon as any kind of nesting (brackets) is involved, you're dealing with context-free languages, which regular expressions cannot describe.

    There is basically no way to express "within a (possibly nested) pair of x and y" in a regular expression, because that would require some kind of stack, which a regular expression does not have (it is functionally equivalent to a finite-state automaton).
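    To make the stack argument concrete, here is a hand-written scanner in Python that keeps a depth counter (a stack reduced to a counter, which is enough for a single kind of bracket) and replaces ReplaceMe only at depth 0. It is a sketch under two assumptions: only ${ opens a block, and every } closes the innermost open block. The function name replace_at_top_level is hypothetical.

```python
def replace_at_top_level(text: str, target: str = "ReplaceMe",
                         replacement: str = "REPLACED") -> str:
    """Replace `target` only where no ${ ... } block is open.

    Assumes only "${" opens a block and each "}" closes the innermost one.
    """
    out = []
    depth = 0  # number of ${ ... } blocks enclosing the current position
    i = 0
    while i < len(text):
        if text.startswith("${", i):
            depth += 1          # push
            out.append("${")
            i += 2
        elif text[i] == "}" and depth > 0:
            depth -= 1          # pop
            out.append("}")
            i += 1
        elif depth == 0 and text.startswith(target, i):
            out.append(replacement)
            i += len(target)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for line in [
        "<td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>",
        "<td>${ ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} }</td>",
        "<td> ReplaceMe ${dontReplaceMeEither} ReplaceMe </td>",
    ]:
        print(replace_at_top_level(line))
```

    A counter suffices here only because there is one bracket type; with several kinds of brackets, an explicit stack of the open kinds would be needed.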


    Challenged by brandizzi to find a regex that handles at least the trivial cases, I came up with this (painfully hacky) pattern:

    perl -pe 's/(?<=<td>)((?:(?:\{.*?\})*[^{]*?)*)(ReplaceMe)(.*)(?=<\/td>)/$1REPLACED$3/g'
    

    It does proper (sic!) matching for these cases:

    <td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>
    <td> ReplaceMe ${dontReplaceMeEither} </td>
    <td> ${ dontReplaceMe } ReplaceMe </td>
    <td> ReplaceMe </td>
    

    And fails with this one (nesting is Chomsky Type-2, remember? ;) ):

    <td>${ ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} }</td>
    

    And it can't replace multiple matches either:

    <td> ReplaceMe ReplaceMe </td>
    <td> ReplaceMe ${dontReplaceMeEither} ReplaceMe </td>
    

    Getting the leading $ covered was the tricky part.
    That, and keeping Reginald/Reggy from crashing constantly while writing this beast.

    AGAIN: EXPERIMENTAL, DO NOT EVER USE THIS IN PRODUCTION CODE!

    (…or I'll hunt you down, should I ever have to work with your code/app ;)
