How to find and replace a particular character but only if it is in quotes?

刺人心 2021-02-19 01:23

Problem: I have thousands of documents that contain a specific character I don't want, e.g. the character a. These documents contain a variety of characters, but …

  •  自闭症患者
    2021-02-19 02:26

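    If you can use Visual Studio (rather than Visual Studio Code), you can do this with a variable-length lookbehind. Visual Studio is written in C++ and C#, and its Find and Replace uses .NET Framework regular expressions, which support lookbehinds of variable length. The following pattern matches an a that has a " before it and a " after it on the same line: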

    (?<="[^"\n]*)a(?=[^"\n]*")
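    Python's built-in `re` module only accepts fixed-width lookbehinds, so the pattern above cannot be compiled there as-is. As a rough sketch for experimenting outside Visual Studio (assuming each line is processed on its own), the two lookarounds can be emulated by testing the text before and after each candidate character. The function name `replace_like_first_pattern` and the replacement character `b` are hypothetical.

```python
import re

# The first .NET pattern from the answer. Python's re only allows fixed-width
# lookbehinds, so compiling it raises re.error.
FIRST_PATTERN = r'(?<="[^"\n]*)a(?=[^"\n]*")'

try:
    re.compile(FIRST_PATTERN)
except re.error as exc:
    print("Python's re rejects it:", exc)


def replace_like_first_pattern(line: str, target: str = "a", replacement: str = "b") -> str:
    """Replace `target` wherever both lookarounds of FIRST_PATTERN would succeed."""
    out = []
    for i, ch in enumerate(line):
        before, after = line[:i], line[i + 1:]
        if (
            ch == target
            # (?<="[^"\n]*): a " earlier on the line with no " or newline after it
            and re.search(r'"[^"\n]*\Z', before)
            # (?=[^"\n]*"): a " later on the line
            and re.match(r'[^"\n]*"', after)
        ):
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


print(replace_like_first_pattern('"a" a "a"'))  # "b" b "b"
```

    All three a characters in "a" a "a" are replaced, including the one between the two quoted strings. That middle match is the problem the next refinement addresses.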
    

    Adding more logic to the regex above, we can tell it to ignore any position preceded by an even number of " characters, since an even count means every quote before that position has already been closed. This prevents matches for a outside of quotes. Take, for example, the string "a" a "a". The first pattern matches all three a characters, but with this extra check only the first and last a are matched and the one in the middle is ignored.

    (?
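    The even/odd rule can be sketched without any lookbehind by counting " characters while scanning a line. This illustrates the rule only and is not a translation of the .NET pattern. It assumes each line is processed on its own, and the function name is hypothetical.

```python
def replace_by_quote_parity(line: str, target: str = "a", replacement: str = "b") -> str:
    """Replace `target` only where an odd number of " precede it on the line
    (so it sits inside a quoted span) and another " follows it.
    Backslash-escaped quotes are NOT recognised: every " counts."""
    out = []
    quotes_seen = 0
    for i, ch in enumerate(line):
        if ch == '"':
            quotes_seen += 1
        elif ch == target and quotes_seen % 2 == 1 and '"' in line[i + 1:]:
            ch = replacement
        out.append(ch)
    return "".join(out)


print(replace_by_quote_parity('"a" a "a"'))     # "b" a "b"
print(replace_by_quote_parity(r'"a\"" a "a"'))  # "b\"" b "a"
```

    On "a\"" a "a" the escaped quote is counted as a real one. As a result, the middle a is replaced and the last one is missed, which is exactly the escaped-quote problem described next.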

    The only remaining problem is that this breaks when a quoted string contains an escaped ", such as "a\"" a "a". We need more logic to prevent this. Luckily, there is an excellent existing answer on properly matching escaped " characters. Adding that logic to the regex above gives the following:

    (?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+)(?<="[^"\n]*)a(?=[^"\n]*")
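    A different technique avoids lookbehinds entirely, so it also works with Python's `re`. It matches each complete quoted string using the escape-aware body `(?:[^"\\\n]|\\.)*` (explained in the breakdown below) and replaces the character only inside the matched text. This is a minimal sketch with a hypothetical function name. It can differ from the lookaround regex in edge cases such as unbalanced quotes.

```python
import re

# A double-quoted string on one line; \\. lets backslash-escaped characters (such as \") through.
QUOTED = re.compile(r'"(?:[^"\\\n]|\\.)*"')


def replace_in_quoted_strings(text: str, target: str = "a", replacement: str = "b") -> str:
    """Replace `target` only inside "..." spans, honouring backslash escapes."""
    return QUOTED.sub(lambda m: m.group(0).replace(target, replacement), text)


print(replace_in_quoted_strings(r'"a\"" a "a"'))  # "b\"" a "b"
```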

    I'm not sure which method works best with your strings, but I'll explain this last regex in detail, since doing so also covers the two previous ones.

    • (?<!^[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+) Negative lookbehind ensuring that what precedes does not match the following
      • ^ Assert position at the start of the line
      • [^"\n]* Match anything except " or \n any number of times
      • (?:(?:"(?:[^"\\\n]|\\.)*){2})+ Match the following one or more times. Each repetition consumes one pair of double quotes (an opening and a closing one), so this only matches when the " preceding the match are balanced, i.e. there is an even number of them.
        • (?:"(?:[^"\\\n]|\\.)*){2} Match the following exactly twice
        • " Match this literally
        • (?:[^"\\\n]|\\.)* Match either of the following any number of times
          • [^"\\\n] Match anything except ", \ and \n
          • \\. Match \ followed by any character except a newline, so an escaped \" is consumed as a unit and not treated as a closing quote
    • (?<="[^"\n]*) Positive lookbehind ensuring what precedes matches the following
      • " Match this literally
      • [^"\n]* Match anything except " or \n any number of times
    • a Match this literally
    • (?=[^"\n]*") Positive lookahead ensuring what follows matches the following
      • [^"\n]* Match anything except " or \n any number of times
      • " Match this literally
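    To see how the three lookarounds combine, they can be emulated in Python (whose `re` lacks variable-width lookbehinds) by evaluating each one against the text before or after a candidate character. The negative lookbehind becomes a `fullmatch` of its body against everything from the start of the line up to the candidate, because `^` anchors it at the line start and the lookbehind must end at the candidate. This sketch assumes line-by-line processing, uses hypothetical names, and is a slow quadratic check meant for experimenting rather than bulk replacement.

```python
import re

# Body of the negative lookbehind without the leading ^; fullmatch supplies both anchors.
BALANCED_PREFIX = re.compile(r'[^"\n]*(?:(?:"(?:[^"\\\n]|\\.)*){2})+')
# Positive lookbehind (?<="[^"\n]*): an opening " with no " or newline after it.
OPEN_QUOTE_BEFORE = re.compile(r'"[^"\n]*\Z')
# Positive lookahead (?=[^"\n]*"): a closing " later on the same line.
CLOSE_QUOTE_AFTER = re.compile(r'[^"\n]*"')


def replace_like_final_pattern(line: str, target: str = "a", replacement: str = "b") -> str:
    """Replace `target` where all three lookarounds of the final pattern would succeed."""
    out = []
    for i, ch in enumerate(line):
        before, after = line[:i], line[i + 1:]
        if (
            ch == target
            and not BALANCED_PREFIX.fullmatch(before)
            and OPEN_QUOTE_BEFORE.search(before)
            and CLOSE_QUOTE_AFTER.match(after)
        ):
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


print(replace_like_final_pattern('"a" a "a"'))     # "b" a "b"
print(replace_like_final_pattern(r'"a\"" a "a"'))  # "b\"" a "b"
```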

    You can drop the \n from the pattern above, as in the version below. I included it in case there are special cases I'm not considering (e.g. comments) that could break this regex in your text. The \A anchor also forces the regex to match from the start of the string (or file) instead of the start of the line.

    (?
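    Since the original problem involves thousands of documents, a sketch of a bulk run may help. It swaps in the lookbehind-free technique of matching whole quoted strings. As in the variant described above, \n is dropped from the character classes so that a quoted string may span lines. The folder name, the `*.txt` glob and UTF-8 encoding are assumptions, and `replace_in_files` is a hypothetical helper; try it on a copy of the documents first.

```python
import re
from pathlib import Path

# Escape-aware quoted string; with no \n in the classes it may span several lines.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def replace_in_files(root: Path, pattern: str, target: str, replacement: str) -> int:
    """Rewrite every file under `root` whose name matches `pattern`
    (assumed UTF-8 text); return how many files were changed."""
    changed = 0
    for path in root.rglob(pattern):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated = QUOTED.sub(lambda m: m.group(0).replace(target, replacement), original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


# Example call ("docs" and "*.txt" are placeholders):
# replace_in_files(Path("docs"), "*.txt", "a", "b")
```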

