How to find and replace a particular character but only if it is in quotes?

后端 未结 6 2055
故里飘歌
故里飘歌 2021-02-19 01:29

Problem: I have thousands of documents which contains a specific character I don\'t want. E.g. the character a. These documents contain a variety of characters, but

6条回答
  •  攒了一身酷
    2021-02-19 02:04

    Firstly a few of considerations:

    1. There could be multiple a characters within a single quote.
    2. Each quote (using single or double quotation marks) consists of an opening quote character, some text and the same closing quote character. A simple approach is to assume that when the quote characters are counted sequentially, the odd ones are opening quotes and the even ones are closing quotes.
    3. Following point 2, it could be worth some further thought on whether single-quoted strings should be allowed. See the following example: It's a shame 'this quoted text' isn't quoted. Here, the simple approach would think there were two quoted strings: s a shame and isn. Another: This isn't a quote ...'this is' and 'it's unclear where this quote ends'. I've avoided attempting to tackle these complexities and gone with the simple approach below.

    The bad news is that point 1 presents a bit of a problem, as a capturing group with a wildcard repeat character after it (e.g. (.*)*) will only capture the last captured "thing". But the good news is there's a way of getting around this within certain limits. Many regex engines will allow up to 99 capturing groups (*). So if we can make the assumption that there will be no more than 99 as in each quote (UPDATE ...or even if we can't - see step 3), we can do the following...

    (*) Unfortunately my first port of call, Notepad++ doesn't - it only allows up to 9. Not sure about VS Code. But regex101 (used for the online demos below) does.

    TL;DR - What to do?

    1. Search for: "([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*([^a"]*)a*"
    2. Replace with: "\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31\32\33\34\35\36\37\38\39\40\41\42\43\44\45\46\47\48\49\50\51\52\53\54\55\56\57\58\59\60\61\62\63\64\65\66\67\68\69\70\71\72\73\74\75\76\77\78\79\80\81\82\83\84\85\86\87\88\89\90\91\92\93\94\95\96\97\98\99"
    3. (Optionally keep repeating steps the previous two steps if there's a possibility of > 99 such characters in a single quote until they've all been replaced).
    4. Repeat step 1 but replacing all " with ' in the regular expression, i.e: '([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*([^a']*)a*'
    5. Repeat steps 2-3.

    Online demos

    Please see the following regex101 demos, which could actually be used to perform the replacements if you're able to copy the whole text into the contents of "TEST STRING":

    • Demo for double quotes
    • Demo for single quotes.

提交回复
热议问题