How to find and replace a particular character but only if it is in quotes?

刺人心 2021-02-19 01:23

Problem: I have thousands of documents that contain a specific character I don't want, e.g. the character a. These documents contain a variety of characters, but

6 Answers
  •  一生所求
    2021-02-19 02:11

    I am using VSCode, but I'm open to any suggestions.

    If you want to stay in an editor environment, you could use
    Visual Studio (>= 2012) or even Notepad++ for a quick fix.
    This avoids having to set up a separate scripting environment.

    Both of these engines (.NET and Boost, respectively) support the \G construct,
    which starts the next match at the position where the last one left off.
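
    Python's built-in re module has no \G, so for anyone scripting this instead,
    the "continue where the last match left off" behaviour can be approximated by
    passing a start position to Pattern.match. A minimal sketch (the helper name
    consecutive_matches is made up for illustration):

```python
import re


def consecutive_matches(pattern, text):
    """Collect matches that each begin exactly where the previous one ended."""
    compiled = re.compile(pattern)
    pos = 0
    found = []
    while pos < len(text):
        # match() is anchored at pos, which plays the role of \G here
        m = compiled.match(text, pos)
        if m is None or m.end() == pos:  # stop on a gap or an empty match
            break
        found.append(m.group(0))
        pos = m.end()
    return found


# The chain breaks at the space, so "56" is never reached.
print(consecutive_matches(r"\d\d", "1234 56"))  # ['12', '34']
```

    The point is only that each match must start where the previous one stopped;
    once there is a gap, the chain ends.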

    This is just a suggestion.

    This regex doesn't check the validity of balanced quotes within the entire
    string ahead of time (but it could with the addition of a single line).
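
    The extra regex line is not shown here. If you do script this, a separate
    pre-check outside the regex is easy enough. A rough Python sketch, assuming
    plain double quotes with no escaping (quotes_are_balanced is a hypothetical
    helper name):

```python
def quotes_are_balanced(text: str) -> bool:
    """Return True when the text contains an even number of double quotes."""
    return text.count('"') % 2 == 0


print(quotes_are_balanced('say "hat" and "bat"'))  # True
print(quotes_are_balanced('say "hat and bat'))     # False
```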

    It is all about knowing where the inside and outside of quotes are.
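
    To make the inside/outside idea concrete without any regex, here is a small
    Python sketch that flips a flag at every double quote and only replaces
    while the flag is set. It assumes no escaped quotes, and unlike a balanced
    check it treats everything after an unclosed quote as "inside". The function
    name and its defaults are mine, not part of the regex:

```python
def replace_inside_quotes(text: str, old: str = "a", new: str = "b") -> str:
    """Replace `old` with `new` only between pairs of double quotes."""
    out = []
    inside = False
    for ch in text:
        if ch == '"':
            inside = not inside  # every quote toggles inside/outside
            out.append(ch)
        elif inside and ch == old:
            out.append(new)
        else:
            out.append(ch)
    return "".join(out)


print(replace_inside_quotes('a cat "a bat" sat "tall"'))
# a cat "b bbt" sat "tbll"
```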

    I've commented the regex below; if you need more info, let me know.
    Again, this is just a suggestion (I know your editor uses ECMAScript,
    which has no \G).

    Find (?s)(?:^([^"]*(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)|(?!^)\G)a([^"a]*(?:(?=a.*?")|(?:"[^"]*$|"[^"]*(?=")(?:"[^"a]*(?=")"[^"]*(?="))*"[^"a]*)))
    Replace $1b$2

    That's all there is to it.

    https://regex101.com/r/loLFYH/1
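
    If your engine lacks \G, the Find/Replace pair above can't be pasted in
    as-is. A different technique that gives the same result on balanced input is
    to match each whole quoted span and fix it up in a callback. This is a
    swapped-in approach, not the regex above; a Python sketch with made-up names,
    assuming no escaped quotes:

```python
import re

# One complete "..." span; [^"] also matches newlines, so spans may cross lines.
QUOTED = re.compile(r'"[^"]*"')


def replace_in_quoted_spans(text: str) -> str:
    """Replace every 'a' with 'b' inside complete double-quoted spans."""
    return QUOTED.sub(lambda m: m.group(0).replace("a", "b"), text)


print(replace_in_quoted_spans('a cat "a bat" sat "tall"'))
# a cat "b bbt" sat "tbll"
```

    Text after a quote that is never closed is left untouched by this version.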

    Commented regex

    (?s)                          # Dot-all inline modifier
     (?:
          ^                             # BOS 
          (                             # (1 start), Find first quote from BOS (written back)
               [^"]* 
               (?:                           # --- Cluster
                    " [^"a]*                      # Inside quotes with no 'a'
                    (?= " )
                    " [^"]*                       # Between quotes, get up to next quote
                    (?= " )
               )*                            # --- End cluster, 0 to many times
    
               " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                             # to be sucked up by this match           
          )                             # (1 end)
    
       |                              # OR,
    
          (?! ^ )                       # Not-BOS 
          \G                            # Continue where left off from last match.
                                        # Must be an 'a' at this point
     )
     a                             # The 'a' to be replaced
    
     (                             # (2 start), Up to the next 'a' (to be written back)
          [^"a]* 
          (?:                           # --------------------
               (?= a .*? " )                 # If stopped before 'a', must be a quote ahead
            |                              # or,
               (?:                           # --------------------
                    " [^"]* $                     # If stopped at a quote, check for EOS
                 |                              # or, 
                    " [^"]*                       # Between quotes, get up to next quote
                    (?= " )
    
                    (?:                           # --- Cluster
                         " [^"a]*                      # Inside quotes with no 'a'
                         (?= " )
                         " [^"]*                       # Between quotes 
                         (?= " )
                    )*                            # --- End cluster, 0 to many times
    
                    " [^"a]*                      # Inside quotes, will be an 'a' ahead of here
                                                  # to be sucked up on the next match                    
               )                             # --------------------
          )                             # --------------------
     )                             # (2 end)
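
    The commented layout works directly in engines with a free-spacing mode.
    Python's re has re.VERBOSE for that, though not \G, so this sketch uses only
    the first branch (group 1 followed by the 'a') to locate the first 'a' that
    sits inside quotes. The names FIRST_QUOTED_A and first_quoted_a_index are
    made up for illustration:

```python
import re

FIRST_QUOTED_A = re.compile(r'''
    ^                       # BOS
    (                       # (1 start)
        [^"]*
        (?:                     # --- Cluster
            " [^"a]*                # Inside quotes with no 'a'
            (?= " )
            " [^"]*                 # Between quotes, get up to next quote
            (?= " )
        )*                      # --- End cluster, 0 to many times
        " [^"a]*                # Inside quotes, an 'a' follows
    )                       # (1 end)
    a                       # The 'a' to be replaced
''', re.VERBOSE)


def first_quoted_a_index(text: str) -> int:
    """Index of the first 'a' inside double quotes, or -1 if there is none."""
    m = FIRST_QUOTED_A.match(text)
    return m.end(1) if m else -1


print(first_quoted_a_index('cat "dog" and "bat"'))  # 16
print(first_quoted_a_index('cat "dog"'))            # -1
```

    Note that the 'a' in cat is skipped because it is outside the quotes.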
    
