Escape all double quotes inside a single quoted string with Regex [duplicate]

Submitted by 不羁岁月 on 2019-12-10 09:46:20

Question


Possible Duplicate:
Regular Expression to escape double quotes inside single quotes

I need a regex only, not code in some other language (ideally Perl or PCRE regex syntax), that replaces every double quote " with \" when it appears inside a single-quoted string. This is an example string (part of a file):

var baseUrl = $("#baseurl").html();
var head = '<div id="finishingDiv" style="background-image:url({baseUrl}css/userAd/images/out_main.jpg); background-repeat: repeat-y; ">'+
'<div id="buttonbar" style="width:810px; text-align:right">';

(Be aware: the double quotes don't have to be paired like "someValueBetween", so a single-quoted string may contain an odd number of double quotes.)

This should be the end result for the last line above:

'<div id=\"buttonbar\" style=\"width:810px; text-align:right\">';

Thanks in advance

Update: To make it clear, I want a regular expression only, not a Perl program. The regular expression can use Perl regex syntax or PHP's PCRE syntax (which, from what I understand, is very close to Perl's). The goal is to run the regex in the search-and-replace dialogs of IDEs that support regexes (Eclipse and PhpED, for example)!!

In other words, I want a regex to put in the IDE's search field that matches exactly the unescaped " characters inside single-quoted strings. In Eclipse's replace field I can then just put \$1 to escape them.

The regex should work in RegexBuddy or Regex Coach, please, so I can test it.

At least that is the plan :)



Answer 1:


You asked for Perl (or PCRE) and nothing else.

Ok.

If you just want to escape unescaped double quotes no matter where you find them, do this:

  s{
      (?<! (?<! \\ ) \\{1} )
      (?<! (?<! \\ ) \\{3} )
      (?<! (?<! \\ ) \\{5} )
      (?<! (?<! \\ ) \\{7} )
      (?= " )
  }{\\}xg;
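
To see what the four lookbehinds are checking: a double quote counts as already escaped when an odd number of backslashes sits right in front of it, and the pattern spells that out for runs of 1, 3, 5 and 7. Below is a minimal sketch of the same parity test outside Perl, in Python. The function name is made up for illustration, and unlike the lookbehind version it counts the whole backslash run instead of stopping at seven.

```python
import re

# A (possibly empty) run of backslashes immediately followed by a double quote.
_QUOTE_RUN = re.compile(r'(\\*)"')


def escape_unescaped_double_quotes(text: str) -> str:
    """Put a backslash before every double quote that is not already escaped.

    Hypothetical helper for illustration only.
    """
    def fix(match: "re.Match[str]") -> str:
        backslashes = match.group(1)
        if len(backslashes) % 2 == 1:
            # Odd run: the last backslash already escapes the quote.
            return match.group(0)
        # Even run (including none): the quote is bare, so escape it.
        return backslashes + '\\"'

    return _QUOTE_RUN.sub(fix, text)
```

Calling it on `say "hi"` escapes both quotes, while a quote that already has a single backslash in front of it is left alone.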

If you want to escape unescaped double quotes between unescaped single quotes, and you only have one pair of such single quotes, do this:

1 while s{

  (?(DEFINE)

    (?<unescaped>
      (?<! (?<! \\ ) \\{1} )
      (?<! (?<! \\ ) \\{3} )
      (?<! (?<! \\ ) \\{5} )
      (?<! (?<! \\ ) \\{7} )
    )

    (?<single_quote> (?&unescaped) ' )
    (?<double_quote> (?&unescaped) " )
    (?<unquoted>     [^'] *?          )

  )

  (?<HEAD>
    (?&single_quote)
    (?&unquoted)
  )

  (?<TAIL>
    (?&double_quote)
    (?&unquoted)
    (?&single_quote)

  )

}<$+{HEAD}\\$+{TAIL}>xg;

But if you may have multiple sets of paired unescaped single quotes per line, and you only want to escape the unescaped double quotes that fall between those unescaped single quotes, then do this:

sub escape_quote {
  my ($text) = @_;   # work on a lexical copy; "my $_" was removed in Perl 5.24
  $text =~ s{
      (?<! (?<! \\ ) \\{1} )
      (?<! (?<! \\ ) \\{3} )
      (?<! (?<! \\ ) \\{5} )
      (?<! (?<! \\ ) \\{7} )
      (?= " )
  }{\\}xg;

  return $text;
}

s{

  (?(DEFINE)

    (?<unescaped>
      (?<! (?<! \\ ) \\{1} )
      (?<! (?<! \\ ) \\{3} )
      (?<! (?<! \\ ) \\{5} )
      (?<! (?<! \\ ) \\{7} )
    )

    (?<single_quote> (?&unescaped) ' )
    (?<unquoted>     [^'] *?          )

  )

  (?<HEAD> (?&single_quote) )
  (?<TARGET> (?&unquoted) )
  (?<TAIL> (?&single_quote) )

}{
               $+{HEAD}    .
  escape_quote($+{TARGET}) .
               $+{TAIL}

}xeg;
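
The shape of that last version (find each single-quoted span, then hand its contents to a helper that escapes the bare double quotes) carries over to any language with callback replacements. Here is a rough Python sketch of the same idea, not a port of the Perl above. Both function names are hypothetical, and it assumes every apostrophe is a string delimiter, so it does not try to handle escaped single quotes.

```python
import re

# Assumption: every apostrophe opens or closes a string (no \' inside strings).
_SINGLE_QUOTED = re.compile(r"'[^']*'")
# A (possibly empty) run of backslashes immediately followed by a double quote.
_QUOTE_RUN = re.compile(r'(\\*)"')


def _escape_double_quotes(text: str) -> str:
    """Escape double quotes preceded by an even number of backslashes."""
    def fix(match: "re.Match[str]") -> str:
        backslashes = match.group(1)
        if len(backslashes) % 2 == 1:
            return match.group(0)  # already escaped
        return backslashes + '\\"'

    return _QUOTE_RUN.sub(fix, text)


def escape_inside_single_quotes(source: str) -> str:
    """Escape bare double quotes, but only inside '...' spans."""
    return _SINGLE_QUOTED.sub(
        lambda m: _escape_double_quotes(m.group(0)), source
    )


if __name__ == "__main__":
    sample = (
        'var baseUrl = $("#baseurl").html();\n'
        "'<div id=\"buttonbar\" style=\"width:810px; text-align:right\">';"
    )
    print(escape_inside_single_quotes(sample))
```

On the sample from the question, the double quotes in the first line are outside any single-quoted span and stay as they are, while those in the `'<div ...>'` line get a backslash.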

Note that this all presupposes you have no legitimate paired unescaped double quotes containing unescaped single quotes. Even something like this will throw you off:

my $cute = q(') . "stuff" . q(');

Probably, though, you want to use a proper parsing module.

Please pay no attention to all the garish and deceitfully incorrect SO coloring. For some reason, it doesn't seem to be able to parse Perl as well as perl does. Can't imagine why. ☺




Answer 2:


According to your edit, you want a generic regex to be used in the search-and-replace feature of an unspecified IDE or text editor. It's not that simple. I'm sure you're aware that different languages (Perl, Java, Python, etc.) tend to have their own regex flavors, with different feature sets and syntactic quirks. The situation among editors and IDEs is even worse.

UPDATE: Since I wrote this, Visual Studio has switched to using the .NET flavor, and Notepad++ has adopted the Boost library. The regex below will now work in all the editors/IDEs I mentioned except Visual Studio. (.NET doesn't support possessive quantifiers, but it does have atomic groups, which can be used to the same effect.)

JEdit and IntelliJ IDEA, being written in Java, use Java's regex flavor, which is pretty good. But Visual Studio does not use the excellent .NET flavor; instead it uses a legacy flavor with an eclectic feature set and bizarre syntax. TextMate, the Mac editor that Apple devs rave about, uses the feature-rich Oniguruma flavor, but Notepad++ (a free Windows editor which also gets a lot of good press) uses a flavor with an extremely limited feature set--it doesn't even support alternation!

So even relatively simple tasks can be difficult or impossible depending on the editor you're using, but what you're trying to do is pretty tricky. Here's the simplest regex I've come up with for it:

search: \G((?:(?:\A|')[^']*+')?+[^'"]*+)"([^'"]*+)

replace: $1\\"$2

(This assumes every apostrophe is used as a quote; that none of them need to be ignored because they're in comments, double-quoted strings, or whatever; that there are no escaped quotes (single or double) already in the text; and the list goes on.)

The \G (the end-of-previous-match anchor) is essential, but that's a feature that isn't supported even by some of the more popular regex flavors, like JavaScript and Python. Possessive quantifiers (*+, ?+) keep the regex from bogging down when no match is possible; they're available in PCRE, Oniguruma, Perl 5.10+, and Java. .NET doesn't have them, but it does have the somewhat clumsier alternative, atomic groups.
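
Python's `re` module is one of the flavors without `\G`, so the search/replace pair above can't be pasted into it as is. As a rough illustration of what the anchor buys you, here is a hand-written single pass that carries the "am I inside single quotes?" state in a variable instead of in the match position. This is a swapped-in technique, not a translation of the regex, and the function name is made up for the example. It leans on the same assumptions listed above: every apostrophe is a quote, and nothing in the text is escaped yet.

```python
def escape_in_single_quotes_scan(text: str) -> str:
    """Escape double quotes that fall between a pair of apostrophes.

    Hypothetical helper; assumes every apostrophe is a string delimiter
    and that the input contains no already-escaped quotes.
    """
    out = []
    inside = False  # True while between an opening and a closing apostrophe
    for ch in text:
        if ch == "'":
            inside = not inside  # each apostrophe flips the state
        elif ch == '"' and inside:
            out.append("\\")  # backslash goes in front of the quote
        out.append(ch)
    return "".join(out)
```

The point of the comparison: without some way to remember where the previous match ended, a regex has no notion of "inside" versus "outside", which is why the answer calls the anchor essential.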

I suggest you forget about the generic-regex approach and standardize on a tool set that has the capabilities you need. For general purposes, I don't think anything beats the JGSoft family of tools: EditPad Pro, PowerGrep, and RegexBuddy. In both features and performance, the JGSoft regex flavor is as good as anything out there; all it lacks are the recursive-matching and embedded-code features.

p.s. I see you mentioned Eclipse in a comment; I don't have it installed, but I expect it uses Java's regex flavor (or possibly the ICU flavor, whose syntax is virtually identical to Java's), so the regex above should work in it.




Answer 3:


As long as there's only one single-quoted string per line (as in your example), this should work (sed syntax):

s|'\([^'"]*\)"\([^']*\)'|'\1\\"\2'|g


Source: https://stackoverflow.com/questions/4073114/escape-all-double-quotes-inside-a-single-quoted-string-with-regex