Original string:
some text \"some \\\"string\\\"right here \"
Want to get:
\"some \\\"string\\\"right here\"
In order to match from quote to quote while ignoring any simple escaped quotes (\"):
(?:[^\\]|^)(\"(?:.*?[^\\]){0,1}\")
Meaning:
(?: - start of a grouping with no extraction (non-capturing group)
[^\\] - match one character that is not a backslash
| - or, instead of the previous character, match
^ - the beginning of the string
) - end of the non-capturing group
( - start of the extraction (capturing) group
\" - find a quote (one that follows a non-backslash character or the start of the string)
(?:.*?[^\\] - match the shortest substring ending with a non-backslash character
){0,1} - zero times or once, which actually means that substring or an empty one, followed by
\" - a quote mark
) - end of the capturing group
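As a rough check of the pattern described above, here is a minimal sketch that runs it against the sample text from the question. It assumes the groups are written with the non-capturing syntax (?:...) and that the input is the unescaped text; the variable names are illustrative only.

```javascript
// Sample text from the question, as a JS literal. The actual text is:
// some text "some \"string\"right here "
const sample = 'some text "some \\"string\\"right here "';

// Quote-to-quote pattern; group 1 holds the quoted substring.
const quoteToQuote = /(?:[^\\]|^)("(?:.*?[^\\])?")/;

const found = quoteToQuote.exec(sample);
const quoted = found ? found[1] : null;

console.log(quoted); // "some \"string\"right here "
```

The optional group is written as ? here, which is equivalent to {0,1}.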
Edit:
Wiktor Stribiżew correctly pointed out that my initial answer fails when the string contains other escape sequences, for example \\" (an escaped backslash followed by a quote), which should be matched just like a plain " in your case. To avoid this specific issue you can use
(?:[^\\]|^)((?:\\\\)*\"(?:.*?[^\\]){0,1}(?:\\\\)*\")
But for actual regex compatibility you will need to refer to Wiktor's answer.
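To illustrate the difference between the two patterns in this answer, here is a small sketch on a hypothetical input in which both quotes are preceded by an escaped backslash. The input string and variable names are assumptions made for the example, and the groups are written with the (?:...) syntax.

```javascript
// Actual text: a \\"b\\" c
// Each \\ is an escaped backslash, so both quotes are real delimiters.
const tricky = 'a \\\\"b\\\\" c';

const simplePattern = /(?:[^\\]|^)("(?:.*?[^\\])?")/;
const editedPattern = /(?:[^\\]|^)((?:\\\\)*"(?:.*?[^\\])?(?:\\\\)*")/;

const simpleResult = simplePattern.exec(tricky);
const editedResult = editedPattern.exec(tricky);

console.log(simpleResult);    // null: every quote follows a backslash
console.log(editedResult[1]); // \\"b\\"
```

Note that group 1 of the edited pattern also includes the pairs of backslashes in front of each quote, so they may need to be trimmed afterwards.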
With a JavaScript regex, it is impossible to start matching at the correct double quote. You will either match an escaped one, or you will fail to match the correct double quote after a literal \ before a quote. Thus, the safest way is to use a parser. Here is a sample one:
var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
console.log("Incorrect (with regex): ", s.match(/"([^"\\]*(?:\\.[^"\\]*)*)"/g));
var res = [];
var tmp = "";
var in_quotes = false;
var in_entity = false;
for (var i = 0; i < s.length; i++) {
    if (s[i] === '\\' && in_entity === false) { // a backslash starts an escape sequence
        in_entity = true;
        if (in_quotes === true) {
            tmp += s[i];
        }
    } else if (in_entity === true) { // the escaped char: never treated as a delimiter
        in_entity = false;
        if (in_quotes === true) {
            tmp += s[i];
        }
    } else if (s[i] === '"' && in_quotes === false) { // start a new match
        in_quotes = true;
        tmp += s[i];
    } else if (s[i] === '"' && in_quotes === true) { // append char to match and add to results
        tmp += s[i];
        res.push(tmp);
        tmp = "";
        in_quotes = false;
    } else if (in_quotes === true) { // append a char to the match
        tmp += s[i];
    }
}
console.log("Correct results: ", res);
It is not possible to match the string you need with a lazy dot matching pattern since it will stop before the first ". If you know your string will never have an escaped quote before a quoted substring, and if you are sure there are no literal \ before double quotes (and these conditions are very strict to use the regex safely), you can use
/"([^"\\]*(?:\\.[^"\\]*)*)"/g
See the regex demo
" - match a quote
([^"\\]*(?:\\.[^"\\]*)*) - capturing group 1, which consists of:
    [^"\\]* - 0+ characters other than \ and "
    (?:\\.[^"\\]*)* - zero or more sequences of:
        \\. - any escaped symbol
        [^"\\]* - 0+ characters other than \ and "
" - trailing quote
JS demo:
var re = /"([^"\\]*(?:\\.[^"\\]*)*)"/g;
var str = `some text "some \\"string\\"right here " some text "another \\"string\\"right here "`;
var res = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
    res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>"; // Just for demo
console.log(res); // or another result demo
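To make the point about lazy dot matching concrete, here is a short comparison sketch. The input string and the variable names are assumptions chosen for illustration; it only contrasts /".*?"/g with the unrolled pattern from this answer.

```javascript
// Actual text: some text "some \"string\"right here " tail
const demoText = 'some text "some \\"string\\"right here " tail';

// Lazy dot: ends each match at the very next quote, escaped or not.
const lazyMatches = demoText.match(/".*?"/g);

// Unrolled pattern: consumes backslash + any char as one unit.
const unrolledMatches = demoText.match(/"[^"\\]*(?:\\.[^"\\]*)*"/g);

console.log(lazyMatches);     // two fragments, split at the escaped quotes
console.log(unrolledMatches); // the whole quoted substring as one match
```

The lazy version splits the quoted substring into two wrong pieces, while the unrolled version returns it whole, provided the strict conditions mentioned above hold.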
You can use this regex:
/[^\\](\".*?[^\\]\")/g
[^\\] matches any character other than \, so \" will not be taken as the start or end of your match.
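A quick sketch of how this pattern behaves, under the assumption that it is applied to the unescaped text. The helper name firstQuoted and the second input are hypothetical. Since [^\\] has to consume one character before the opening quote, a quoted substring at the very start of the string is not found.

```javascript
// Returns the first quoted substring found by the pattern, or null.
function firstQuoted(text) {
  const re = /[^\\](".*?[^\\]")/; // same pattern, without the g flag
  const m = re.exec(text);
  return m ? m[1] : null;
}

// Quote in the middle of the text: matched as expected.
const midString = firstQuoted('some text "some \\"string\\"right here "');

// Quote at position 0: there is no preceding character for [^\\] to match.
const atStart = firstQuoted('"abc" rest');

console.log(midString); // "some \"string\"right here "
console.log(atStart);   // null
```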
Complementing @WiktorStribiżew's answer, there is a technique to start matching at the correct double quote using regex. It consists of matching both quoted and unquoted text in the form:
/"(quoted)"|unquoted/g
As you can see, the quoted text is matched by a group, so we'll only consider the text captured in match[1].
/"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g
var regex = /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g;
var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
var match;
var res = [];
while ((match = regex.exec(s)) !== null) {
    // Avoid an infinite loop on zero-length matches
    if (match.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    if (match[1] != null) {
        res.push(match[1]); // Append to result only group 1
    }
}
console.log("Correct results (regex technique): ", res);
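The same quoted-or-unquoted technique can also be sketched with String.prototype.matchAll (ES2020), which advances past zero-length matches on its own, so no manual lastIndex handling is needed. This is only an alternative sketch, assuming an environment that supports matchAll; the variable names are illustrative.

```javascript
const quotedOrNot = /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g;

// Actual text: some text \"extras" some \"string \" right" here "
const mixedText = 'some text \\"extras" some \\"string \\" right" here "';

const quotedParts = [];
for (const hit of mixedText.matchAll(quotedOrNot)) {
  // Group 1 is only set when the quoted alternative matched.
  if (hit[1] !== undefined) {
    quotedParts.push(hit[1]);
  }
}

console.log(quotedParts); // one entry: the text between the two real quotes
```

The trailing unpaired quote at the end of the text is skipped, because the first alternative needs a closing quote and the second one matches an empty string there.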