JavaScript regex to select a quoted string but not escaped quotes

南旧 2021-01-19 05:06

Original string:

some text \"some \\\"string\\\"right here \"

Want to get:

\"some \\\"string\\\"right here\"
4 answers
  • 2021-01-19 05:29

    To match from quote to quote while ignoring simple escaped quotes (\"):

    (?:[^\\]|^)(\"(?:.*?[^\\]){0,1}\")
    

    Meaning:

    • (?: - start of a non-capturing group
    • [^\\] - match one character that is not a backslash
    • | - alternation: either the preceding character or
    • ^ - the beginning of the string
    • ( - start of the capturing group
    • \" - a quote (one that follows a non-backslash character or the start of the string)
    • (?:.*?[^\\] - the shortest substring ending with a non-backslash character
    • ){0,1} - zero or one time, that is, either that substring or an empty one
    • \" - the closing quote mark

    Edit: Wiktor Stribiżew correctly pointed out that my initial answer fails in some further cases involving escape sequences in the string. For example, \\" (an escaped backslash followed by a quote) should be treated like a plain " in your case. To avoid this specific issue you can use

    (?:[^\\]|^)((?:\\\\)*\"(?:.*?[^\\]){0,1}(?:\\\\)*\")
    

    But for a solution that handles escapes properly in general, you will need to refer to Wiktor's answer.
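As a rough illustration of how the first pattern might be applied, here is a minimal sketch. It assumes the groups are written with the standard non-capturing syntax `(?:...)`, and the variable names are made up for the example.

```javascript
// Assumption: groups use the standard non-capturing syntax (?:...),
// so capture group 1 is the quoted substring.
const pattern = /(?:[^\\]|^)("(?:.*?[^\\]){0,1}")/;

// The question's input, written as a JavaScript string literal.
const input = 'some text "some \\"string\\"right here "';

const found = input.match(pattern);

// Group 1 holds the quoted substring, including its outer quotes.
console.log(found ? found[1] : null);
// prints: "some \"string\"right here "
```

The character before the opening quote is part of the overall match, which is why the result is read from group 1 rather than from `found[0]`.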

  • 2021-01-19 05:37

    Parsing the string correctly with a parser

    With a JavaScript regex, it is impossible to start matching at the correct double quote. You will either match an escaped one, or you will fail to match the correct double quote after a literal \ before a quote. Thus, the safest way is to use a parser. Here is a sample one:

    var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
    console.log("Incorrect (with regex): ", s.match(/"([^"\\]*(?:\\.[^"\\]*)*)"/g));
    var res = [];
    var tmp = "";
    var in_quotes = false;
    var in_entity = false;
    for (var i=0; i<s.length; i++) {
      if (s[i] === '\\' && in_entity  === false) { 
         in_entity = true;
         if (in_quotes === true) {
           tmp += s[i];
         }
      } else if (in_entity === true) { // previous char was a backslash: consume the escaped char
          in_entity = false;
          if (in_quotes === true) {
             tmp += s[i];
          }
      } else if (s[i] === '"' && in_quotes === false) { // start a new match
          in_quotes = true;
          tmp += s[i];
      } else if (s[i] === '"'  && in_quotes === true) { // append char to match and add to results
          tmp += s[i];
          res.push(tmp);
          tmp = "";
          in_quotes = false;
      } else if (in_quotes === true) { // append a char to the match
         tmp += s[i];
      } 
    }
    console.log("Correct results: ", res);
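The same state machine could be wrapped in a reusable function. The sketch below is one possible arrangement, not the answer's own code: `extractQuoted` is a hypothetical name, and it assumes a backslash always escapes exactly the next character.

```javascript
// Hypothetical helper: returns every double-quoted substring (quotes included).
// Assumption: a backslash escapes exactly the next character.
function extractQuoted(input) {
  const results = [];
  let current = "";
  let inQuotes = false;
  let escaped = false;

  for (const ch of input) {
    if (escaped) {
      // Character right after a backslash: never opens or closes a quote.
      escaped = false;
      if (inQuotes) current += ch;
    } else if (ch === "\\") {
      escaped = true;
      if (inQuotes) current += ch;
    } else if (ch === '"') {
      current += ch;
      if (inQuotes) {
        // Closing quote: the match is complete.
        results.push(current);
        current = "";
      }
      inQuotes = !inQuotes;
    } else if (inQuotes) {
      current += ch;
    }
  }
  return results;
}

console.log(extractQuoted('some text "some \\"string\\"right here "'));
```

An unterminated quote at the end of the input is silently dropped here, which matches the behaviour of the loop above.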

    Not-so-safe regex approach

    A lazy dot matching pattern cannot match the string you need, because it stops at the first " it finds, escaped or not. If you know your string will never have an escaped quote before a quoted substring, and you are sure there are no literal \ before double quotes (these are very strict conditions for the regex to be safe), you can use

    /"([^"\\]*(?:\\.[^"\\]*)*)"/g
    


    • " - match the opening quote
    • ([^"\\]*(?:\\.[^"\\]*)*) - capturing group 1, matching
      • [^"\\]* - 0+ characters other than \ and "
      • (?:\\.[^"\\]*)* - zero or more sequences of
        • \\. - any escaped symbol
        • [^"\\]* - 0+ characters other than \ and "
    • " - trailing quote

    JS demo:

    var re = /"([^"\\]*(?:\\.[^"\\]*)*)"/g; 
    var str = `some text "some \\"string\\"right here " some text "another \\"string\\"right here "`;
    var res = [];
    var m; // declare the match variable instead of creating an implicit global
    while ((m = re.exec(str)) !== null) {
       res.push(m[1]);
    }
    document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>"; // Just for demo
    console.log(res); // or another result demo

  • 2021-01-19 05:51

    You can use this regex:

    /[^\\](\".*?[^\\]\")/g
    

    [^\\] matches any character other than \, so \" will not be taken as the start or end of your match.
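A short sketch of how this pattern might be used, with illustrative variable names. It also shows one limitation worth knowing about: because the pattern needs a character before the opening quote, a quoted string at the very start of the input is not found.

```javascript
const re = /[^\\](".*?[^\\]")/g;

// The question's input, written as a JavaScript string literal.
const text = 'some text "some \\"string\\"right here "';

// Group 1 is the quoted substring; the full match also includes the
// character that precedes the opening quote.
const hits = Array.from(text.matchAll(re), (m) => m[1]);
console.log(hits);

// Limitation: no character precedes the opening quote here, so nothing matches.
const atStart = Array.from('"abc" tail'.matchAll(re), (m) => m[1]);
console.log(atStart);
```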

  • 2021-01-19 05:52

    Safe regex approach

    Complementing @WiktorStribiżew's answer, there is a technique to start matching at the correct double quote using regex. It consists of matching both quoted and unquoted text in the form:

    /"(quoted)"|unquoted/g
    

    As you can see, the quoted text is matched by a capturing group, so we'll only consider the text captured in match[1].

    Regex

    /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g
    

    Code

    var regex = /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g;
    var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
    var match;
    var res = [];
    
    while ((match = regex.exec(s)) !== null) {
        if (match.index === regex.lastIndex)
            regex.lastIndex++;
    
        if( match[1] != null )
            res.push(match[1]); //Append to result only group 1
    }
    
    console.log("Correct results (regex technique): ",res)
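For comparison, the same technique can be sketched with `String.prototype.matchAll`, which advances past empty matches on its own, so the manual `lastIndex` adjustment is not needed. This is only an alternative arrangement of the code above; the `quoted` variable name is illustrative.

```javascript
const regex = /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g;
const s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";

// Keep only the matches where the quoted alternative (group 1) took part.
const quoted = Array.from(s.matchAll(regex))
  .filter((m) => m[1] !== undefined)
  .map((m) => m[1]);

console.log("Correct results (matchAll variant): ", quoted);
```

As in the original loop, group 1 excludes the surrounding quotes, and the unterminated quote at the end of the input produces no result.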
