Split a string by whitespace, keeping quoted segments, allowing escaped quotes

南方客 2020-11-28 09:27

I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:

keywords = 'pop rock "hard rock"';
keyword         


        
4 answers
  • 2020-11-28 09:42

    ES6 solution supporting:

    • Splits on spaces, except inside quotes
    • Removes quotes, except backslash-escaped ones
    • An escaped quote becomes a plain quote
    • Quotes can appear anywhere in the string

    Code:

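    // Tokenize into characters, keeping a backslash together with the character it escapes.
    keywords.match(/\\?.|^$/g).reduce((p, c) => {
        if (c === '"') {
            p.quote ^= 1;        // toggle the "inside quotes" flag
        } else if (!p.quote && c === ' ') {
            p.a.push('');        // unquoted space: start a new word
        } else {
            p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1"); // append, turning \x into x
        }
        return p;
    }, {a: ['']}).a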
    

    Output:

    [ 'pop', 'rock', 'hard rock', '"dream" pop' ]
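
    The snippet above does not show the input that produces this output. As a minimal sketch, the same reduce-based approach can be wrapped in a function and run on a string with escaped quotes. The function name `splitKeywords` and the state property names are hypothetical, chosen here for readability, and the input string is an assumption borrowed from another answer in this thread.

```javascript
// Hypothetical helper name; the logic follows the reduce-based answer above.
function splitKeywords(input) {
  // Each token is one character, or a backslash plus the character it escapes.
  // The `^$` alternative yields a single empty token for an empty input.
  return input.match(/\\?.|^$/g).reduce((state, token) => {
    if (token === '"') {
      state.inQuote = !state.inQuote;          // unescaped quote: toggle, do not emit
    } else if (!state.inQuote && token === ' ') {
      state.words.push('');                    // unquoted space: start a new word
    } else {
      // Append the token, dropping the backslash of an escaped character.
      state.words[state.words.length - 1] += token.replace(/\\(.)/, '$1');
    }
    return state;
  }, { words: [''], inQuote: false }).words;
}

// Assumed input: the last segment contains backslash-escaped quotes.
const input = 'pop rock "hard rock" "\\"dream\\" pop"';
console.log(splitKeywords(input));
// [ 'pop', 'rock', 'hard rock', '"dream" pop' ]
```

    Note that consecutive unquoted spaces produce empty entries (for example, `'a  b'` gives `['a', '', 'b']`), so filtering out empty strings may be needed depending on the input.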
    
  • 2020-11-28 09:50

    You can change your regex to:

    keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g);
    

    Instead of [^"]+ you now have (?:\\"|[^"])+, which allows either \" or any other character, but not an unescaped quote.

    One important note: if you want the string itself to contain a literal backslash, it must be escaped in the string literal:

    keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
    

    Also, there's a slight inconsistency between \w+ and [^"]+: for example, the pattern matches "ab*d" as a single word when it is quoted, but not ab*d without quotes. Consider using [^"\s]+ instead of \w+, which matches any run of characters that are neither quotes nor whitespace.
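
    To make the difference concrete, here is a small sketch comparing the two variants on an assumed input that adds an unquoted `ab*d` to the example string. The variable names `strict` and `loose` are illustrative only.

```javascript
// Assumed sample input: the example string plus an unquoted word containing '*'.
const keywords = 'pop rock "hard rock" "\\"dream\\" pop" ab*d';

// Original suggestion: unquoted words are limited to \w characters.
const strict = keywords.match(/\w+|"(?:\\"|[^"])+"/g);

// Variant with [^"\s]+: unquoted words are any run of non-quote, non-space characters.
const loose = keywords.match(/[^"\s]+|"(?:\\"|[^"])+"/g);

console.log(strict);
// [ 'pop', 'rock', '"hard rock"', '"\\"dream\\" pop"', 'ab', 'd' ]
console.log(loose);
// [ 'pop', 'rock', '"hard rock"', '"\\"dream\\" pop"', 'ab*d' ]
```

    In both cases the matches keep their surrounding quotes and backslashes, so a post-processing step is still needed if the bare text is wanted.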

  • 2020-11-28 09:50

    Kobi's answer works well for the example string, but it fails when there is more than one successive escape character (backslash) between quotes, as Tim Pietzcker noted in the comments. To handle these cases, the pattern can be written like this (for the match method):

    (?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*
    


    Here (?=\S) ensures there is at least one non-whitespace character at the current position. This is needed because everything that follows, which describes all allowed substrings (including whitespace between quotes), is entirely optional.

    Details:

    (?=\S)    # followed by a non-whitespace character
    [^"\s]*   # zero or more characters that aren't a quote or whitespace
    (?:       # when a quoted substring occurs:
        "         # opening quote
        [^\\"]*   # zero or more characters that aren't a quote or a backslash
        (?:       # when a backslash is encountered:
            \\ [\s\S] # an escaped character (including a quote or a backslash)
            [^\\"]*   # more characters that aren't a quote or a backslash
        )*
        "         # closing quote
        [^"\s]*   # trailing characters that aren't a quote or whitespace
    )*
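
    As a rough illustration of the failure case, the sketch below runs both patterns on an assumed input whose first quoted segment ends with an escaped backslash. The names `simple`, `robust`, and `text` are illustrative; `String.raw` is used only so the backslashes appear in the string exactly as typed.

```javascript
// Simpler pattern from the earlier answer: only \" is treated as an escape.
const simple = /\w+|"(?:\\"|[^"])+"/g;

// Pattern from this answer: any backslash-escaped character is consumed as a pair.
const robust = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;

// Assumed input, character for character: "a\\" "b" pop
// The first quoted segment ends with an escaped backslash, not an escaped quote.
const text = String.raw`"a\\" "b" pop`;

console.log(text.match(simple));
// the first match runs past the real closing quote: "a\\" "  then  b  then  pop
console.log(text.match(robust));
// three segments as intended: "a\\"  then  "b"  then  pop
```

    The simpler pattern reads the second backslash plus the closing quote as an escaped quote, so the first match swallows the opening quote of the next segment. The robust pattern pairs each backslash with the character after it, so the closing quote is recognized correctly.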
    
  • 2020-11-28 09:50

    I would like to point out I had the same regex as you,

    /\w+|"[^"]+"/g
    

    but it didn't work on empty quoted strings such as:

    "" "hello" "" "hi"
    

    so I had to replace the + quantifier with *. This gave me:

    str.match(/\w+|"[^"]*"/g);
    

    Which is fine.

    (example: https://regex101.com/r/wm5puK/1)
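
    A quick sketch of the difference, using the sample string from this answer (the variable names `withPlus` and `withStar` are illustrative):

```javascript
const str = '"" "hello" "" "hi"';

// With +, an empty pair "" cannot match, so the regex pairs up the wrong quotes.
const withPlus = str.match(/\w+|"[^"]+"/g);

// With *, empty quoted strings are matched as their own tokens.
const withStar = str.match(/\w+|"[^"]*"/g);

console.log(withPlus);
// [ '" "', 'hello', '" "', '" "', 'hi' ]
console.log(withStar);
// [ '""', '"hello"', '""', '"hi"' ]
```

    With the + quantifier the matcher skips the first quote and treats the space between two segments as quoted content, which is why the results look misaligned.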
