I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:
keywords = 'pop rock "hard rock"';
keyword
ES6 solution supporting:
Code:
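// Tokenize into single characters, keeping a backslash together with the character it escapes.
keywords.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
        // An unescaped quote toggles the "inside quotes" flag.
        p.quote ^= 1;
    } else if (!p.quote && c === ' ') {
        // A space outside quotes starts a new keyword.
        p.a.push('');
    } else {
        // Append the character, dropping the backslash of an escape sequence.
        p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1");
    }
    return p;
}, {a: ['']}).a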
Output:
[ 'pop', 'rock', 'hard rock', '"dream" pop' ]
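For reference, the snippet above can be wrapped in a function and run as in the following sketch. The name `splitKeywords` is hypothetical, and the input string is an assumption: it is the escaped sample quoted in another answer in this thread, picked because it reproduces the output listed above.

```javascript
// Hypothetical wrapper around the reduce-based splitter shown above.
function splitKeywords(input) {
  return input.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
      p.quote ^= 1;                 // toggle "inside quotes"
    } else if (!p.quote && c === ' ') {
      p.a.push('');                 // start a new keyword
    } else {
      p.a[p.a.length - 1] += c.replace(/\\(.)/, '$1'); // unescape and append
    }
    return p;
  }, { a: [''] }).a;
}

// Assumed input; in source code every backslash has to be doubled.
const sample = 'pop rock "hard rock" "\\"dream\\" pop"';
console.log(splitKeywords(sample));
```

Note that consecutive spaces outside quotes produce empty strings in the result, so the output may need a `.filter(Boolean)` if that matters for your data.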
You can change your regex to:
keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g);
Instead of [^"]+ you've got (?:\\"|[^"])+ which allows \" or any other character, but not an unescaped quote.
One important note: if you want the string to include a literal backslash, the backslash itself has to be escaped in the source, so it should be:
keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
Also, there's a slight inconsistency between \w+ and [^"]+: for example, the pattern will match "ab*d" (with quotes) as a single token, but not ab*d (without quotes). Consider using [^"\s]+ instead, which matches any run of characters that are neither whitespace nor quotes.
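A minimal sketch of the two suggestions combined, assuming the sample string from above. The helper `unquote` is hypothetical and not part of the original answer; it is only there because `match` returns quoted tokens with their quotes and backslashes intact.

```javascript
// The suggested pattern, with [^"\s]+ in place of \w+.
const pattern = /[^"\s]+|"(?:\\"|[^"])+"/g;

// In source code every backslash has to be doubled.
const keywords = 'pop rock "hard rock" "\\"dream\\" pop"';
const tokens = keywords.match(pattern);

// Hypothetical helper: drop the surrounding quotes and unescape \" inside them.
function unquote(token) {
  if (token.length >= 2 && token.startsWith('"') && token.endsWith('"')) {
    return token.slice(1, -1).replace(/\\"/g, '"');
  }
  return token;
}

console.log(tokens.map(unquote));

// The \w+ variant splits an unquoted word at the asterisk; [^"\s]+ keeps it whole.
const withWord = 'ab*d'.match(/\w+|"(?:\\"|[^"])+"/g);
const withNonSpace = 'ab*d'.match(pattern);
```

Here `withWord` holds the two pieces `ab` and `d`, while `withNonSpace` holds the single token `ab*d`.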
While Kobi's answer works well for the example string, it fails when there is more than one successive escape character (backslash) between quotes, as Tim Pietzcker noted in the comments. To handle these cases, the pattern can be written like this (for the match method):
(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*
Here (?=\S) ensures there's at least one non-whitespace character at the current position. It is needed because the rest of the pattern, which describes all allowed substrings (including whitespace between quotes), is entirely optional and could otherwise match the empty string.
Details:
(?=\S)              # followed by a non-whitespace
[^"\s]*             # zero or more characters that aren't a quote or a whitespace
(?:                 # when a quoted substring occurs:
    "               # opening quote
    [^\\"]*         # zero or more characters that aren't a quote or a backslash
    (?:             # when a backslash is encountered:
        \\ [\s\S]   # an escaped character (including a quote or a backslash)
        [^\\"]*
    )*
    "               # closing quote
    [^"\s]*
)*
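To illustrate the difference, here is a small sketch comparing the two patterns on a quoted segment that ends with an escaped backslash. The input string and the variable names `simple` and `robust` are made up for this example.

```javascript
// Pattern from the earlier answer: only \" is treated as an escape.
const simple = /\w+|"(?:\\"|[^"])+"/g;
// Pattern above: a backslash escapes whatever character follows it.
const robust = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;

// Actual characters: "a\\" "b"  (a quoted a followed by an escaped backslash, then a quoted b).
const input = '"a\\\\" "b"';

// simple reads the second backslash plus the closing quote as \" and runs on
// into the next segment, so the tokens come out misaligned.
const simpleTokens = input.match(simple);

// robust consumes the two backslashes as one escaped pair and closes the
// segment at the right quote, giving two quoted tokens.
const robustTokens = input.match(robust);

console.log(simpleTokens, robustTokens);
```

With an unterminated quote the second pattern can return empty matches, so filtering the result may be worthwhile if the input is not guaranteed to be well formed.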
I would like to point out that I had the same regex as you,
/\w+|"[^"]+"/g
but it didn't work on empty quoted strings such as:
"" "hello" "" "hi"
so I had to replace the + quantifier with *. This gave me:
str.match(/\w+|"[^"]*"/g);
Which works fine.
(ex: https://regex101.com/r/wm5puK/1)
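A quick sketch of the difference, using the example string above (the variable names are only for illustration):

```javascript
const str = '"" "hello" "" "hi"';

// With +, a quoted segment needs at least one character, so "" can never match
// and the quotes end up paired incorrectly.
const withPlus = str.match(/\w+|"[^"]+"/g);

// With *, the empty quoted strings are returned as tokens of their own.
const withStar = str.match(/\w+|"[^"]*"/g);

console.log(withPlus);
console.log(withStar);
```

With the * version the result contains four tokens, two of which are the empty quoted string.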