Question
Is there a default/easy way in Java to split strings while taking quotation marks or other symbols into account?
For example, given this text:
There's "a man" that live next door 'in my neighborhood', "and he gets me down..."
I would like to obtain:
There's
a man
that
live
next
door
in my neighborhood
and he gets me down
Answer 1:
Something like this works for your input:
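import java.util.Scanner;
import java.util.regex.Pattern;

public class QuoteAwareSplit {
    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
                + "'in my neighborhood', \"and he gets me down...\"";
        Scanner sc = new Scanner(text);
        Pattern pattern = Pattern.compile(
                "\"[^\"]*\"" +   // double-quoted token
                "|'[^']*'" +     // single-quoted token
                "|[A-Za-z']+"    // everything else (letters and apostrophes)
        );
        String token;
        // findInLine skips over non-matching text and returns the next match, or null
        while ((token = sc.findInLine(pattern)) != null) {
            System.out.println("[" + token + "]");
        }
    }
}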
The above prints (as seen on ideone.com):
[There's]
["a man"]
[that]
[live]
[next]
[door]
['in my neighborhood']
["and he gets me down..."]
It uses Scanner.findInLine with a regex pattern that matches one of:
"[^"]*" # double quoted token
'[^']*' # single quoted token
[A-Za-z']+ # everything else
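The tokens above still carry their surrounding quotes, whereas the question asks for them stripped. A minimal sketch of one way to get there, assuming java.util.regex.Matcher with one capturing group per alternative instead of Scanner.findInLine (the class name QuoteStrippingSplit is just for illustration): whichever group took part in the match is the token text without its quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteStrippingSplit {
    // Group 1: inside "...", group 2: inside '...', group 3: bare word (letters/apostrophes)
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|'([^']*)'|([A-Za-z']+)");

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one group participates in each match; take the non-null one
            if (m.group(1) != null) {
                tokens.add(m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
                + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : split(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The last token keeps its trailing "..." because the dots sit inside the quotes; dropping them, as in the question's expected output, would need an extra step.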
This certainly won't work in every case; nested quotes, for example, will be tricky.
References
- regular-expressions.info/Character class
Answer 2:
Doubtful, based on your logic: you need to differentiate between an apostrophe and a single quote, e.g. There's and 'in my neighborhood'.
You'd have to develop some kind of pairing logic to get what you have above. I'm thinking regular expressions, or some kind of two-part parse.
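One possible reading of that "pairing logic" is a small hand-written scanner instead of a regex. The sketch below is a hypothetical illustration, not code from either answer; the class PairingSplit and the helper closesSingleQuote are made-up names. It assumes a simple rule: a ' opens a quoted token only at the start of a word (after whitespace, punctuation, or at the beginning of the text), and closes it only when the next character is not a letter, so the apostrophe in There's stays part of the word.

```java
import java.util.ArrayList;
import java.util.List;

public class PairingSplit {
    // Assumed rule: a ' closes a single-quoted token only if it is not followed by a letter
    private static boolean closesSingleQuote(String text, int j) {
        return text.charAt(j) == '\''
                && (j + 1 == text.length() || !Character.isLetter(text.charAt(j + 1)));
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '"') {
                // Double quote: the token runs to the next double quote (or end of text)
                int end = text.indexOf('"', i + 1);
                if (end < 0) {
                    end = n;
                }
                tokens.add(text.substring(i + 1, end));
                i = end + 1;
            } else if (c == '\'') {
                // Only reached when ' is not inside a word, so it opens a quoted token
                int end = i + 1;
                while (end < n && !closesSingleQuote(text, end)) {
                    end++;
                }
                tokens.add(text.substring(i + 1, end));
                i = end + 1;
            } else if (Character.isLetter(c)) {
                // Bare word: letters plus in-word apostrophes (There's)
                int end = i;
                while (end < n && (Character.isLetter(text.charAt(end)) || text.charAt(end) == '\'')) {
                    end++;
                }
                tokens.add(text.substring(i, end));
                i = end;
            } else {
                i++; // skip whitespace and punctuation between tokens
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
                + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : split(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Under this rule, 'don't do it' stays a single token (don't do it), whereas the regex in the first answer would split it into 'don', t, do and it'. The rule is still a heuristic; a quoted phrase ending in a possessive such as 'the dogs' bone' remains ambiguous.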
Source: https://stackoverflow.com/questions/3160564/split-tokenize-scan-a-string-being-aware-of-quotation-marks