Split/tokenize/scan a string being aware of quotation marks

Submitted by 隐身守侯 on 2020-01-14 13:59:14

Question


Is there a default/easy way in Java to split strings while taking quotation marks or other symbols into account?

For example, given this text:

There's "a man" that live next door 'in my neighborhood', "and he gets me down..."

Obtain:

There's
a man
that
live
next
door
in my neighborhood
and he gets me down

Answer 1:


Something like this works for your input:

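    import java.util.Scanner;
    import java.util.regex.Pattern;

    String text = "There's \"a man\" that live next door "
        + "'in my neighborhood', \"and he gets me down...\"";

    Scanner sc = new Scanner(text);
    // Alternatives are tried left to right at each position.
    Pattern pattern = Pattern.compile(
        "\"[^\"]*\"" +    // double-quoted token
        "|'[^']*'" +      // single-quoted token
        "|[A-Za-z']+"     // unquoted word, may contain apostrophes
    );
    String token;
    // findInLine returns null once the line has no further match.
    while ((token = sc.findInLine(pattern)) != null) {
        System.out.println("[" + token + "]");
    }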

The above prints (as seen on ideone.com):

[There's]
["a man"]
[that]
[live]
[next]
[door]
['in my neighborhood']
["and he gets me down..."]

It uses Scanner.findInLine, where the regex pattern is one of:

"[^"]*"      # double quoted token
'[^']*'      # single quoted token
[A-Za-z']+   # unquoted word (letters and apostrophes)

This certainly won't work in every case; input with nested quotes, for example, will be tricky. Note also that the quoted tokens keep their surrounding quotation marks.
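If the quotation marks themselves should be dropped, as in the output the question asks for, one possible variation is to put a capture group around each alternative and read the group that matched. The sketch below uses java.util.regex.Matcher instead of Scanner; the class name QuoteAwareSplit and the method tokenize are made up for illustration, and the trailing "..." inside the last quoted token is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class QuoteAwareSplit {
    // Group 1: body of a double-quoted token.
    // Group 2: body of a single-quoted token.
    // Group 3: unquoted word (letters and apostrophes).
    private static final Pattern TOKEN = Pattern.compile(
        "\"([^\"]*)\"|'([^']*)'|([A-Za-z']+)");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one group is non-null for every match.
            if (m.group(1) != null) {
                tokens.add(m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

The same caveats apply as for the Scanner version: nested or unbalanced quotes are not handled.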

References

  • regular-expressions.info/Character class



Answer 2:


Doubtful. By your logic you need to differentiate between an apostrophe and a single quote, e.g. There's versus 'in my neighborhood'.

You'd have to develop some kind of pairing logic if you wanted the output you show above. I'm thinking regular expressions, or some kind of two-part parse.
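One way such pairing logic might look is a small hand-written scanner. This is only a sketch under stated assumptions, not an established method: a ' that sits between two letters or digits is treated as an apostrophe, any other quote character opens a quoted span that runs to the next identical character, and trailing punctuation inside a quoted span is trimmed so that the result matches the output in the question. The names PairingTokenizer and tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating the "pairing logic" idea.
final class PairingTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                // Quote at a token boundary: look for its closing partner.
                int close = text.indexOf(c, i + 1);
                if (close != -1) {
                    int end = close;
                    // Assumption: drop trailing punctuation such as "...".
                    while (end > i + 1
                            && !Character.isLetterOrDigit(text.charAt(end - 1))) {
                        end--;
                    }
                    tokens.add(text.substring(i + 1, end));
                    i = close + 1;
                } else {
                    i++; // unpaired quote: skip it
                }
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(text.charAt(i))
                        || isInnerApostrophe(text, i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                i++; // whitespace or punctuation between tokens
            }
        }
        return tokens;
    }

    // An apostrophe counts as part of a word only if a letter or digit follows it.
    private static boolean isInnerApostrophe(String text, int i) {
        return text.charAt(i) == '\''
            && i + 1 < text.length()
            && Character.isLetterOrDigit(text.charAt(i + 1));
    }
}
```

Calling PairingTokenizer.tokenize on the example sentence yields the eight tokens listed in the question. The heuristic is easy to fool: a plural possessive such as dogs' ends in an apostrophe that is not followed by a letter, so it would be read as an opening quote.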



Source: https://stackoverflow.com/questions/3160564/split-tokenize-scan-a-string-being-aware-of-quotation-marks