Java: splitting a comma-separated string but ignoring commas in quotes

广开言路 2020-11-21 05:16

I have a string vaguely like this:

foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"

that I want to split by commas, but I need to ignore commas in quotes.

11 answers
  •  孤街浪徒
    2020-11-21 05:52

    I would not recommend Bart's regex answer; I find a parsing solution better in this particular case (as Fabian proposed). I tried both the regex solution and my own parsing implementation and found that:

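    1. Parsing is much faster than splitting with the lookahead regex: in my tests, ~20 times faster for short strings and ~40 times faster for long strings.
    2. The regex split drops the empty string after the last comma. Strictly speaking, this comes from String.split discarding trailing empty strings when called without a limit (passing a negative limit, e.g. split(regex, -1), keeps them), not from the regex itself. That was not in the original question, though; it was my own requirement.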
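    Both points can also be addressed on the regex side. The sketch below assumes the same pattern is reused for many inputs: compiling it once with Pattern.compile avoids recompiling it on every String.split call (in OpenJDK, String.split compiles the pattern per call except for simple one-character delimiters), and a negative limit keeps the trailing empty token. The pattern is the one from this answer with the capturing group made non-capturing, which does not change what it matches. The names RegexSplit and splitOutsideQuotes are hypothetical. This is not a claim that the regex then matches the parser's speed: the lookahead still scans to the end of the input for every comma, which is one plausible reason the gap widens on longer strings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RegexSplit {
    // A comma matches only if an even number of double quotes follows it
    // up to the end of the input, i.e. the comma is outside quotes.
    private static final String REGEX = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: compile once, reuse for every input.
    private static final Pattern COMMA_OUTSIDE_QUOTES = Pattern.compile(REGEX);

    static String[] splitOutsideQuotes(String s) {
        // A negative limit keeps trailing empty strings;
        // limit 0 (what String.split(regex) uses) drops them.
        return COMMA_OUTSIDE_QUOTES.split(s, -1);
    }

    public static void main(String[] args) {
        String tested = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\",";
        // Default limit: the empty token after the last comma is dropped
        System.out.println(Arrays.toString(tested.split(REGEX)));
        // Limit -1: the empty token after the last comma is kept
        System.out.println(Arrays.toString(splitOutsideQuotes(tested)));
    }
}
```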

    My solution and test are below.

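    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class SplitTest {
        public static void main(String[] args) {
            // Trailing comma: the last token should be an empty string
            String tested = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\",";

            // Approach 1: split on commas followed by an even number of quotes
            long start = System.nanoTime();
            String[] tokens = tested.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
            long timeWithSplitting = System.nanoTime() - start;

            // Approach 2: single pass, tracking whether we are inside quotes
            start = System.nanoTime();
            List<String> tokensList = new ArrayList<>();
            boolean inQuotes = false;
            StringBuilder b = new StringBuilder();
            for (char c : tested.toCharArray()) {
                switch (c) {
                    case ',':
                        if (inQuotes) {
                            b.append(c); // comma inside quotes is part of the token
                        } else {
                            tokensList.add(b.toString()); // comma outside quotes ends a token
                            b = new StringBuilder();
                        }
                        break;
                    case '\"':
                        inQuotes = !inQuotes;
                        // no break: fall through so the quote itself is kept in the token
                    default:
                        b.append(c);
                        break;
                }
            }
            tokensList.add(b.toString()); // last token (empty if the input ends with a comma)
            long timeWithParsing = System.nanoTime() - start;

            System.out.println(Arrays.toString(tokens));
            System.out.println(tokensList);
            System.out.printf("Time with splitting:\t%10d%n", timeWithSplitting);
            System.out.printf("Time with parsing:\t%10d%n", timeWithParsing);
        }
    }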
    

    Of course, you are free to replace the switch with if/else-if statements if you find it ugly. Note the missing break after the quote case: it intentionally falls through to default, so the quote character is kept in the token. StringBuilder was chosen over StringBuffer by design for speed, since thread safety is irrelevant here.
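    A minimal sketch of the if/else-if variant, wrapped in a reusable method, might look like this. The names QuoteAwareSplitter and split are hypothetical. Like the switch-based loop, it keeps the quote characters in the tokens and emits a trailing empty token when the input ends with a comma; it assumes quotes are balanced and does not handle escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    // Hypothetical helper: splits on commas that are not inside double quotes.
    // Quote characters are kept in the tokens, as in the switch-based loop.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;           // toggle state, then keep the quote
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                tokens.add(current.toString()); // separator outside quotes ends a token
                current.setLength(0);           // reuse the builder for the next token
            } else {
                current.append(c);              // ordinary character or quoted comma
            }
        }
        tokens.add(current.toString());         // last token, empty if input ends with ','
        return tokens;
    }

    public static void main(String[] args) {
        String tested = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\",";
        System.out.println(split(tested));
    }
}
```

    Reusing the builder with setLength(0) is a small deviation from the original, which allocates a new StringBuilder per token; either works.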
