String Tokenizer : split string by comma and ignore comma in double quotes

闹比i 2021-01-18 03:58

I have a string like the one below and want to split it on commas, ignoring commas inside double quotes:

value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"

6 Answers
  • 2021-01-18 04:29

    Use a CSV parser such as OpenCSV to handle commas in quoted elements, values that span multiple lines, and similar cases automatically. You can also use the library to write your data back out as CSV.

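    import com.opencsv.CSVReader;
    import java.io.StringReader;

    String str = "value1, value2, value3, value4, \"value5, 1234\", " +
            "value6, value7, \"value8\", value9, \"value10, 123.23\"";

    // try-with-resources closes the reader; in OpenCSV 5.x readNext() also throws
    // CsvValidationException, so the enclosing method must declare or handle it
    try (CSVReader reader = new CSVReader(new StringReader(str))) {
        String[] tokens;
        while ((tokens = reader.readNext()) != null) {
            System.out.println(tokens[0]); // value1
            System.out.println(tokens[4]); // value5, 1234
            System.out.println(tokens[9]); // value10, 123.23
        }
    }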
    
  • 2021-01-18 04:29
    String delimiter = ",";
    
    String v = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";
    
    String[] a = v.split(delimiter + "(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)");
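The lookahead in the split pattern above lets a comma match only when it is followed by an even number of double quotes up to the end of the string, which is exactly the case when the comma lies outside a quoted section. This assumes quotes are balanced and never escaped. The raw tokens still carry leading spaces and their surrounding quotes, so here is a minimal sketch (the class `LookaheadSplit` and its `split` helper are hypothetical names) that applies the same pattern and then trims each token and strips one pair of wrapping quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class LookaheadSplit {

    // Matches a comma only if an even number of '"' characters follows it,
    // i.e. the comma is outside any quoted section (assumes balanced, unescaped quotes).
    private static final String OUTSIDE_QUOTES_COMMA =
            ",(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)";

    // Hypothetical helper: split, then trim and remove one pair of wrapping quotes.
    public static List<String> split(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split(OUTSIDE_QUOTES_COMMA)) {
            String t = token.trim();
            if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
                t = t.substring(1, t.length() - 1);
            }
            result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        String v = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, "
                + "\"value8\", value9, \"value10, 123.23\"";
        for (String field : split(v)) {
            System.out.println(field);
        }
    }
}
```

The possessive quantifiers (`*+`) are not needed for correctness; they only stop the engine from backtracking into a lookahead that has already failed.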
    
  • 2021-01-18 04:30

    I'm allergic to regex; why not double-split as someone suggested?

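    String str = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";
    boolean quoted = false;
    // Pieces between '"' characters alternate: outside quotes, inside quotes, outside, ...
    for (String q : str.split("\"")) {
        if (quoted) {
            System.out.println(q.trim());          // quoted text: keep its commas
        } else {
            for (String s : q.split(",")) {
                if (!s.trim().isEmpty()) {
                    System.out.println(s.trim());  // unquoted text: split on commas
                }
            }
        }
        quoted = !quoted;
    }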
    
  • 2021-01-18 04:30

    Without any third-party library dependency, the following code can also parse the fields as required:

    import java.util.ArrayList;
    import java.util.List;

    public class CSVSpliter {

        public static void main(String[] args) {
            String inputStr = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";

            List<String> splitStringList = new ArrayList<>();
            boolean insideDoubleQuotes = false;
            StringBuilder field = new StringBuilder();

            for (int i = 0; i < inputStr.length(); i++) {
                char c = inputStr.charAt(i);
                if (c == '"' && !insideDoubleQuotes) {
                    // opening quote: commas are literal from here on
                    insideDoubleQuotes = true;
                } else if (c == '"' && insideDoubleQuotes) {
                    // closing quote: the quoted text is a complete field
                    insideDoubleQuotes = false;
                    splitStringList.add(field.toString().trim());
                    field.setLength(0);
                } else if (c == ',' && !insideDoubleQuotes) {
                    // ignore the empty field left by a comma right after a closing quote
                    if (field.length() > 0) {
                        splitStringList.add(field.toString().trim());
                    }
                    // clear the field for the next word
                    field.setLength(0);
                } else {
                    field.append(c);
                }
            }
            // the last field has no trailing comma, so add it after the loop
            if (!field.toString().trim().isEmpty()) {
                splitStringList.add(field.toString().trim());
            }
            for (String str : splitStringList) {
                System.out.println("Split fields: " + str);
            }
        }
    }

    This will give the following output:

    Split fields: value1
    Split fields: value2
    Split fields: value3
    Split fields: value4
    Split fields: value5, 1234
    Split fields: value6
    Split fields: value7
    Split fields: value8
    Split fields: value9
    Split fields: value10, 123.23

  • 2021-01-18 04:43

    You can use either of two approaches:

    1. Write code that searches for commas and keeps track of whether each comma is inside quotes or not.
    2. Tokenize by the double-quote symbol, then tokenize the strings in the resulting array by the comma symbol (only tokenize the strings at indexes 0, 2, 4, etc., since those were not inside double quotes in the original string).
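Both approaches in the list can be sketched in a few lines each. The class and method names below (`TwoApproaches`, `splitByState`, `splitByQuoteParity`) are illustrative, and both methods assume balanced, unescaped double quotes; the quote characters themselves are dropped from the output.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoApproaches {

    // Approach 1: scan character by character and track whether we are inside quotes.
    // A comma ends a field only when inQuotes is false; quote characters are dropped.
    public static List<String> splitByState(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;              // toggle on every quote
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString().trim());
                field.setLength(0);                // start the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString().trim());       // last field has no trailing comma
        return fields;
    }

    // Approach 2: split on '"' first; pieces at even indexes (0, 2, 4, ...) were outside
    // quotes and are split again on ','; pieces at odd indexes are kept whole.
    public static List<String> splitByQuoteParity(String line) {
        List<String> fields = new ArrayList<>();
        // limit -1 keeps trailing empty pieces so even/odd positions stay aligned
        String[] pieces = line.split("\"", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i % 2 == 1) {
                fields.add(pieces[i].trim());      // quoted text, commas preserved
            } else {
                for (String s : pieces[i].split(",")) {
                    if (!s.trim().isEmpty()) {
                        fields.add(s.trim());      // skip the empty bits around quotes
                    }
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String v = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, "
                + "\"value8\", value9, \"value10, 123.23\"";
        System.out.println(splitByState(v));
        System.out.println(splitByQuoteParity(v));
    }
}
```

On the sample string both methods return the same ten fields. They differ on empty fields: for `a,,b` the state-based version returns `a`, an empty string, and `b`, while the parity version drops the empty field, just as the double-split answer above does.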
  • 2021-01-18 04:44

    You just need one line and the right regex:

    String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
    

    This also neatly trims off the wrapping double quotes, including the final quote.

    Note: when the first term is quoted, an extra step is required: trimming the leading quote with replaceAll().

    Here's some test code:

    String input= "\"value1, value2\", value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";
    String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
    for (String s : values)
        System.out.println(s);
    

    Output:

    value1, value2
    value3
    value4
    value5, 1234
    value6
    value7
    value8
    value9
    value10, 123.23
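The note about a quoted first term can be checked directly. This sketch (the class name `LeadingQuoteEdgeCase`, its `split` helper, and the shortened input are assumptions for illustration) runs the same split pattern on a short input whose first term is quoted, once with the `replaceAll("^\"", "")` step and once without it.

```java
public class LeadingQuoteEdgeCase {

    // Same split pattern as in the one-liner above.
    static final String PATTERN = "\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?";

    // Optionally strips a leading '"' before splitting, as the answer recommends.
    public static String[] split(String input, boolean stripLeadingQuote) {
        String prepared = stripLeadingQuote ? input.replaceAll("^\"", "") : input;
        return prepared.split(PATTERN);
    }

    public static void main(String[] args) {
        String input = "\"value1, value2\", value3, value4";
        System.out.println(split(input, true)[0]);   // value1, value2
        System.out.println(split(input, false)[0]);  // "value1, value2
    }
}
```

Without the strip, the pattern never matches the opening quote (it is not followed by a comma), so it stays attached to the first field.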
    