I have a string like the one below and want to split it on commas, but not on the commas inside double quotes:
value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"
Use a CSV parser such as OpenCSV, which automatically takes care of things like commas in quoted elements and values that span multiple lines. You can also use the library to serialize your data back to CSV.
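import java.io.StringReader;
import com.opencsv.CSVReader; // OpenCSV 3.x and later; older releases use au.com.bytecode.opencsv.CSVReader

String str = "value1, value2, value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";
CSVReader reader = new CSVReader(new StringReader(str));
String[] tokens;
// readNext() returns one parsed line per call and null at the end of the input
while ((tokens = reader.readNext()) != null) {
    System.out.println(tokens[0]); // value1
    System.out.println(tokens[4]); // value5, 1234
    System.out.println(tokens[9]); // value10, 123.23
}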
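String delimiter = ",";
String v = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";
// Split on a comma only when it is followed by an even number of double quotes,
// i.e. when the comma lies outside any quoted value.
String[] a = v.split(delimiter + "(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)");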
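Note that the tokens produced by this split still carry their leading spaces and wrapping quotes. Here is a minimal sketch of one way to clean them up. The class name QuoteAwareSplit and the helper splitOutsideQuotes are made up for illustration, and the sketch assumes the quotes are balanced and never escaped.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Same lookahead as above: a comma counts as a separator only if an even
    // number of double quotes follows it, i.e. it sits outside any quoted value.
    private static final String SEPARATOR = ",(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)";

    // Hypothetical helper: split, trim whitespace, then strip one pair of wrapping quotes.
    public static List<String> splitOutsideQuotes(String line) {
        List<String> fields = new ArrayList<>();
        for (String token : line.split(SEPARATOR)) {
            String field = token.trim();
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String v = "value1, value2, \"value5, 1234\", \"value8\"";
        for (String field : splitOutsideQuotes(v)) {
            System.out.println(field);
        }
    }
}
```

Trimming after the split keeps the regex itself unchanged, so the cleanup can be dropped if the raw tokens are what you need.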
I'm allergic to regex; why not double-split as someone suggested?
String str = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";
boolean quoted = false;
// Splitting on the quote character yields alternating unquoted and quoted chunks.
for (String q : str.split("\"")) {
    if (quoted) {
        System.out.println(q.trim());
    } else {
        for (String s : q.split(",")) {
            if (!s.trim().isEmpty()) {
                System.out.println(s.trim());
            }
        }
    }
    quoted = !quoted;
}
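The same double-split idea can be wrapped in a method that returns the fields instead of printing them. This is only a sketch: the names DoubleSplit and fields are invented here, and it inherits the limits of the loop above. It assumes balanced, unescaped quotes, and it silently drops empty unquoted fields, such as the middle one in a,,b.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleSplit {
    // Hypothetical helper: first split on quotes, then split the unquoted chunks on commas.
    public static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        boolean quoted = false;
        for (String chunk : line.split("\"")) {
            if (quoted) {
                // A chunk between two quotes is one field, commas included.
                out.add(chunk.trim());
            } else {
                for (String s : chunk.split(",")) {
                    if (!s.trim().isEmpty()) {
                        out.add(s.trim());
                    }
                }
            }
            quoted = !quoted;
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "value1, value2, \"value5, 1234\", value6, \"value10, 123.23\"";
        for (String field : fields(str)) {
            System.out.println(field);
        }
    }
}
```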
Without any third-party library dependency, the following code can also parse the fields as required:
import java.util.*;

public class CSVSpliter {
    public static void main(String[] args) {
        String inputStr = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";
        List<String> splitStringList = new ArrayList<String>();
        boolean insideDoubleQuotes = false;
        StringBuilder field = new StringBuilder();
        for (int i = 0; i < inputStr.length(); i++) {
            char c = inputStr.charAt(i);
            if (c == '"' && !insideDoubleQuotes) {
                // opening quote: a quoted field starts here
                insideDoubleQuotes = true;
            } else if (c == '"' && insideDoubleQuotes) {
                // closing quote: the quoted field is complete
                insideDoubleQuotes = false;
                splitStringList.add(field.toString().trim());
                field.setLength(0);
            } else if (c == ',' && !insideDoubleQuotes) {
                // ignore the comma after double quotes.
                if (field.length() > 0) {
                    splitStringList.add(field.toString().trim());
                }
                // clear the field for next word
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        // add the last field when the input does not end with a quoted value
        if (field.toString().trim().length() > 0) {
            splitStringList.add(field.toString().trim());
        }
        for (String str : splitStringList) {
            System.out.println("Split fields: " + str);
        }
    }
}
This will give the following output:
Split fields: value1
Split fields: value2
Split fields: value3
Split fields: value4
Split fields: value5, 1234
Split fields: value6
Split fields: value7
Split fields: value8
Split fields: value9
Split fields: value10, 123.23
You can use several approaches:
You just need one line and the right regex:
String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
This also neatly trims off the wrapping double quotes for you, including the final quote!
Note: There is an interesting edge case when the first term is quoted. It requires the extra step of trimming the leading quote using replaceAll().
Here's some test code:
String input = "\"value1, value2\", value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";
String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
for (String s : values)
    System.out.println(s);
Output:
value1, value2
value3
value4
value5, 1234
value6
value7
value8
value9
value10, 123.23
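To see why the leading quote has to be stripped before splitting, here is a small sketch that runs the same pattern with and without the replaceAll() step on a shortened input. The class and method names are invented for this illustration; only the pattern itself comes from the answer above.

```java
import java.util.Arrays;

public class LeadingQuoteDemo {
    // The split pattern from the one-liner above, unchanged.
    static final String PATTERN = "\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?";

    // Split without removing the leading quote first.
    public static String[] splitRaw(String input) {
        return input.split(PATTERN);
    }

    // Split after removing a leading quote, as in the one-liner.
    public static String[] splitStripped(String input) {
        return input.replaceAll("^\"", "").split(PATTERN);
    }

    public static void main(String[] args) {
        String input = "\"a, b\", c";
        System.out.println(Arrays.toString(splitRaw(input)));
        System.out.println(Arrays.toString(splitStripped(input)));
    }
}
```

In the raw version the first token keeps its opening quote, because the pattern only consumes a quote that sits directly next to a separator and nothing precedes the first term.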