Question
I am trying to parse a CSV file with OpenCSV. One column holds YAML-serialized data and is quoted because it can contain commas. It also contains double quotes, which are escaped by doubling them. Ruby parses this file without trouble, but OpenCSV does not parse it fully. The file is UTF-8 encoded.
Here is the Java snippet that reads the file:
CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\');
Here are two records from the file. The first one is not parsed properly: it gets split at ""[Fair Trade Certified]"", presumably because of the escaped double quotes.
1061658767,update,1196916,Product,28613099,Product::Source,"---
product_attributes:
-
- :name: Ornaments
:brand_id: 49120
:size: each
:alcoholic: false
:details: ""[Fair Trade Certified]""
:gluten_free: false
:kosher: false
:low_fat: false
:organic: false
:sugar_free: false
:fat_free: false
:vegan: false
:vegetarian: false
",,2015-11-01 00:06:19.796944,,,,,,
1061658768,create,,,28613100,Product::Source,"---
product_id:
retailer_id:
store_id:
source_id: 333790
locale: en_us
source_type: Product::PrehistoricProductDatum
priority: 1
is_definition:
product_attributes:
",,2015-11-01 00:06:19.927948,,,,,,
Answer 1:
The solution was to use an RFC 4180 compatible CSV parser, as suggested by Paul. I had used CSVReader from OpenCSV, which didn't work, or perhaps I couldn't get it to work properly.
I used FastCSV, an RFC 4180 CSV parser, and it worked seamlessly.
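import java.io.File;
import java.nio.charset.StandardCharsets;
import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

// Read the whole file, then print the number of fields in each record.
File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount());
}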
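To make the RFC 4180 quoting rule concrete, here is a minimal sketch in plain Java of how one record is split: inside a quoted field a doubled quote becomes a single literal quote, commas and line breaks are ordinary data, and a backslash has no special meaning. The class Rfc4180Sketch and its parseRecord method are hypothetical names made up for this illustration; they are not part of FastCSV or OpenCSV, and the sketch does not split a file into records or validate malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a FastCSV or OpenCSV class.
class Rfc4180Sketch {

    // Splits a single CSV record (which may span several lines) into its fields.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // lone quote: end of the quoted field
                    }
                } else {
                    current.append(c);       // commas and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the first record from the question.
        String record = "28613099,Product::Source,\"---\n"
                + ":details: \"\"[Fair Trade Certified]\"\"\n"
                + "\",,2015-11-01 00:06:19.796944";
        List<String> fields = parseRecord(record);
        System.out.println(fields.size());
        System.out.println(fields.get(2));
    }
}
```

Running main on the shortened record should report five fields, with the YAML column kept whole and its doubled quotes collapsed to single ones.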
Answer 2:
First off, I am glad FastCSV worked for you, but I ran the suspect substring through openCSV 3.9 and it parsed correctly with both the CSVParser and the RFC4180Parser. Could you give a little detail on how it failed to parse, and/or try it with openCSV 3.9 to see whether you get the same issue, and then try it with the configuration below?
Here are the tests that I used:
CSVParser:
@Test
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException {
String bigline = "28613099,Product::Source,\"---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"\"[Fair Trade Certified]\"\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" +
"\",,2015-11-01 00:06:19.796944";
String suspectString = "---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"[Fair Trade Certified]\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" ;
StringReader stringReader = new StringReader(bigline);
CSVReaderBuilder builder = new CSVReaderBuilder(stringReader);
CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();
String[] item = csvReader.readNext();
assertEquals(5, item.length);
assertEquals("28613099", item[0]);
assertEquals("Product::Source", item[1]);
assertEquals(suspectString, item[2]);
}
RFC4180Parser:
def 'parse big line from stackoverflow with complex string'() {
given:
RFC4180ParserBuilder builder = new RFC4180ParserBuilder()
RFC4180Parser parser = builder.build()
String bigline = "28613099,Product::Source,\"---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"\"[Fair Trade Certified]\"\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n" +
"\",,2015-11-01 00:06:19.796944"
String suspectString = "---\n" +
"product_attributes:\n" +
"-\n" +
"- :name: Ornaments\n" +
" :brand_id: 49120\n" +
" :size: each\n" +
" :alcoholic: false\n" +
" :details: \"[Fair Trade Certified]\"\n" +
" :gluten_free: false\n" +
" :kosher: false\n" +
" :low_fat: false\n" +
" :organic: false\n" +
" :sugar_free: false\n" +
" :fat_free: false\n" +
" :vegan: false\n" +
" :vegetarian: false\n"
when:
String[] values = parser.parseLine(bigline)
then:
values.length == 5
values[0] == "28613099"
values[1] == "Product::Source"
values[2] == suspectString
}
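In both tests, bigline differs from suspectString only by the surrounding quotes and the doubled inner quotes, which is the RFC 4180 quoting rule seen from the writing side. As a rough sketch of that direction, the plain-Java helper below quotes a single field; CsvQuoteSketch and quoteField are hypothetical names chosen for this illustration and are not part of openCSV.

```java
// Hypothetical helper for illustration only; not an openCSV class.
class CsvQuoteSketch {

    // Quotes one field the RFC 4180 way: wrap it in double quotes when it
    // contains a comma, a quote, or a line break, and double every inner quote.
    static String quoteField(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value; // plain fields are written unchanged
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // One line of the YAML column from the question.
        String details = ":details: \"[Fair Trade Certified]\"\n";
        System.out.println(quoteField(details));
    }
}
```

A parser that follows the same rule in reverse should hand back the original value, which is what the assertions on item[2] and values[2] check.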
Source: https://stackoverflow.com/questions/41948442/parse-csv-with-opencsv-with-double-quotes-inside-a-quoted-field