Parse CSV with OpenCSV with double quotes inside a quoted field

Submitted by 雨燕双飞 on 2019-12-20 05:21:18

Question


I am trying to parse a CSV file using OpenCSV. One of the columns stores data in YAML-serialized form and is quoted because it can contain commas. It can also contain double quotes, which are escaped by doubling them. I can parse this file easily in Ruby, but OpenCSV does not parse it fully. The file is UTF-8 encoded.
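
To make the quoting convention concrete, here is a minimal sketch of how a writer following RFC 4180 would produce such a field. The class and method names (`QuoteFieldSketch`, `quoteField`) are hypothetical and only illustrate the rule; this is not how Ruby or OpenCSV implement it.

```java
public class QuoteFieldSketch {
    // Hypothetical helper: wraps a value in double quotes when it contains a
    // comma, a double quote, or a line break, and doubles every embedded quote.
    public static String quoteField(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The embedded quotes come out doubled, as in the sample data.
        System.out.println(quoteField(":details: \"[Fair Trade Certified]\""));
    }
}
```

A reader has to apply the inverse rule: inside a quoted field, two consecutive double quotes stand for one literal quote, and commas or line breaks are ordinary data.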

Here is the Java snippet that reads the file:

CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\');

Here are two records from the file. The first is not parsed properly: it gets split at ""[Fair Trade Certified]"", presumably because of the escaped double quotes.

1061658767,update,1196916,Product,28613099,Product::Source,"---
product_attributes:
-
- :name: Ornaments
  :brand_id: 49120
  :size: each
  :alcoholic: false
  :details: ""[Fair Trade Certified]""
  :gluten_free: false
  :kosher: false
  :low_fat: false
  :organic: false
  :sugar_free: false
  :fat_free: false
  :vegan: false
  :vegetarian: false
",,2015-11-01 00:06:19.796944,,,,,,
1061658768,create,,,28613100,Product::Source,"---
product_id:
retailer_id:
store_id:
source_id: 333790
locale: en_us
source_type: Product::PrehistoricProductDatum
priority: 1
is_definition:
product_attributes:
",,2015-11-01 00:06:19.927948,,,,,,

Answer 1:


The solution was to use an RFC 4180-compatible CSV parser, as suggested by Paul. I had been using CSVReader from OpenCSV, which didn't work, or perhaps I couldn't get it to work properly.

I used FastCSV, an RFC 4180 CSV parser, and it worked seamlessly.

// Read the whole file as UTF-8, then print the number of fields in each row
File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount());
}
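
For readers wondering what RFC 4180-style quote handling involves, here is a minimal sketch in plain Java that splits a single record into fields. It is an illustration under stated assumptions, not the FastCSV or OpenCSV implementation: the names `Rfc4180Sketch` and `parseRecord` are hypothetical, the whole record (including embedded line breaks) is assumed to be in one string already, and malformed input is not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Sketch {
    // Hypothetical helper: splits one CSV record, which may span several
    // lines, into its fields.
    public static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // lone quote closes the field
                    }
                } else {
                    current.append(c);       // commas and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, made-up record in the same shape as the sample data.
        String record = "28613099,Product::Source,\"---\n"
                + "  :details: \"\"[Fair Trade Certified]\"\"\n"
                + "\",,2015-11-01 00:06:19.796944";
        List<String> fields = parseRecord(record);
        System.out.println(fields.size());
        System.out.println(fields.get(2));
    }
}
```

The key design point is the one-character lookahead inside a quoted field: a quote followed by another quote is data, and only a lone quote ends the field. A parser that treats every quote as a delimiter, or that splits the input on line breaks before looking at quotes, would break the sample record apart.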



Answer 2:


First off, I am glad FastCSV worked for you. However, I ran the suspect substring through openCSV 3.9 and it parsed correctly with both the CSVParser and the RFC4180Parser. Could you please give a little more detail on how it failed to parse, and/or try it with openCSV 3.9 to see whether you get the same issue, and then try the configuration below?

Here are the tests that I used:

CSVParser:

@Test
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException {

    String bigline = "28613099,Product::Source,\"---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"\"[Fair Trade Certified]\"\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" +
            "\",,2015-11-01 00:06:19.796944";

    String suspectString = "---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"[Fair Trade Certified]\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" ;

    StringReader stringReader = new StringReader(bigline);

    CSVReaderBuilder builder = new CSVReaderBuilder(stringReader);
    CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();

    String[] item = csvReader.readNext();

    assertEquals(5, item.length);
    assertEquals("28613099", item[0]);
    assertEquals("Product::Source", item[1]);
    assertEquals(suspectString, item[2]);
}

RFC4180Parser:

def 'parse big line from stackoverflow with complex string'() {
    given:
    RFC4180ParserBuilder builder = new RFC4180ParserBuilder()
    RFC4180Parser parser = builder.build()
    String bigline = "28613099,Product::Source,\"---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"\"[Fair Trade Certified]\"\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n" +
            "\",,2015-11-01 00:06:19.796944"

    String suspectString = "---\n" +
            "product_attributes:\n" +
            "-\n" +
            "- :name: Ornaments\n" +
            "  :brand_id: 49120\n" +
            "  :size: each\n" +
            "  :alcoholic: false\n" +
            "  :details: \"[Fair Trade Certified]\"\n" +
            "  :gluten_free: false\n" +
            "  :kosher: false\n" +
            "  :low_fat: false\n" +
            "  :organic: false\n" +
            "  :sugar_free: false\n" +
            "  :fat_free: false\n" +
            "  :vegan: false\n" +
            "  :vegetarian: false\n"

    when:
    String[] values = parser.parseLine(bigline)

    then:
    values.length == 5
    values[0] == "28613099"
    values[1] == "Product::Source"
    values[2] == suspectString
}


Source: https://stackoverflow.com/questions/41948442/parse-csv-with-opencsv-with-double-quotes-inside-a-quoted-field
