CSVReader - bug when using " for escape char

孤城傲影 2021-01-15 09:20

I am using OpenCSV.

I have a CSVReader trying to parse a CSV file.
That file has quote char \" and separator char , an…

3 answers
  • 2021-01-15 09:55

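    It cannot be done through CSVReader. One alternative is to read the file with PySpark and set its escape character to the double quote: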

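    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # escape='"' reads "" inside a quoted field as one literal quote;
    # multiLine=True lets a quoted field span several lines
    df = spark.read.csv("csv.csv", multiLine=True, header=False,
                        encoding="utf-8", escape='"')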
    
  • 2021-01-15 10:11

    The default CSVReader parser is not fully RFC 4180 compliant. Plug OpenCSV's newer RFC4180Parser into the reader instead:

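    import com.opencsv.CSVReader;
    import com.opencsv.CSVReaderBuilder;
    import com.opencsv.RFC4180Parser;
    import com.opencsv.RFC4180ParserBuilder;
    import java.io.FileReader;

    // Replace the default CSVParser with the RFC 4180 parser
    RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
    CSVReader reader = new CSVReaderBuilder(new FileReader("input.csv"))
        .withCSVParser(rfc4180Parser)
        .build();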
    

    To parse a single CSV-formatted line held in a String:

    String test = "ballet 24\"\" classes";
    String[] columns = new RFC4180Parser().parseLine(test);
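    The quoting rule behind that call can be illustrated without OpenCSV. Below is a minimal sketch, assuming comma as separator and double quote as quote character; the class Rfc4180LineSketch and its parseLine method are hypothetical names made up for illustration, not OpenCSV's implementation. Inside a quoted field, a doubled quote becomes one literal quote and commas are data. It handles a single line only, and a quote in the middle of an unquoted field is kept as ordinary text, so on malformed input its results may differ from RFC4180Parser's.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-line splitter for RFC 4180 quoting (not OpenCSV's code).
 * Assumptions: separator ',', quote '"'; a quote opens a quoted field only at
 * the start of a field; line breaks inside quotes are out of scope.
 */
final class Rfc4180LineSketch {
    static String[] parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);                 // commas are data here
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');               // "" -> one literal quote
                    i++;                             // skip the second quote
                } else {
                    inQuotes = false;                // closing quote
                }
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;                     // opening quote
                atFieldStart = false;
            } else if (c == ',') {
                fields.add(field.toString());        // field boundary
                field.setLength(0);
                atFieldStart = true;
            } else {
                field.append(c);                     // ordinary char (or stray quote)
                atFieldStart = false;
            }
        }
        fields.add(field.toString());                // last field
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Java literal for the CSV text: "ballet 24"" classes",x
        String[] cols = parseLine("\"ballet 24\"\" classes\",x");
        for (String col : cols) {
            System.out.println(col);
        }
    }
}
```

    Running main prints two fields: ballet 24" classes and x.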
    

    To read all records with the reader (alternatively, call reader.readNext() to read one record at a time):

    for (String[] line : reader.readAll()) {
      for (String s : line) {
        System.out.println(s);
      }
    }
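    Reading records adds one thing on top of line parsing: RFC 4180 allows a quoted field to contain a line break, so one record can span several physical lines. A minimal sketch of record-at-a-time reading in plain Java, assuming comma separator, double-quote quoting and LF or CRLF line endings; Rfc4180RecordSketch, readRecord and readAll are hypothetical names chosen to mirror readNext()/readAll(), not OpenCSV's API or code:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical record reader (not OpenCSV's code). Separator ',', quote '"',
 * "" inside quotes = one literal quote, and a line break inside a quoted
 * field belongs to the field instead of ending the record.
 */
final class Rfc4180RecordSketch {
    /** Returns the next record, or null at end of input (like a readNext() loop). */
    static String[] readRecord(PushbackReader in) throws IOException {
        int c = in.read();
        if (c == -1) {
            return null;                        // no more records
        }
        in.unread(c);                           // put it back for the main loop
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inQuotes) {
                if (ch != '"') {
                    field.append(ch);           // includes '\n' inside quotes
                    continue;
                }
                int next = in.read();
                if (next == '"') {
                    field.append('"');          // "" -> literal quote
                } else {
                    inQuotes = false;           // closing quote
                    if (next != -1) {
                        in.unread(next);
                    }
                }
            } else if (ch == '"' && atFieldStart) {
                inQuotes = true;                // opening quote
                atFieldStart = false;
            } else if (ch == ',') {
                fields.add(field.toString());
                field.setLength(0);
                atFieldStart = true;
            } else if (ch == '\n') {
                break;                          // unquoted newline ends the record
            } else if (ch != '\r') {            // simplification: drop '\r' outside quotes
                field.append(ch);
                atFieldStart = false;
            }
        }
        fields.add(field.toString());
        return fields.toArray(new String[0]);
    }

    /** Collects every record into a list, like readAll(). */
    static List<String[]> readAll(Reader source) {
        PushbackReader in = new PushbackReader(source);
        List<String[]> records = new ArrayList<>();
        try {
            String[] record;
            while ((record = readRecord(in)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,text\n1,\"line one\nline two\"\n2,\"say \"\"hi\"\"\"\n";
        for (String[] line : readAll(new StringReader(csv))) {
            for (String s : line) {
                System.out.println(s);
            }
        }
    }
}
```

    The null return at end of input is what lets a one-record-at-a-time loop stop; readAll just collects every record, which is convenient for small files but keeps the whole file in memory.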
    

    See http://opencsv.sourceforge.net/#rfc4180parser for more details.

    Code taken from GeekPrompt

  • 2021-01-15 10:16

    It will work if you stick with the default settings for CSVReader.

    See this open bug report, sourceforge.net/p/opencsv/bugs/83, which says:

    Actually, it works fine, just not the way you think. Its defaults are comma for separator, quote for the quote character, and backslash for the escape character. However, it understands two consecutive quote characters as an escaped quote character. So, if you just go with the defaults, it will work fine.

    By default, it treats two consecutive double quotes as an escaped double quote, but the escape character you configure must still be something other than the double quote.

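    So the following works (this four-argument constructor is deprecated in later OpenCSV releases in favour of CSVParserBuilder and CSVReaderBuilder):

    CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',', '"', '-');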
    
    • comma as separator
    • double quote as quote char
    • dash (any other character) as escape character

    At first I used '\' as the escape character, but then your field "\" would have had to be rewritten to escape the escape character itself (as "\\").
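    The reasoning about '\' can be made concrete with a small model. EscapeCharSketch below is hypothetical, a simplified imitation of the behaviour described above and not OpenCSV's CSVParser: separator is comma, quote is double quote, two consecutive quotes inside a quoted field yield one quote, and the escape character makes a following quote or escape character literal. With '-' as the escape, the field "\" comes back as a single backslash; with '\' as the escape, the backslash consumes the closing quote, so the field never ends. The sketch reports that as an unterminated field; the real CSVReader's reaction may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a parser with BOTH a configurable escape character
 * and the doubled-quote rule. Separator ',', quote '"'. Not OpenCSV's code.
 */
final class EscapeCharSketch {
    static String[] parseLine(String line, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean hasNext = i + 1 < line.length();
            char next = hasNext ? line.charAt(i + 1) : '\0';
            if (c == escape && hasNext && (next == '"' || next == escape)) {
                field.append(next);          // escape: take the next char literally
                i++;
                atFieldStart = false;
            } else if (inQuotes && c == '"' && hasNext && next == '"') {
                field.append('"');           // "" inside quotes -> one quote
                i++;
            } else if (inQuotes && c == '"') {
                inQuotes = false;            // closing quote
            } else if (inQuotes) {
                field.append(c);             // separators are data inside quotes
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;             // opening quote
                atFieldStart = false;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                atFieldStart = true;
            } else {
                field.append(c);
                atFieldStart = false;
            }
        }
        if (inQuotes) {
            // e.g. the escape character consumed the closing quote
            throw new IllegalArgumentException("unterminated quoted field: " + line);
        }
        fields.add(field.toString());
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a,\"\\\",b";              // CSV text: a,"\",b
        System.out.println(String.join("|", parseLine(line, '-')));
        try {
            parseLine(line, '\\');
        } catch (IllegalArgumentException e) {
            System.out.println("backslash as escape: " + e.getMessage());
        }
    }
}
```

    With '-' as the escape character, the first println prints a|\|b.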
