Good and effective CSV/TSV Reader for Java

南笙 2021-01-11 09:27

I am trying to read big CSV and TSV (tab-separated) files with about 1,000,000 rows or more. Now I tried to read a TSV cont

4 answers
  •  离开以前
    2021-01-11 10:28

    Do not use a CSV parser to parse TSV inputs. It will break if the TSV has fields with a quote character, for example.
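
    To make the quote problem concrete, here is a minimal sketch that uses only the JDK. The class `TsvQuoteDemo` and the method `splitCsvStyle` are hypothetical names for illustration, and the CSV-style splitter is a deliberately simplified stand-in for quote handling, not the behavior of any particular library. It shows how a single literal quote in a TSV field makes a quote-aware splitter merge two fields into one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TsvQuoteDemo {

    // Plain TSV rule: every tab ends a field; quotes are ordinary characters.
    static List<String> splitTsv(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    // Simplified CSV-style rule (hypothetical, for illustration only):
    // a quote toggles "inside quotes", and delimiters inside quotes are kept.
    static List<String> splitCsvStyle(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes; // the quote is consumed, not stored
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Second field contains a literal quote (3.5" as in inches).
        String line = "42\t3.5\" disk\tin stock";
        System.out.println(splitTsv(line).size());            // 3
        System.out.println(splitCsvStyle(line, '\t').size()); // 2
    }
}
```

    The CSV-style splitter treats the inch mark as an opening quote, so the tab after it is swallowed into the field and the row comes out one column short.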

    uniVocity-parsers comes with a TSV parser. You can parse a billion rows without problems.

    Example to parse a TSV input:

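    import com.univocity.parsers.tsv.TsvParser;
    import com.univocity.parsers.tsv.TsvParserSettings;
    import java.io.FileReader;
    import java.util.List;

    TsvParserSettings settings = new TsvParserSettings();
    TsvParser parser = new TsvParser(settings);
    
    // parses all rows in one go; each row is an array of column values.
    List<String[]> allRows = parser.parseAll(new FileReader(yourFile));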
    

    If your input is so big it can't be kept in memory, do this:

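    import com.univocity.parsers.common.ParsingContext;
    import com.univocity.parsers.common.processor.ObjectRowProcessor;
    import com.univocity.parsers.conversions.Conversions;
    import com.univocity.parsers.tsv.TsvParser;
    import com.univocity.parsers.tsv.TsvParserSettings;
    import java.io.FileReader;
    import java.util.Arrays;

    TsvParserSettings settings = new TsvParserSettings();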
    
    // all rows parsed from your input will be sent to this processor
    ObjectRowProcessor rowProcessor = new ObjectRowProcessor() {
        @Override
        public void rowProcessed(Object[] row, ParsingContext context) {
            //here is the row. Let's just print it.
            System.out.println(Arrays.toString(row));
        }
    };
    // the ObjectRowProcessor supports conversions from String to whatever you need:
    // converts the values in the columns at indexes 2 and 5 to BigDecimal
    rowProcessor.convertIndexes(Conversions.toBigDecimal()).set(2, 5);
    
    // trims and lowercases the values in columns "Description" and "Model".
    // Field names are matched against the headers, so the input needs a header row
    // (for example with settings.setHeaderExtractionEnabled(true)).
    rowProcessor.convertFields(Conversions.trim(), Conversions.toLowerCase()).set("Description", "Model");
    
    //configures to use the RowProcessor
    settings.setRowProcessor(rowProcessor);
    
    TsvParser parser = new TsvParser(settings);
    //parses everything. All rows will be pumped into your RowProcessor.
    parser.parse(new FileReader(yourFile));
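
    The reason this approach copes with inputs that do not fit in memory is that each row is handed to a callback and then discarded. Below is a rough JDK-only sketch of that idea. It is not how uniVocity-parsers is implemented; `StreamingTsvSketch` and `streamRows` are hypothetical names, and the naive `split` ignores TSV escape sequences that a real TSV parser would handle:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class StreamingTsvSketch {

    // Reads the input line by line and passes each row to the handler.
    // Only the current row is held in memory. Returns the number of rows read.
    static long streamRows(Reader input, Consumer<String[]> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // -1 keeps trailing empty columns instead of dropping them
                handler.accept(line.split("\t", -1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Small in-memory sample; a FileReader would be used for a real file.
        String sample = "id\tmodel\n1\tX100\n2\tZ9\n";
        long rows = streamRows(new StringReader(sample),
                row -> System.out.println(String.join(" | ", row)));
        System.out.println(rows + " rows");
    }
}
```

    Because nothing is accumulated in a list, memory use stays roughly constant regardless of the number of rows, which is the same trade-off the row processor makes compared with `parseAll`.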
    

    Disclosure: I am the author of this library. It's open-source and free (Apache License 2.0).
