Best strategy for processing large CSV files in Apache Camel

一生所求 2020-12-01 11:02

I'd like to develop a route that polls a directory containing CSV files and, for every file, unmarshals each row using Bindy and queues it in ActiveMQ.

The probl

3 answers
  • 2020-12-01 11:36

    For the record, and for other users who may have searched for this as long as I did: there now seems to be an easier method, which also works well with useMaps:

    CsvDataFormat csv = new CsvDataFormat()
        .setLazyLoad(true)
        .setUseMaps(true);
    
    from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
        .unmarshal(csv)
        .split(body()).streaming()
        .to("log:mappedRow?multiline=true");
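
To illustrate what lazy loading combined with map-shaped rows amounts to, here is a minimal plain-JDK sketch that does not use Camel at all. The class name `LazyCsvRows` and the method `rows` are hypothetical, and the comma split is deliberately naive (no quoted fields), so treat it as an illustration of the idea rather than of how `CsvDataFormat` is implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of Camel: reads CSV rows lazily as header-keyed maps.
public class LazyCsvRows {

    /** Returns an iterator that reads one row per next() call; the first line is the header. */
    public static Iterator<Map<String, String>> rows(BufferedReader reader) throws IOException {
        String headerLine = reader.readLine();
        final String[] header = headerLine == null ? new String[0] : headerLine.split(",", -1);

        return new Iterator<Map<String, String>>() {
            // Only one data line is buffered at any time.
            private String pending = readLine();

            private String readLine() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Map<String, String> next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                // Naive split: assumes no quoted or escaped commas.
                String[] cells = pending.split(",", -1);
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < header.length && i < cells.length; i++) {
                    row.put(header[i], cells[i]);
                }
                pending = readLine();
                return row;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,alice\n2,bob\n";
        Iterator<Map<String, String>> it = rows(new BufferedReader(new StringReader(csv)));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Because rows are produced on demand, memory use stays roughly constant regardless of file size, which is the property the streaming splitter relies on in the route above.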
    
  • 2020-12-01 11:37

    Combining the Splitter and Aggregator EIPs is a good strategy for processing large CSV files in Apache Camel. You can read more about this combination in the Camel documentation on the Composed Message Processor pattern.

    Here is an example using Java DSL:

    package com.camel;
    
    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.dataformat.csv.CsvDataFormat;
    import org.apache.camel.impl.DefaultCamelContext;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.QuoteMode;
    
    public class FileSplitter {
    
        public static void main(String[] args) throws Exception {
            CamelContext context = new DefaultCamelContext();
            CsvDataFormat csvParser = new CsvDataFormat(CSVFormat.DEFAULT);
            csvParser.setSkipHeaderRecord(true);
            csvParser.setQuoteMode(QuoteMode.ALL);
            context.addRoutes(new RouteBuilder() {
                public void configure() {
                    String fileName = "Hello.csv";
                    int lineCount = 20;
                    System.out.println("fileName = " + fileName);
                    System.out.println("lineCount = " + lineCount);
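                    // Stream rows one at a time, then regroup them into batches of lineCount rows.
                    // ArrayListAggregationStrategy is not defined in this answer; it is assumed to be a
                    // user-supplied AggregationStrategy that collects the row bodies into a List.
                    from("file:data/inbox?noop=true&fileName=" + fileName)
                            .unmarshal(csvParser)
                            .split(body()).streaming()
                            .aggregate(constant(true), new ArrayListAggregationStrategy())
                            .completionSize(lineCount)
                            .completionTimeout(1500)
                            .marshal(csvParser)
                            .to("file:data/outbox?fileName=${file:name.noext}_${header.CamelSplitIndex}.csv");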
                }
            });
            context.start();
            Thread.sleep(10000);
            context.stop();
            System.out.println("End");
        }
    }
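
The split-then-aggregate idea in the route above can also be sketched without Camel, which may make the batching behaviour easier to reason about. The following is a minimal plain-JDK illustration; `RowBatcher` and `forEachBatch` are hypothetical names, and the trailing partial batch is flushed at end of input, which only loosely corresponds to what `completionTimeout` achieves in the route.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Camel: regroups a stream of rows into fixed-size batches.
public class RowBatcher {

    /**
     * Pulls rows one at a time and hands a batch to the sink whenever batchSize rows
     * have accumulated. Returns the number of batches emitted.
     */
    public static <T> int forEachBatch(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> current = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            current.add(rows.next());
            if (current.size() == batchSize) {
                sink.accept(current);
                current = new ArrayList<>(batchSize);
                batches++;
            }
        }
        // Flush the final partial batch, if any.
        if (!current.isEmpty()) {
            sink.accept(current);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // Each batch could be written to its own output file, as the route does.
        forEachBatch(rows.iterator(), 2, batch -> System.out.println(batch));
    }
}
```

At most `batchSize` rows are held in memory at once, so the batch size trades memory against the number of output files.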
    
  • 2020-12-01 11:38

    If you use the Splitter EIP, you can enable streaming mode, which means Camel processes the file row by row instead of loading it all into memory first.

    from("file://data/inbox?noop=true&maxMessagesPerPoll=1&delay=5000")
      .split(body().tokenize("\n")).streaming()
        .unmarshal().bindy(BindyType.Csv, "com.ess.myapp.core")           
        .to("jms:rawTraffic");
    