I'm using Mule Studio 3.4.0 Community Edition. I have a big problem with how to parse a large CSV file coming in through a File Endpoint. The scenario is that I have 3 CSV files…
I believe that the csv-to-maps-transformer is going to force the whole file into memory. Since you are dealing with one large file, personally, I would tend to just write a Java class to handle it. The File endpoint will pass a file stream to your custom transformer. You can then make a JDBC connection and pick off the information a row at a time without having to load the whole file. I have used OpenCSV to parse the CSV for me. So your Java class would contain something like the following:
protected Object doTransform(Object src, String enc) throws TransformerException {
    try {
        // Make a JDBC connection here
        // Now read and parse the CSV
        FileReader csvFileData = (FileReader) src;
        BufferedReader br = new BufferedReader(csvFileData);
        CSVReader reader = new CSVReader(br);
        // Read the CSV a row at a time and add each row to the appropriate List(s)
        String[] nextLine;
        while ((nextLine = reader.readNext()) != null) {
            // Push your data into the database through your JDBC connection
        }
        reader.close();
        // Close the JDBC connection here
        return null;
    } catch (Exception e) {
        // Don't swallow the exception; wrap and rethrow it so Mule can handle it
        throw new TransformerException(CoreMessages.createStaticMessage("Failed to process the CSV file"), e);
    }
}
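For context, the method above would normally live in a subclass of Mule 3's AbstractTransformer, which you then reference from your flow with <custom-transformer class="..."/>. A minimal skeleton, assuming OpenCSV's 2.x package is on the classpath (the class name is a placeholder):

import org.mule.api.transformer.TransformerException;
import org.mule.config.i18n.CoreMessages;
import org.mule.transformer.AbstractTransformer;

import au.com.bytecode.opencsv.CSVReader; // OpenCSV

public class CsvToDatabaseTransformer extends AbstractTransformer {

    @Override
    protected Object doTransform(Object src, String enc) throws TransformerException {
        // ... the body shown above goes here ...
        return null;
    }
}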
As SteveS said, the csv-to-maps-transformer might try to load the entire file into memory before processing it. What you can try to do is split the CSV file into smaller parts and send those parts to a VM endpoint to be processed individually.
First, create a component to do the splitting:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.mule.api.MuleEventContext;
import org.mule.api.client.MuleClient;
import org.mule.api.lifecycle.Callable;

public class CSVReader implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        InputStream fileStream = (InputStream) eventContext.getMessage().getPayload();
        BufferedReader br = new BufferedReader(new InputStreamReader(fileStream));
        MuleClient muleClient = eventContext.getMuleContext().getClient();
        try {
            // Dispatch each line to the VM queue as its own message
            String line;
            while ((line = br.readLine()) != null) {
                muleClient.dispatch("vm://in", line, null);
            }
        } finally {
            br.close(); // also closes the underlying file stream
        }
        return null;
    }
}
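One caveat with this component: MuleClient.dispatch() is asynchronous (one-way), so the lines are queued on the VM endpoint and processed concurrently with the reading loop. Also, if your CSV has a header row, it will be dispatched like any other line, so filter it either here or in the receiving flow.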
Then, split your main flow in two:
<file:connector name="File" workDirectory="yourWorkDirPath"
    autoDelete="false" streaming="true" />

<flow name="CsvToFile" doc:name="Split and dispatch">
    <file:inbound-endpoint path="inboxPath" moveToDirectory="processedPath"
        pollingFrequency="60000" doc:name="CSV" connector-ref="File">
        <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true" />
    </file:inbound-endpoint>
    <component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property" />
    <component class="com.dgonza.CSVReader" doc:name="Split the file and dispatch every line to VM" />
</flow>
<flow name="storeInDatabase" doc:name="receive lines and store in database">
<vm:inbound-endpoint exchange-pattern="one-way"
path="in" doc:name="VM" />
<Choice>
.
.
Your JDBC Stuff
.
.
<Choice />
</flow>
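For the JDBC part, one option is another small Java component that parses each dispatched line and inserts it. A rough sketch, with hypothetical connection details, table, and columns (in practice, reuse a pooled DataSource instead of opening a connection per message):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.mule.api.MuleEventContext;
import org.mule.api.lifecycle.Callable;

public class LineToDatabase implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        // Each message payload is one CSV line dispatched by the CSVReader component
        String line = (String) eventContext.getMessage().getPayload();
        String[] fields = line.split(",");

        // Hypothetical connection details and SQL; adjust to your schema
        Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
        try {
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO my_table (col1, col2) VALUES (?, ?)");
            ps.setString(1, fields[0]);
            ps.setString(2, fields[1]);
            ps.executeUpdate();
            ps.close();
        } finally {
            conn.close();
        }
        return null;
    }
}

Bear in mind that a naive split(",") breaks on quoted fields containing commas; OpenCSV's CSVParser.parseLine() is a safer way to split each line.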
Maintain your current file-connector configuration so that streaming stays enabled. With this solution, the CSV data can be processed without loading the entire file into memory first.
HTH