I have a large .csv file (about 300 MB) that is read from a remote host and parsed into a target file, but I don't need to copy all of the lines to the target.
This is a late response, but you CAN use a BufferedReader with the CSVParser:
try (BufferedReader reader = new BufferedReader(new FileReader(fileName), 1048576 * 10)) {
    // 10 MB read buffer; the parser pulls records from it one at a time
    Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(reader);
    for (CSVRecord line : records) {
        // Process each line here
    }
} catch (IOException e) {
    // Handle exceptions from your BufferedReader here
}
No matter what you do, all of the data in your file is going to come over to your local machine, because your system has to parse it to decide which rows are valid. Whether the parser reads the remote file directly (so you can examine each line as it arrives) or you copy the entire file over first and then parse it, every byte ends up local. You need to get the data local, then trim the excess.
Calling csvFileParser.getRecords() is already a lost battle, because the documentation explains that this method loads every row of your file into memory. To parse the records while conserving memory, iterate over them instead; the documentation implies that the following code loads one record into memory at a time:
// The charset is an assumption; pass the file's actual encoding.
try (CSVParser csvFileParser = CSVParser.parse(new File("filePath"), StandardCharsets.UTF_8, csvFileFormat)) {
    for (CSVRecord csvRecord : csvFileParser) {
        // ... qualify the csvRecord; output qualified row to new file and flush as needed.
    }
}
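The "qualify and write" step is only a comment in the loop above. As a rough illustration of the same streaming pattern (read one row, test it, write it if it qualifies), here is a minimal sketch that uses plain java.io instead of Commons CSV so that it runs without any dependency. It assumes one record per physical line, so it would not handle quoted fields that contain line breaks; for real CSV you would keep the CSVParser loop and write rows with the library's printer. The class name StreamingFilter, the method filterLines, and the predicate are hypothetical names chosen for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

// Hypothetical helper, not part of Commons CSV.
final class StreamingFilter {

    /**
     * Copies only the lines accepted by keep from source to target.
     * Only the current line (plus the I/O buffers) is held in memory.
     * Assumes UTF-8 and one record per line. Returns the number of lines written.
     */
    static long filterLines(Path source, Path target, Predicate<String> keep) throws IOException {
        long written = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (keep.test(line)) {
                    out.write(line);
                    out.newLine();
                    written++;
                }
            }
        } // closing the writer flushes any buffered output
        return written;
    }
}
```

For example, `StreamingFilter.filterLines(source, target, line -> line.endsWith(",keep"))` would write only the rows whose last column is `keep`.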
Since you explained that "filePath" is not local, the above solution is prone to failure from connectivity issues. To eliminate them, I recommend that you copy the entire remote file to local storage, confirm that it copied accurately by comparing checksums, parse the local copy to create your target file, and then delete the local copy when you are done.
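The checksum comparison could be sketched roughly as follows, using only the JDK. This assumes you can obtain a SHA-256 digest of the file from the remote host by some other means (how you get it is outside this sketch, and another algorithm would work the same way). The class name ChecksumCheck and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for verifying a local copy against a known digest.
final class ChecksumCheck {

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // hash in chunks, never the whole file at once
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if the local copy's digest equals the digest reported for the remote file. */
    static boolean matches(Path localCopy, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(localCopy).equalsIgnoreCase(expectedHex.trim());
    }
}
```

If `matches` returns false, copy the file again before parsing; once the target file has been written, the local copy can be removed with `Files.delete`.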