Question
I have a huge .csv file, to be specific a .TAB file with 29 million rows and a file size of around 600 MB. I need to read this into an IEnumerable collection.
I have tried CsvHelper, GenericParser, and a few other solutions, but I always end up with an OutOfMemoryException. Please suggest a way to do this.
I have tried:
var deliveryPoints = new List<Point>();

using (TextReader csvreader1 = File.OpenText(@"C:\testfile\Prod\PCDP1705.TAB")) //StreamReader csvreader1 = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB"))
using (var csvR1 = new CsvReader(csvreader1, csvconfig))
{
    csvR1.Configuration.RegisterClassMap<DeliveryMap>();
    deliveryPoints = csvR1.GetRecords<Point>().ToList();
}
using (GenericParser parser = new GenericParser())
{
    parser.SetDataSource(@"C:\testfile\Prod\PCDP1705.TAB");
    parser.ColumnDelimiter = '\t';
    parser.FirstRowHasHeader = false;
    //parser.SkipStartingDataRows = 10;
    //parser.MaxBufferSize = 4096;
    //parser.MaxRows = 500;
    parser.TextQualifier = '\"';

    while (parser.Read())
    {
        var address = new Point();
        address.PostCodeID = int.Parse(parser[0]);
        address.DPS = parser[1];
        address.OrganisationFlag = parser[2];
        deliveryPoints.Add(address);
    }
}
and:
var deliveryPoints = new List<Point>();
csvreader = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB");
csv = new CsvReader(csvreader, csvconfig);

while (csv.Read())
{
    var address = new Point();
    address.PostCodeID = int.Parse(csv.GetField(0));
    address.DPS = csv.GetField(1);
    deliveryPoints.Add(address);
}
Answer 1:
No need to use third-party software. Use .NET library methods:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            StreamReader csvreader = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB");
            string inputLine = "";
            while ((inputLine = csvreader.ReadLine()) != null)
            {
                var address = new Point();
                // The .TAB file is tab-delimited, so split on '\t' rather than ','
                string[] csvArray = inputLine.Split(new char[] { '\t' });
                address.postCodeID = int.Parse(csvArray[0]);
                address.DPS = csvArray[1];
                Point.deliveryPoints.Add(address);
            }

            // Add the parsed rows to a DataTable
            DataTable dt = new DataTable();
            dt.Columns.Add("Post Code", typeof(int));
            dt.Columns.Add("DPS", typeof(string));
            foreach (Point point in Point.deliveryPoints)
            {
                dt.Rows.Add(new object[] { point.postCodeID, point.DPS });
            }
        }
    }

    public class Point
    {
        public static List<Point> deliveryPoints = new List<Point>();

        public int postCodeID { get; set; }
        public string DPS { get; set; }
    }
}
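Note that this still materializes every row in the static deliveryPoints list, so on its own it will not avoid the out-of-memory problem. Since the question asks for an IEnumerable, a lazier variant is to wrap ReadLine in an iterator so rows are produced one at a time; the ReadPoints helper below is a hypothetical name, just a sketch:

static IEnumerable<Point> ReadPoints(string path)
{
    // Sketch: stream rows lazily instead of collecting them in a list.
    // ReadPoints is a hypothetical helper, not part of the original answer.
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t'); // tab-delimited .TAB file
            yield return new Point
            {
                postCodeID = int.Parse(fields[0]),
                DPS = fields[1]
            };
        }
    }
}

Each MoveNext call reads exactly one line, so memory use stays flat regardless of file size.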
Answer 2:
It worked by running in x64 mode, and by adding
<gcAllowVeryLargeObjects enabled="true" />
in app.config.
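For context, gcAllowVeryLargeObjects is a runtime setting that permits single objects larger than 2 GB (such as the array backing a very large List) on 64-bit platforms. It belongs under the runtime element of app.config, roughly like this:

<configuration>
  <runtime>
    <!-- Allow objects over 2 GB on 64-bit platforms -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>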
Answer 3:
The problem is that you are loading the entire file into memory. You can compile your code for x64, which dramatically raises your program's memory limit, but it is better to avoid loading the entire file into memory if you can.
Notice that calling ToList() forces the CsvReader to load the entire file into memory at once:
csvR1.GetRecords<Point>().ToList();
But this will load only one line at a time:
foreach (var record in csvR1.GetRecords<Point>())
{
    // do whatever with the single record
}
This way you can process files of practically unlimited size.
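Putting this together with the question's setup, one way to expose the file as a lazy IEnumerable<Point> is an iterator around GetRecords. This is only a sketch: it assumes the question's csvconfig and DeliveryMap, and StreamPoints is a hypothetical helper name:

static IEnumerable<Point> StreamPoints(string path, CsvConfiguration csvconfig)
{
    // Sketch: lazily stream records with CsvHelper instead of calling ToList().
    using (TextReader reader = File.OpenText(path))
    using (var csv = new CsvReader(reader, csvconfig))
    {
        csv.Configuration.RegisterClassMap<DeliveryMap>();
        foreach (var record in csv.GetRecords<Point>())
        {
            yield return record; // reader stays open while the caller enumerates
        }
    }
}

A caller can then do foreach (var p in StreamPoints(@"C:\testfile\Prod\PCDP1705.TAB", csvconfig)) { ... } and process each row without ever holding 29 million Point objects at once.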
Source: https://stackoverflow.com/questions/44362395/how-to-read-huge-csv-file-with-29-million-rows-of-data-using-net