I feel embarrassed to ask this question as I feel like I should already know. However, given I don't... I want to know how to read large files from disk to a database without running out of memory.
I think the size of the data may be a red herring here. Every time I come across this problem, it's not the size of the data but the number of objects created when looping over it.
Look at the while loop that adds records to the db inside button3_Click(object sender, EventArgs e):
TextReader tr = GetData();
using (CsvDataReader csvData = new CsvDataReader(tr, '\t'))
Here you declare and instantiate two objects each iteration - meaning for each chunk of file you read you will instantiate 200,000 objects; the garbage collector will not keep up.
Why not declare the objects outside of the while loop?
TextReader tr = null;
CsvDataReader csvData = null;
This way, the GC will stand half a chance. You could prove the difference by benchmarking the while loop; you will no doubt notice a huge performance degradation after you have created just a couple of thousand objects.
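For example, something like this (GetData and CsvDataReader come from your existing code; the loop condition here is just a hypothetical placeholder for whatever drives your chunking):

// Sketch only: the readers are declared and created once, then reused for every chunk.
TextReader tr = null;
CsvDataReader csvData = null;
try
{
    tr = GetData();                          // from the original code
    csvData = new CsvDataReader(tr, '\t');   // created once, not on every iteration

    bool moreChunks = true;                  // hypothetical placeholder condition
    while (moreChunks)
    {
        // read the next chunk through csvData and hand it to SqlBulkCopy,
        // reusing the same two objects on every pass
        moreChunks = false;                  // placeholder so the sketch terminates
    }
}
finally
{
    (csvData as IDisposable)?.Dispose();     // your using block shows it is IDisposable
    tr?.Dispose();
}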
I would just use a BufferedFileReader with the readLine method, doing it exactly in the fashion above.
Basically, it's about understanding the responsibilities here.
BufferedFileReader is the class reading data from the file (buffer-wise). There should be a LineReader too. CSVReader is a utility class for reading the data, assuming that it's in the correct format.
SqlBulkCopy you are using anyway.
Second Option
You can go to the import facility of the database directly. If the format of the file is correct, and the whole point of the program is only this, that would be faster too.
Probably not the answer you're looking for but this is what BULK INSERT was designed for.
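If you want to kick it off from your app, a rough sketch could look like this (dbo.MyTable, the file path and connectionString are placeholders, and the path must be visible to the SQL Server machine, not just to the client):

// needs: using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"BULK INSERT dbo.MyTable
      FROM 'C:\data\bigfile.txt'
      WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', BATCHSIZE = 10000)", conn))
{
    cmd.CommandTimeout = 0;    // large loads can easily exceed the default 30 seconds
    conn.Open();
    cmd.ExecuteNonQuery();
}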
Actually your code reads all the data from the file and keeps it in a TextReader (in memory). Then you read the data from the TextReader to save it to the server. If the data is that big, the amount of data held in the TextReader causes out of memory. Please try this way.
1) Read the data (each line) from the file.
2) Then insert each line into the server.
The out-of-memory problem will be solved because only one record is in memory at a time while processing.
Pseudo code
begin tran
While (data = FileReader.ReadLine())
{
insert into Table[col0,col1,etc] values (data[0], data[1], etc)
}
end tran
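A rough C# version of that pseudo code, assuming a tab-delimited file with two columns; the table name, column names and sizes, file path and connection string are placeholders:

// needs: using System.Data; using System.Data.SqlClient; using System.IO;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tran = conn.BeginTransaction())
    using (var reader = new StreamReader(@"C:\data\bigfile.txt"))
    using (var cmd = new SqlCommand(
        "INSERT INTO dbo.MyTable (Col0, Col1) VALUES (@c0, @c1)", conn, tran))
    {
        cmd.Parameters.Add("@c0", SqlDbType.NVarChar, 255);
        cmd.Parameters.Add("@c1", SqlDbType.NVarChar, 255);

        string line;
        while ((line = reader.ReadLine()) != null)   // only one line is in memory at a time
        {
            var data = line.Split('\t');
            cmd.Parameters["@c0"].Value = data[0];
            cmd.Parameters["@c1"].Value = data[1];
            cmd.ExecuteNonQuery();
        }
        tran.Commit();
    }
}

Row-by-row inserts like this will still be much slower than SqlBulkCopy or BULK INSERT, but memory usage stays flat.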
Instead of reading CSV rows one by one and inserting them into the db one by one, I suggest reading a chunk and inserting it into the database. Repeat this process until the entire file has been read.
You can buffer in memory, say, 1000 CSV rows at a time, then insert them into the database.
final int MAX_BUFFERED = 1000;
List<List<String>> bufferedRows = new ArrayList<>();
while (scanner.hasNextLine()) {
    List<String> rowEntries = getData(scanner.nextLine());
    bufferedRows.add(rowEntries);
    if (bufferedRows.size() == MAX_BUFFERED) {
        // INSERT INTO DATABASE
        // append all contents to a string buffer and create your SQL INSERT statement
        bufferedRows.clear(); // remove data so it can be GCed when GC kicks in
    }
}
// insert any rows still buffered after the loop ends
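In the C#/SqlBulkCopy terms of the question, the same chunking idea might look roughly like this (table name, column names, batch size, file path and connection string are all placeholders):

// needs: using System.Data; using System.Data.SqlClient; using System.IO;
const int BatchSize = 1000;                        // placeholder chunk size

var table = new DataTable();
table.Columns.Add("Col0");
table.Columns.Add("Col1");

using (var reader = new StreamReader(@"C:\data\bigfile.txt"))
using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = "dbo.MyTable" })
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        var fields = line.Split('\t');
        table.Rows.Add(fields[0], fields[1]);

        if (table.Rows.Count == BatchSize)         // flush a full chunk
        {
            bulk.WriteToServer(table);
            table.Clear();                         // release the buffered rows
        }
    }
    if (table.Rows.Count > 0)                      // flush the final partial chunk
        bulk.WriteToServer(table);
}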
pseudo code:
while (!EOF) {
    while (chosenRecords.size() < WRITE_BUFFER_LIST_SIZE) {
        MyRecord record = chooseOrSkipRecord(file.readln());
        if (record != null) {
            chosenRecords.add(record);
        }
    }
    insertRecords(chosenRecords); // <== writes data and clears the list
}
WRITE_BUFFER_LIST_SIZE is just a constant that you set... bigger means bigger batches and smaller means smaller batches. A size of 1 is RBAR (row by agonizing row) :).
If your operation is big enough that failing partway through is a realistic possibility, or if failing partway through could cost someone a non-trivial amount of money, you probably also want to write to a second table the total number of records processed so far from the file (including the ones you skipped), as part of the same transaction, so that you can pick up where you left off in the event of partial completion.
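A rough sketch of that checkpoint idea in C#. MyRecord comes from the pseudo code above; the LoadProgress table, its columns and the other names here are hypothetical. The point is simply that the batch insert and the progress update share one transaction:

// needs: using System.Collections.Generic; using System.Data.SqlClient;
void InsertRecords(SqlConnection conn, List<MyRecord> chosenRecords, long linesReadSoFar)
{
    using (var tran = conn.BeginTransaction())
    {
        foreach (var record in chosenRecords)
        {
            // ... insert the record with commands enlisted in 'tran' ...
        }

        using (var progress = new SqlCommand(
            "UPDATE dbo.LoadProgress SET LinesProcessed = @n WHERE FileName = @f", conn, tran))
        {
            progress.Parameters.AddWithValue("@n", linesReadSoFar);
            progress.Parameters.AddWithValue("@f", "bigfile.txt");   // placeholder file name
            progress.ExecuteNonQuery();
        }

        tran.Commit();    // the batch and its checkpoint succeed or fail together
    }
    chosenRecords.Clear();
}

On restart, read LinesProcessed for that file and skip that many lines before resuming the loop.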