How can I sort a large CSV file without loading it into memory?

北海茫月 2021-02-20 05:23

I have a 20 GB+ CSV file like this:

    CallId,MessageNo,Information,Number
    1000,1,a,2
    99,2,bs,3
    1000,3,g,4
    66,2,a,3
    20,16,3,b
    1000,7,c,4
    99,1,lz,4
    ...

3 answers

遇见更好的自我 (OP)

2021-02-20 05:36
This is a classic algorithmic problem known as external sorting.

External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead must reside in the slower external memory (usually a hard drive). External sorting typically uses a sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.
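
To make the two phases concrete, here is a minimal sketch of a line-based external merge sort. It is not the sample referred to in this answer; the class name `ExternalSortSketch`, the `maxLines` parameter, and the choice to limit chunks by line count rather than by bytes are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class: a sketch, not a tuned implementation.
class ExternalSortSketch {

    /** Sorts the lines of input into output, holding at most maxLines lines in memory per chunk. */
    public static void sort(Path input, Path output, Comparator<String> cmp, int maxLines) {
        try {
            List<Path> runs = splitIntoSortedRuns(input, cmp, maxLines);
            mergeRuns(runs, output, cmp);
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sorting phase: read a chunk, sort it in memory, write it to a temporary file.
    private static List<Path> splitIntoSortedRuns(Path input, Comparator<String> cmp, int maxLines)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLines) {
                    runs.add(writeRun(chunk, cmp));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeRun(chunk, cmp));
            }
        }
        return runs;
    }

    private static Path writeRun(List<String> chunk, Comparator<String> cmp) throws IOException {
        chunk.sort(cmp);
        Path run = Files.createTempFile("run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    // One open sorted run plus the line currently at its head.
    private static final class Cursor {
        final BufferedReader reader;
        String head;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    // Merge phase: repeatedly emit the smallest head line among all runs.
    private static void mergeRuns(List<Path> runs, Path output, Comparator<String> cmp)
            throws IOException {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.head, b.head));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                Cursor c = new Cursor(Files.newBufferedReader(run, StandardCharsets.UTF_8));
                if (c.head != null) heap.add(c); else c.reader.close();
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.write(c.head);
                out.newLine();
                c.head = c.reader.readLine();
                if (c.head != null) heap.add(c); else c.reader.close();
            }
        }
    }
}
```

A call such as `ExternalSortSketch.sort(in, out, Comparator.naturalOrder(), 1_000_000)` would keep roughly one million lines in memory at a time. The sketch opens every run at once during the merge and does not close the readers if the merge fails partway, so a real implementation would probably want to cap the number of runs merged per pass and handle cleanup more carefully.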

From a .NET Framework point of view, I would recommend using memory-mapped files, a feature introduced in .NET 4, to project parts of the file into memory as separate views.
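
The idea of projecting only part of a file into memory can be illustrated in Java, the language of the sample in this answer, with `FileChannel.map`, which is the closest Java analogue rather than the .NET API itself. The sketch below maps one read-only window at a time and counts newline bytes; the class name `MappedWindows` and the `windowBytes` parameter are hypothetical. Mapping does not sort anything by itself; it only provides access to a slice of the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class: shows windowed mapping only, not a sort.
class MappedWindows {

    /** Counts '\n' bytes by mapping the file one read-only window at a time. */
    public static long countNewlines(Path file, int windowBytes) {
        if (windowBytes <= 0) {
            throw new IllegalArgumentException("windowBytes must be positive");
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long count = 0;
            for (long offset = 0; offset < size; offset += windowBytes) {
                long length = Math.min(windowBytes, size - offset);
                // Only this window of the file is mapped as a view.
                MappedByteBuffer view = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
                while (view.hasRemaining()) {
                    if (view.get() == '\n') {
                        count++;
                    }
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In Java a single mapping cannot exceed `Integer.MAX_VALUE` bytes, so a 20 GB file has to be processed in windows like this in any case.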

Here is a Java example of external merge sort; you should be able to adapt it to C# easily:

EDIT: Added a usage example of the Java sample mentioned above to demonstrate its simplicity:
```
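import java.io.File;
import java.util.Comparator;
import java.util.List;

// Compares whole lines lexicographically.
Comparator<String> comparator = new Comparator<String>() {
    @Override
    public int compare(String r1, String r2) {
        return r1.compareTo(r2);
    }
};

// Sorting phase: split the input into sorted temporary files.
List<File> l = sortInBatch(new File(inputfile), comparator);
// Merge phase: combine the sorted temporary files into the output file.
mergeSortedFiles(l, new File(outputfile), comparator);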
```
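
Note that the comparator above compares whole lines as strings, so a row starting with `1000` sorts before one starting with `99`. If the intent is to order rows by their numeric columns, a key-based comparator is one option. The sketch below assumes the rows should be ordered by `CallId` and then by `MessageNo`; the question as shown does not state the sort key, so that ordering, the class name `CsvKeyComparator`, and the helper `field` are all assumptions.

```java
import java.util.Comparator;

// Hypothetical comparator for the sample rows in the question.
class CsvKeyComparator {

    // Assumption: order by CallId (column 0), then MessageNo (column 1), both numeric.
    public static final Comparator<String> BY_CALL_ID_THEN_MESSAGE_NO =
            Comparator.comparingLong((String row) -> field(row, 0))
                      .thenComparingLong(row -> field(row, 1));

    // Naive split on commas: does not handle quoted fields that contain commas.
    private static long field(String row, int index) {
        return Long.parseLong(row.split(",", -1)[index].trim());
    }
}
```

The header row would need to be removed before sorting and written back afterwards, since `Long.parseLong` throws on `CallId`.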