How can I sort a large CSV file without loading it into memory?

北海茫月 2021-02-20 05:23

I have a 20 GB+ CSV file like this:

**CallId,MessageNo,Information,Number** 
1000,1,a,2
99,2,bs,3
1000,3,g,4
66,2,a,3
20,16,3,b
1000,7,c,4
99,1,lz,4 
...

3 Answers
  •  北恋 (original poster)
    2021-02-20 05:49

    You should use OS sort commands. Typically it's just

    sort myfile
    

    followed by a few switches. These commands typically handle large files well because they spill to temporary files on disk instead of holding everything in memory, and there are often options to place that temporary storage on another physical hard drive. See this previous question and the Windows sort command "man" page. Since Windows sort is not enough for your particular sorting problem, you may want to use GNU coreutils, which brings the power of Linux sort to Windows.
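
As a rough illustration of those temporary-storage options, here is a sketch that assumes GNU sort in a POSIX-style shell (on Windows, call the GNU sort.exe explicitly, since a plain `sort` resolves to the Windows command). The file names and the `sort_tmp` directory are placeholders. `-T` sets where temporary files are written and `-S` caps the in-memory buffer; `LC_ALL=C` is added so the comma is never read as a decimal separator.

```shell
# Sketch only: sample.csv, sorted.csv and sort_tmp are placeholder names.
# Build a tiny sample; a real input would be the 20 GB+ file.
printf '%s\n' '1000,1,a,2' '99,2,bs,3' '1000,3,g,4' '66,2,a,3' \
              '20,16,3,b' '1000,7,c,4' '99,1,lz,4' > sample.csv

# Directory for sort's temporary spill files (ideally on another drive).
mkdir -p sort_tmp

# -T: temporary directory, -S: in-memory buffer size, -o: output file.
LC_ALL=C sort -t, -g -T ./sort_tmp -S 64M -o sorted.csv sample.csv

cat sorted.csv
```

With a small `-S` value, sort writes more intermediate runs to the temporary directory and merges them, so memory use stays bounded regardless of the input size.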

    Solution

    Here's what you need to do.

    1. Download GNU Coreutils Binaries ZIP and extract sort.exe from the bin folder to some folder on your machine, for example the folder where your to-be-sorted file is.
    2. Download GNU Coreutils Dependencies ZIP and extract both .dll files to the same folder as sort.exe.

    Now assuming that your file looks like this:

    1000,1,a,2
    99,2,bs,3
    1000,3,g,4
    66,2,a,3
    20,16,3,b
    1000,7,c,4
    99,1,lz,4 
    

    you can write in the command prompt:

    sort.exe yourfile.csv -t, -g
    

    which would output:

    20,16,3,b
    66,2,a,3
    99,1,lz,4
    99,2,bs,3
    1000,1,a,2
    1000,3,g,4
    1000,7,c,4
    

    See more command options here. If this is what you want, don't forget to provide an output file with the -o switch, like so:

    sort.exe yourfile.csv -t, -g -o sorted.csv
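
A possible refinement, assuming the goal is to order by CallId and then by MessageNo: with `-g` alone, rows that tie on the first number fall back to a plain byte comparison of the whole line, so a row starting `1000,10` would land before one starting `1000,3`. The sketch below uses explicit numeric keys and keeps the header row on top. It assumes GNU sort in a POSIX-style shell; the file names are placeholders, and the `1000,10,x,1` row is a hypothetical addition to show the difference. Note that `sort -t,` does not understand quoted fields that contain commas.

```shell
# Sketch only: calls.csv and calls_sorted.csv are placeholder names.
# The row 1000,10,x,1 is hypothetical, added to exercise the second key.
printf '%s\n' 'CallId,MessageNo,Information,Number' \
              '1000,1,a,2' '99,2,bs,3' '1000,3,g,4' '66,2,a,3' \
              '20,16,3,b' '1000,7,c,4' '99,1,lz,4' '1000,10,x,1' > calls.csv

{
  # Copy the header line through unchanged.
  head -n 1 calls.csv
  # Sort the data rows: field 1 numerically, then field 2 numerically.
  tail -n +2 calls.csv | LC_ALL=C sort -t, -k1,1n -k2,2n
} > calls_sorted.csv

cat calls_sorted.csv
```

Because `head`, `tail`, and `sort` all stream their input, this keeps the same property as the commands above: the file is never loaded into memory as a whole.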
    
