Sorting big file (10G)

Posted by 萝らか妹 on 2020-01-12 08:52:03

Question


I'm trying to sort a big table stored in a file. Each row of the file has the format (ID, intValue).

The data is sorted by ID, but what I need is to sort the data using the intValue, in descending order.

For example

ID  | IntValue
1   | 3
2   | 24
3   | 44
4   | 2

to this table

ID  | IntValue
3   | 44
2   | 24
1   | 3
4   | 2

How can I use the Linux sort command to do the operation? Or do you recommend another way?


Answer 1:


How can I use the Linux sort command to do the operation? Or do you recommend another way?

As others have already pointed out, see man sort for the -k and -t command-line options: -t sets the field separator and -k selects the field to sort by.
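As a minimal sketch of those two options on the table from the question, assuming the fields are separated by `|` exactly as shown (the file names `sample.txt` and `sample-sorted.txt` are made up for illustration, and the header line is left out):

```shell
# Sample data in the question's layout (header omitted for simplicity).
printf '%s\n' '1   | 3' '2   | 24' '3   | 44' '4   | 2' > sample.txt

# -t'|'   : use '|' as the field separator
# -k2,2nr : sort on field 2 only, numerically (n), in reverse order (r)
sort -t'|' -k2,2nr sample.txt > sample-sorted.txt
cat sample-sorted.txt
```

The numeric comparison skips the blank that follows the `|`, so the padding in the second field should not affect the order.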

sort also has a facility for sorting huge files that may not fit into RAM: the -m command-line option, which merges already-sorted files into one. (See merge sort for the concept.) The overall process is fairly straightforward:

  1. Split the big file into small chunks. Use for example the split tool with the -l option. E.g.:

    split -l 1000000 huge-file small-chunk

  2. Sort the smaller files. E.g.

    for X in small-chunk*; do sort -t'|' -k2 -nr < "$X" > "sorted-$X"; done

  3. Merge the sorted smaller files. E.g.

    sort -t'|' -k2 -nr -m sorted-small-chunk* > sorted-huge-file

  4. Clean up. E.g.

    rm small-chunk* sorted-small-chunk*

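The only thing you have to take special care of is the column header: sort treats it as just another line, so strip it before splitting and put it back on top of the final result.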
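One way to put the steps together while keeping the header on top might look like the sketch below. The function name `sort_big_table`, the `chunk-` prefix and the optional third argument (lines per chunk) are illustrative choices, not part of the original answer. It assumes the first line is the header and that at least one data row follows.

```shell
# Sketch: sort a '|'-separated table by its 2nd column (descending),
# keeping the first line (assumed to be the header) on top.
# Usage: sort_big_table INPUT OUTPUT [LINES_PER_CHUNK]
sort_big_table() {
    src=$1
    dst=$2
    lines=${3:-1000000}
    work=$(mktemp -d) || return 1

    # Write the header first, then split everything after it into chunks.
    head -n 1 "$src" > "$dst"
    tail -n +2 "$src" | split -l "$lines" - "$work/chunk-"

    # Sort each chunk in place on field 2, numeric, reversed.
    for f in "$work"/chunk-*; do
        sort -t'|' -k2,2nr -o "$f" "$f"
    done

    # Merge the already-sorted chunks and append them below the header.
    sort -t'|' -k2,2nr -m "$work"/chunk-* >> "$dst"
    rm -r "$work"
}
```

For example, `sort_big_table huge-file sorted-huge-file` would use the default chunk size. The merge step must be given the same key options as the per-chunk sort, otherwise the chunks are not sorted in the order the merge expects.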




Answer 2:


How about:

sort -t' ' -k2 -nr < test.txt

where test.txt contains

$ cat test.txt 
1  3
2  24
3  44
4  2

This sorts in descending order (option -r):

$ sort -t' ' -k2 -nr < test.txt 
3  44
2  24
1  3
4  2

while this sorts in ascending order (without option -r):

$ sort -t' ' -k2 -n < test.txt 
4  2
1  3
2  24
3  44

In case you have duplicate lines, such as

$ cat test.txt 
1  3
2  24
3  44
4  2
4  2

pipe the sorted output through the uniq command to drop the repeated lines:

$ sort -t' ' -k2 -n < test.txt | uniq 
4  2
1  3
2  24
3  44


Source: https://stackoverflow.com/questions/34090744/sorting-big-file-10g
