sort a file based on a column in another file

試著忘記壹切 posted on 2021-02-07 08:45:25

Question


I have two files both in the format of:

loc1 num1 num2
loc2 num3 num4

The first column is the location. I want to use the order of the locations in the first file to sort the second file, so that I can put the two files together with the numbers lined up for each location.

I can write a Perl script to do this, but I felt there might be a quick and easy shell/awk command to achieve it. Do you have any ideas?

Thanks.

Edits:

Here is the input. I now actually want to use column 2 of file 1 to sort file 2.

File1:

GID     location        NAME    GWEIGHT C1SI    M1CO    M1SI    C1LY    M1LY    C1CO    C1LI    M1LI
AID                             ARRY2X  ARRY1X  ARRY3X  ARRY4X  ARRY5X  ARRY0X  ARRY6X  ARRY7X
EWEIGHT                         1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000
GENE735X        chr17:66199278-66199496 chr17:66199278-66199496 1.000000        0.211785        -0.853890       1.071875        0.544136        0.703871     0.371880 0.218960        -2.268618
GENE1562X       chr10:80097054-80097298 chr10:80097054-80097298 1.000000        0.533673        -0.397202       0.783363        0.109824        -0.436342    0.158667 0.475748        -1.227730
GENE6579X       chr19:23694188-23694395 chr19:23694188-23694395 1.000000        0.127748        -0.203827       0.846738        0.045599        -0.211767    0.415442 0.282123        -1.302055

File 2:

GID     location        NAME    GWEIGHT C1SI    M1CO    M1SI    C1LY    M1LY    C1CO    C1LI    M1LI
AID                             ARRY2X  ARRY1X  ARRY3X  ARRY4X  ARRY5X  ARRY0X  ARRY6X  ARRY7X
EWEIGHT                         1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000
GENE6579X       chr19:23694188-23694395 chr19:23694188-23694395 1.000000        0.127748        -0.203827       0.846738        0.045599        -0.211767    0.415442 0.282123        -1.302055
GENE735X        chr17:66199278-66199496 chr17:66199278-66199496 1.000000        0.211785        -0.853890       1.071875        0.544136        0.703871     0.371880 0.218960        -2.268618
GENE1562X       chr10:80097054-80097298 chr10:80097054-80097298 1.000000        0.533673        -0.397202       0.783363        0.109824        -0.436342    0.158667 0.475748        -1.227730

Answer 1:


An awk solution: store the 2nd file in memory, then loop over the first file, emitting matching lines from the 2nd file:

awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first

Implementing @Barmar's comment with join:

join -1 2 -o "1.1 1.2 2.2 2.3" <(cat -n first | sort -k2) <(sort second) | 
sort -n | 
cut -d ' ' -f 2-

Note to other answerers: I tested with these files:

$ cat first
foo x y
bar x y
baz x y
$ cat second
bar x1 y1
baz x2 y2
foo x3 y3

Explanation of

awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first

This part reads the 1st file in the command line parameters (here, "second"):

FNR==NR {x2[$1] = $0; next}

The condition FNR == NR is true only while awk reads the first named file (assuming that file is not empty). FNR is awk's "File Record Number" variable, which restarts at 1 for each input file, while NR is the running record number across all input sources. The current line is stored in an associative array named x2 (not a great variable name), indexed by the first field of the record.

The next condition, $1 in x2, is only evaluated after the file "second" has been completely read, because the next statement skips it during the first pass. It looks at the first field of each line in the file named "first", and the action prints the corresponding line from the file "second", which has been stored in the array.

Note that the order of the files in the awk command is important. Since you control the output based on the file named "first", it must be the last file processed by awk.
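
For the edited question, where the key is column 2 and the files start with header rows, the same two-pass idea might be adapted as follows. This is only a sketch under assumptions the post does not confirm: that the real files are tab-separated, that exactly three header rows should be copied through unchanged, and that each location appears once per file. The function name reorder_by_col2, the hdr variable, and the cut-down sample data are made up for illustration.

```shell
# Hypothetical, cut-down versions of the two files (tab-separated, 3 header rows).
workdir=$(mktemp -d)
printf 'GID\tlocation\tC1SI\nAID\t\tARRY2X\nEWEIGHT\t\t1.000000\n' > "$workdir/first"
printf 'G1\tchr17:1-100\t0.1\nG2\tchr10:1-100\t0.2\nG3\tchr19:1-100\t0.3\n' >> "$workdir/first"
printf 'GID\tlocation\tC1SI\nAID\t\tARRY2X\nEWEIGHT\t\t1.000000\n' > "$workdir/second"
printf 'G3\tchr19:1-100\t9.3\nG1\tchr17:1-100\t9.1\nG2\tchr10:1-100\t9.2\n' >> "$workdir/second"

reorder_by_col2() {
    # $1 = reference file (defines the order), $2 = file to reorder
    awk -F'\t' -v hdr=3 '
        # Pass 1 (file to reorder): copy header rows through, index the rest by column 2.
        FNR == NR { if (FNR <= hdr) print; else row[$2] = $0; next }
        # Pass 2 (reference file): emit the stored rows in the reference order.
        FNR > hdr && ($2 in row) { print row[$2] }
    ' "$2" "$1"
}

sorted=$(reorder_by_col2 "$workdir/first" "$workdir/second")
printf '%s\n' "$sorted"
rm -rf "$workdir"
```

The header rows are handled separately because rows such as AID and EWEIGHT have an empty column 2 and would otherwise overwrite each other in the array.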




Answer 2:


Use the paste command to merge lines of two files. For example:

file1:

f1_11   f1_12         
f1_21   f1_22         
f1_31   f1_32         
f1_41   f1_42     

file2:

f2_11   f2_12         
f2_21   f2_22         
f2_31   f2_32         
f2_41   f2_42

$ paste file1 file2

f1_11   f1_12           f2_11   f2_12         
f1_21   f1_22           f2_21   f2_22         
f1_31   f1_32           f2_31   f2_32         
f1_41   f1_42           f2_41   f2_42   

Now you can do a sort on column 1.

paste file1 file2 | sort -k1,1

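Finally, cut out the columns that belong to the second file if you do not want to see the data of file1 in your final output. The field range below assumes three tab-separated columns per file, as in the question; for the two-column example above it would be -f3-4: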

paste file1 file2 | sort -k1,1 | cut -f4-6
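
One thing to keep in mind is that paste pairs lines purely by position, so the numbers only end up next to the right location if both files are already in the same key order. A possible variation, sketched here under the assumption of tab-separated files with a unique key in column 1, sorts both files by key before pasting. The sample rows and the `workdir` directory are made up for illustration, and the result comes out in sorted key order rather than in the original order of file1.

```shell
LC_ALL=C; export LC_ALL

# Hypothetical three-column, tab-separated inputs with the same keys in different order.
workdir=$(mktemp -d)
printf 'foo\t1\t2\nbar\t3\t4\nbaz\t5\t6\n' > "$workdir/file1"
printf 'bar\t30\t40\nbaz\t50\t60\nfoo\t10\t20\n' > "$workdir/file2"

# Bring both files into the same (sorted) key order so the rows line up.
sort -k1,1 "$workdir/file1" > "$workdir/file1.sorted"
sort -k1,1 "$workdir/file2" > "$workdir/file2.sorted"

# Paste side by side, then drop the duplicate key column (field 4) from file2.
merged=$(paste "$workdir/file1.sorted" "$workdir/file2.sorted" | cut -f1-3,5-6)

printf '%s\n' "$merged"
rm -rf "$workdir"
```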


Source: https://stackoverflow.com/questions/16284572/sort-a-file-based-on-a-column-in-another-file
