Question
I have two files both in the format of:
loc1 num1 num2
loc2 num3 num4
The first column is the location. I want to use the order of the locations in the first file to sort the second file, so that I can combine the two files with the numbers lined up against the correct location.
I could write a Perl script to do this, but I suspect there is a quick shell/awk command that achieves it. Do you have any ideas?
Thanks.
Edits:
Here is the actual input. I now want to use column 2 of file 1 to sort file 2.
File 1:
GID location NAME GWEIGHT C1SI M1CO M1SI C1LY M1LY C1CO C1LI M1LI
AID ARRY2X ARRY1X ARRY3X ARRY4X ARRY5X ARRY0X ARRY6X ARRY7X
EWEIGHT 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
GENE735X chr17:66199278-66199496 chr17:66199278-66199496 1.000000 0.211785 -0.853890 1.071875 0.544136 0.703871 0.371880 0.218960 -2.268618
GENE1562X chr10:80097054-80097298 chr10:80097054-80097298 1.000000 0.533673 -0.397202 0.783363 0.109824 -0.436342 0.158667 0.475748 -1.227730
GENE6579X chr19:23694188-23694395 chr19:23694188-23694395 1.000000 0.127748 -0.203827 0.846738 0.045599 -0.211767 0.415442 0.282123 -1.302055
File 2:
GID location NAME GWEIGHT C1SI M1CO M1SI C1LY M1LY C1CO C1LI M1LI
AID ARRY2X ARRY1X ARRY3X ARRY4X ARRY5X ARRY0X ARRY6X ARRY7X
EWEIGHT 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
GENE6579X chr19:23694188-23694395 chr19:23694188-23694395 1.000000 0.127748 -0.203827 0.846738 0.045599 -0.211767 0.415442 0.282123 -1.302055
GENE735X chr17:66199278-66199496 chr17:66199278-66199496 1.000000 0.211785 -0.853890 1.071875 0.544136 0.703871 0.371880 0.218960 -2.268618
GENE1562X chr10:80097054-80097298 chr10:80097054-80097298 1.000000 0.533673 -0.397202 0.783363 0.109824 -0.436342 0.158667 0.475748 -1.227730
Answer 1:
An awk solution: store the 2nd file in memory, then loop over the first file, emitting matching lines from the 2nd file:
awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first
Implementing @Barmar's comment with join (the <(...) process substitution requires bash or a similar shell):
join -1 2 -o "1.1 1.2 2.2 2.3" <(cat -n first | sort -k2) <(sort second) |
sort -n |
cut -d ' ' -f 2-
Note to other answerers: I tested with these files:
$ cat first
foo x y
bar x y
baz x y
$ cat second
bar x1 y1
baz x2 y2
foo x3 y3
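For anyone who wants to reproduce the test, here is a minimal sketch that recreates the two files in a scratch directory and runs both approaches on them. The join pipeline is rewritten with temporary files instead of <(...) so that it should also run in a plain POSIX sh, and awk stands in for cat -n to number the lines. The scratch directory and the output names (by_awk, by_join, first.numbered, second.sorted) are made up for the example.

```shell
# Scratch directory and sample files (names are arbitrary).
tmp=$(mktemp -d)
printf '%s\n' 'foo x y' 'bar x y' 'baz x y' > "$tmp/first"
printf '%s\n' 'bar x1 y1' 'baz x2 y2' 'foo x3 y3' > "$tmp/second"

# Approach 1: awk, keyed on column 1.
awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' \
    "$tmp/second" "$tmp/first" > "$tmp/by_awk"

# Approach 2: decorate, join, sort, undecorate.
# Prefix each line of "first" with its line number, then sort by the key.
awk '{print NR, $0}' "$tmp/first" | LC_ALL=C sort -k2,2 > "$tmp/first.numbered"
LC_ALL=C sort -k1,1 "$tmp/second" > "$tmp/second.sorted"

# Join on the key, restore the original order by line number, drop the number.
LC_ALL=C join -1 2 -2 1 -o '1.1 1.2 2.2 2.3' \
    "$tmp/first.numbered" "$tmp/second.sorted" |
  sort -n |
  cut -d ' ' -f 2- > "$tmp/by_join"

cat "$tmp/by_awk"
# foo x3 y3
# bar x1 y1
# baz x2 y2
cmp "$tmp/by_awk" "$tmp/by_join" && echo "both approaches agree"
```

LC_ALL=C is set on sort and join so that both use the same collation order, which join relies on.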
Explanation of
awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first
This part reads the first file named on the command line (here, "second"):
FNR==NR {x2[$1] = $0; next}
The condition FNR == NR is true only while awk is reading the first named file. FNR is awk's per-file record number (it restarts at 1 for each input file), while NR is the record number counted across all input sources. Each line is stored in an associative array named x2 (not a great variable name), indexed by the first field of the record.
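To see the two counters side by side, here is a small illustration that prints NR and FNR for every line of two inputs. The file names a.txt and b.txt and their contents are made up for the example.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/a.txt"
printf 'c\nd\n' > "$tmp/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(awk '{ print NR, FNR, $0 }' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$counters"
# 1 1 a
# 2 2 b
# 3 1 c
# 4 2 d
```

FNR == NR holds only on the first two lines, that is, only while the first file is being read.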
The next condition, $1 in x2, is only reached after the file "second" has been completely read, because the first rule ends with next. It checks whether the first field of each line of the file named "first" is a key in the array; if so, the action prints the corresponding line from file "second", which has been stored in the array.
Note that the order of the files in the awk command is important. Since you control the output based on the file named "first", it must be the last file processed by awk.
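For the input shown in the question's edit, the same idea can be adapted. This is only a sketch under a few assumptions: the key is the location in column 2, both files start with the same number of header lines (three here), and the header lines of file2 should stay at the top. The variable hdr, the array names, and the output name file2.sorted are chosen for illustration, and the sample rows are abbreviated to their first five columns.

```shell
tmp=$(mktemp -d)

# Abbreviated copies of the question's files (first five columns only).
cat > "$tmp/file1" <<'EOF'
GID location NAME GWEIGHT C1SI
AID ARRY2X
EWEIGHT 1.000000
GENE735X chr17:66199278-66199496 chr17:66199278-66199496 1.000000 0.211785
GENE1562X chr10:80097054-80097298 chr10:80097054-80097298 1.000000 0.533673
GENE6579X chr19:23694188-23694395 chr19:23694188-23694395 1.000000 0.127748
EOF
cat > "$tmp/file2" <<'EOF'
GID location NAME GWEIGHT C1SI
AID ARRY2X
EWEIGHT 1.000000
GENE6579X chr19:23694188-23694395 chr19:23694188-23694395 1.000000 0.127748
GENE735X chr17:66199278-66199496 chr17:66199278-66199496 1.000000 0.211785
GENE1562X chr10:80097054-80097298 chr10:80097054-80097298 1.000000 0.533673
EOF

# hdr is the assumed number of header lines in both files.
awk -v hdr=3 '
  FNR == NR {                          # first pass: file2
      if (FNR <= hdr) head[FNR] = $0   # remember header lines by position
      else row[$2] = $0                # index data rows by location (column 2)
      next
  }
  FNR <= hdr { print head[FNR]; next } # second pass: emit the headers of file2
  $2 in row  { print row[$2] }         # then rows of file2 in the order of file1
' "$tmp/file2" "$tmp/file1" > "$tmp/file2.sorted"

cat "$tmp/file2.sorted"
```

Rows of file2 whose location does not appear in file1 are dropped, and if a location occurs more than once in file2 only its last row is kept. Both are also true of the original one-liner.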
Answer 2:
Use the paste command to merge the lines of two files.
For example:
file1:
f1_11 f1_12
f1_21 f1_22
f1_31 f1_32
f1_41 f1_42
file2:
f2_11 f2_12
f2_21 f2_22
f2_31 f2_32
f2_41 f2_42
$ paste file1 file2
f1_11 f1_12 f2_11 f2_12
f1_21 f1_22 f2_21 f2_22
f1_31 f1_32 f2_31 f2_32
f1_41 f1_42 f2_41 f2_42
Now you can do a sort on column 1.
paste file1 file2 | sort -k1,1
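Last but not least, if you do not want to see the data of file1 in your final output, cut out the columns that belong to the second file. The field range below assumes tab-separated files with three columns each, as in the question, so that file2 occupies fields 4-6 (both paste and cut use a tab as their default delimiter):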
paste file1 file2 | sort -k1,1 | cut -f4-6
Source: https://stackoverflow.com/questions/16284572/sort-a-file-based-on-a-column-in-another-file