Question
I have the following two files:
sequences.txt
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
302391336 Acetohalobium_arabaticum_DSM_5501_uid51423 302391336 441 1 441 COG0003 0
311103820 Achromobacter_xylosoxidans_A8_uid59899 311103820 425 1 425 COG0004 0
332795879 Acidianus_hospitalis_W1_uid66875 332795879 369 1 369 COG0005 0
332796307 Acidianus_hospitalis_W1_uid66875 332796307 416 1 416 COG0005 0
allids.txt
COG0001
COG0002
COG0003
COG0004
COG0005
Now I want to read each line in allids.txt, search all lines in sequences.txt (specifically in column 7), and write, for each line in allids.txt, a file with the filename $line.
My approach is to use a simple grep:
while read line; do
grep "$line" sequences.txt
done <allids.txt
But where do I incorporate the command for the output? If there is a faster command, feel free to suggest it!
My expected output:
COG0001.txt
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
COG0002.txt
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
[and so on]
Answer 1:
I suspect all you really need is:
awk '{print > ($7".txt")}' sequences.txt
That suspicion is based on your IDs file being named allids.txt (note the "all") and there being no IDs in sequences.txt that don't exist in allids.txt.
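One caveat with writing one file per ID: some awk implementations limit how many output files can be open at once, so a very large number of distinct IDs may cause an error. A minimal sketch of a workaround follows. It assumes a sort that supports the stable flag -s (GNU and BSD sort do) and uses a shortened copy of the sample data in a scratch directory:

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened copy of the sample data from the question.
cat > sequences.txt <<'EOF'
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
EOF

# Sort on column 7 so lines with the same ID are contiguous (-s keeps their
# original relative order), then close the previous output file whenever
# the ID changes, so only one file is open at a time.
sort -s -k7,7 sequences.txt |
awk '
    NR == 1 || $7 != prev {
        if (out != "") close(out)
        out = $7 ".txt"
        prev = $7
    }
    { print > out }
'
```

For a handful of IDs, as in the question, the one-liner above is enough and the sort step is unnecessary.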
Answer 2:
It is quite simple to do this using awk:
awk 'NR==FNR{ids[$1]; next} $7 in ids{print > ($7 ".txt")}' allids.txt sequences.txt
Reference: Effective AWK Programming
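For readers unfamiliar with the two-file idiom, here is the same command spread over several lines with comments, run against a shortened copy of the sample data. The last input line uses a made-up ID (COG9999), added only to show that lines whose ID is not listed in allids.txt are skipped:

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' COG0001 COG0002 > allids.txt

# The final line is hypothetical: COG9999 does not appear in allids.txt.
cat > sequences.txt <<'EOF'
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
111 Hypothetical_organism 111 100 1 100 COG9999 0
EOF

awk '
    # NR (overall line number) equals FNR (line number within the current
    # file) only while the first file, allids.txt, is being read.
    # Store each ID as an array key, then move on to the next line.
    NR == FNR { ids[$1]; next }

    # For sequences.txt: if column 7 is a stored ID, write the whole line
    # to a file named after that ID.
    $7 in ids { print > ($7 ".txt") }
' allids.txt sequences.txt
```

After this runs, COG0001.txt and COG0002.txt exist in the scratch directory, and no file is created for the unlisted ID.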
Answer 3:
Extending your approach, this seemed to work:
while read line; do
# touching is not necessary as pointed out by @123
# touch "$line.txt"
grep "$line" sequences.txt > "$line.txt"
done <allids.txt
It produces text files with the required output. But I cannot comment on the efficiency of this approach.
EDIT:
As has been pointed out in the comments, this method is slow and would break for any file that violates the unstated assumptions used in the answer. I'm leaving it here for people to see how a quick and hacky solution could backfire.
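One such assumption is that an ID never matches anywhere except as the whole of column 7: grep "$line" treats the ID as a pattern and matches it anywhere on the line, including as part of a longer ID. A sketch of the same loop that compares column 7 as a whole field is shown below. It still rereads sequences.txt once per ID, so it stays slow. The COG00010 line is made up for illustration:

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' COG0001 COG0002 > allids.txt

# The final line is hypothetical: plain grep "COG0001" would match it too.
cat > sequences.txt <<'EOF'
158333741 Acaryochloris_marina_MBIC11017_uid58167 158333741 432 1 432 COG0001 0
158339504 Acaryochloris_marina_MBIC11017_uid58167 158339504 491 1 491 COG0002 0
379012832 Acetobacterium_woodii_DSM_1030_uid88073 379012832 430 1 430 COG0001 0
222 Hypothetical_organism 222 100 1 100 COG00010 0
EOF

# IFS= and -r keep read from trimming whitespace or interpreting backslashes.
while IFS= read -r id; do
    # Skip blank lines so no file named ".txt" is created.
    [ -n "$id" ] || continue
    # Keep only lines whose seventh field equals the ID.
    awk -v id="$id" '$7 == id' sequences.txt > "$id.txt"
done < allids.txt
```

With this data, COG0001.txt holds two lines, whereas the grep version would also have picked up the COG00010 line.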
Source: https://stackoverflow.com/questions/44682552/read-lines-from-a-file-grep-in-a-second-file-and-output-a-file-for-each-line