Question
Guys, I'm trying to find a way to avoid the awk error "too many open files". Here's my situation:
INPUT: an ASCII file with many lines, following this scheme:
NODE_212_lenght.._1
NODE_212_lenght.._2
NODE_213_lenght.._1
NODE_213_lenght.._2
To split this file so that all records with the same NODE number end up in the same output file, I used this awk one-liner:
awk -F "_" '{print >("orfs_for_node_" $2 "")}' <file
On a file with many lines, this command keeps reporting "too many open files". I also tried splitting the input into 2k-line chunks, with the same result. I can't really go below 2k lines, because the input is a huge file.
I know awk can close a file once it is done writing to it, but I don't know how to do that. I tried adding
awk -F "_" '{print >("orfs_for_node_" $2 ""); close(orfs_for_node_*)}' <file
but this produces no output.
Answer 1:
If you switch to GNU awk, it will handle this for you. Otherwise, this is the right syntax if your input file has all the lines for each $2 value grouped together:
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print > out; prev=out}' file
Otherwise, you need to use >> instead of >:
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print >> out; prev=out}' file
Note that in that second case you'd need to empty any pre-existing "out" files (e.g. from a previous run) before running it since it'll always append to the output files.
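A minimal sketch of that second case end to end, assuming a scratch directory and a small made-up input named file whose lines for each node are not adjacent. The rm -f step is the "empty any pre-existing files" precaution mentioned above; the awk command itself is unchanged.

```shell
# Work in a scratch directory so the demo does not touch real data.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: lines for the same node are NOT adjacent.
printf '%s\n' \
  'NODE_212_lenght.._1' \
  'NODE_213_lenght.._1' \
  'NODE_212_lenght.._2' \
  'NODE_213_lenght.._2' > file

# Clear output from any previous run, because >> always appends.
rm -f orfs_for_node_*

# At most one output file is open at a time; >> reopens a file without truncating it.
awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print >> out; prev=out}' file
```

Each orfs_for_node_* file should then hold the lines for one node, in their original relative order.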
Answer 2:
From my understanding, you are looking for the right moment to close the file. For your example input, you can do:
awk -F "_" 'BEGIN{prefix="orfs_for_node_"}
NR>1&&$2!=last{close(prefix""last)}{last=$2;print >(prefix$2)}' inputFile
It checks whether $2 has changed and, if so, closes the file that was opened for the previous $2. This assumes that the lines in your file are sorted by $2. If they are not sorted by $2, use >> instead of >.
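If the input is not sorted, another option is to sort it on the second field first and pipe the result into the same awk program, so that > is still safe. This is only a sketch: the scratch directory and the contents of inputFile are assumptions for illustration, and a plain sort may reorder lines within a node, which may or may not matter for your data.

```shell
# Work in a scratch directory so the demo does not touch real data.
cd "$(mktemp -d)" || exit 1

# Hypothetical unsorted sample input.
printf '%s\n' \
  'NODE_213_lenght.._2' \
  'NODE_212_lenght.._2' \
  'NODE_213_lenght.._1' \
  'NODE_212_lenght.._1' > inputFile

# Sort numerically on field 2 (fields split on "_"), then split as before.
# After sorting, each output file is opened once, so > does not clobber earlier lines.
sort -t '_' -k 2,2n inputFile |
awk -F "_" 'BEGIN{prefix="orfs_for_node_"}
NR>1&&$2!=last{close(prefix""last)}{last=$2;print >(prefix$2)}'
```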
Source: https://stackoverflow.com/questions/51209508/how-to-close-file-in-awk-while-generating-a-list-of