How to close file in awk while generating a list of?

Question

I'm trying to find a way to avoid the awk error "too many open files". Here's my situation:

INPUT: an ASCII file with many lines, following this scheme:

NODE_212_lenght.._1
NODE_212_lenght.._2
NODE_213_lenght.._1
NODE_213_lenght.._2

To split this file so that all records with the same NODE number go to the same output file, I used this awk one-liner:

awk -F "_" '{print >("orfs_for_node_" $2 "")}' <file

On a file with many lines, this command keeps failing with "too many open files". I also tried splitting the input into 2k-line chunks, with the same result. I can't really go below 2k lines per chunk, because the input is a huge file.

I know awk can close a file after writing to it, but I don't know how to do that. I tried adding a close() call:

awk -F "_" '{print >("orfs_for_node_" $2 ""); close(orfs_for_node_*)}' <file

but this produces no output.

Answer 1:

If you switch to GNU awk, it will handle this for you. Otherwise, this is the right syntax if your input file has all the lines for each $2 value grouped together:

awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print > out; prev=out}' file

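If the lines are not grouped, you need to use >> instead of >, so that a file reopened after being closed is appended to rather than truncated: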

awk -F '_' '{out="orfs_for_node_"$2} out!=prev{close(prev)} {print >> out; prev=out}' file

Note that in that second case you'd need to empty or remove any pre-existing output files (e.g. from a previous run) before running it, since it always appends to the output files.
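
As a minimal sketch of that second case, the script below runs the >> variant on input whose NODE groups are interleaved. The sample lines and the file name nodes_sample.txt are made up for illustration, and rm -f is just one way to clear output left over from an earlier run:

```shell
# Sample input whose NODE groups are interleaved (made up for illustration).
printf '%s\n' \
  'NODE_212_lenght.._1' \
  'NODE_213_lenght.._1' \
  'NODE_212_lenght.._2' \
  'NODE_213_lenght.._2' > nodes_sample.txt

# Remove leftovers from earlier runs; ">>" would otherwise append to them.
rm -f orfs_for_node_*

# Close the previous output file whenever the NODE number changes,
# and append so that a file reopened later keeps its earlier lines.
awk -F '_' '
  { out = "orfs_for_node_" $2 }
  out != prev { close(prev) }
  { print >> out; prev = out }
' nodes_sample.txt
```

After this runs, orfs_for_node_212 and orfs_for_node_213 each hold their two lines in input order. At most one output file is open at any time, at the cost of a close and reopen every time the NODE number changes.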

Answer 2:

From my understanding, you are looking for the right moment to close the file. For your example input, you can do:

awk -F "_" 'BEGIN{prefix="orfs_for_node_"} 
NR>1&&$2!=last{close(prefix""last)}{last=$2;print >(prefix$2)}' inputFile

It checks whether $2 has changed and, if so, closes the file for the previous $2 value. This assumes that the lines in your file are sorted by $2.

If the file is not sorted by $2, use >> instead of >.
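
Another option, not taken from either answer, is to sort the input on $2 first so that the > version can be used unchanged. This is only a sketch: the sample lines and the file name nodes_unsorted.txt are made up, it assumes the NODE number is always the second "_"-separated field, and it does not preserve the original line order. Sorting a very large file also has its own cost.

```shell
# Sample input with interleaved NODE groups (made up for illustration).
printf '%s\n' \
  'NODE_213_lenght.._1' \
  'NODE_212_lenght.._1' \
  'NODE_213_lenght.._2' \
  'NODE_212_lenght.._2' > nodes_unsorted.txt

# Sort numerically on the second "_"-separated field so each NODE group
# is contiguous, then split. ">" is safe here because every output file
# is opened only once.
sort -t '_' -k2,2n nodes_unsorted.txt |
awk -F '_' '
  BEGIN { prefix = "orfs_for_node_" }
  NR > 1 && $2 != last { close(prefix last) }
  { last = $2; print > (prefix $2) }
'
```

Because each output file is opened exactly once, stale files from a previous run are overwritten rather than appended to.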

Source: https://stackoverflow.com/questions/51209508/how-to-close-file-in-awk-while-generating-a-list-of

Tags

bash

text

awk