I searched for a while and couldn't find a response to this. I have a standard TSV file with the following format:
1 100 101 350 A
1 101 102
So using grep, something like:
for L in `grep -oE '[A-Z]+$' input.txt | uniq | sort | uniq`
do
    grep -E "${L}\$" input.txt > "file.${L}.txt"
done
The pipeline grep -oE '[A-Z]+$' input.txt | uniq | sort | uniq should find all the unique keys, which you then use to re-parse the file multiple times. The uniq | sort | uniq sequence is there to reduce the amount of input handed to sort: the first uniq collapses runs of adjacent duplicates before sorting, and the second removes the remaining duplicates once sorting has made them adjacent.
If you really need to do it in a single pass, then you could process each line, and append it immediately to the appropriate output file.
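For illustration, here is a minimal sketch of that single-pass idea in Python, assuming whitespace-separated fields with the key in the last column; the in_file.txt and out_<key>.txt names are just placeholders, and it keeps each output handle open in a dict rather than reopening a file in append mode for every line:
out_files = {}
try:
    with open("in_file.txt") as f_in:
        for line in f_in:
            fields = line.split()
            if not fields:                  # skip blank lines
                continue
            key = fields[-1]                # last field is the grouping key
            if key not in out_files:        # open each output file only once
                out_files[key] = open(f"out_{key}.txt", "w")
            out_files[key].write(line)
finally:
    for f_out in out_files.values():
        f_out.close()
Because every output file is opened exactly once and then only written to, the keys do not have to be grouped or sorted in the input.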
Here is a small solution in Python using itertools.groupby and str.rpartition. Note that groupby only groups consecutive lines, so lines sharing a key need to be adjacent in the input (or the file sorted by the last column first); otherwise a later group will reopen the same output file in "w" mode and overwrite the earlier one:
from itertools import groupby

with open("in_file.txt") as f_in:
    # group consecutive lines that share the same last field
    for name, lines in groupby(f_in.readlines(), key=lambda x: x.rpartition(" ")[2].strip()):
        with open(f"out_{name}.txt", "w") as f_out:
            f_out.writelines(lines)
So, the single-pass, low-memory, line-by-line scripting approach:
while IFS=" " read -r value1 value2 value3 value4 value5 remainder
do
    echo $value1 $value2 $value3 $value4 $value5 $remainder >> output.${value5}.txt
done < "input.txt"
Of course, you need to ensure there are no pre-existing output files (since the loop appends with >>), but that can be achieved efficiently in a number of ways.
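For example, a small sketch in Python, assuming the output.<key>.txt naming used by the loop above:
import glob
import os

# Remove leftover output files from a previous run so the ">>" appends
# start from empty files; output.*.txt matches the naming scheme above.
for path in glob.glob("output.*.txt"):
    os.remove(path)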
With awk
awk '{out = "File" $NF ".txt"; print >> out; close(out)}' file
More efficient, closing the destination file only when the key in the last field changes rather than after every line:
awk '
$NF != dest {if (out) close(out); dest = $NF; out = "File" dest ".txt"}
{print >> out}
' file