Splitting one large file into many if the text in a column doesn't match the text in the one before it

无人及你 2021-01-26 06:31

I searched for a while and couldn't find an answer to this. I have a standard TSV file with the following format:

1    100    101    350    A
1    101    102            


        
4 answers
  • 2021-01-26 06:40

    Using grep, something like this:

    # input.txt is a placeholder name for the TSV file
    for L in `grep -oE '[A-Z]+$' input.txt|uniq|sort|uniq`
    do
    grep -E "${L}\$" input.txt > "file.${L}.txt"
    done
    

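    The pipeline grep -oE '[A-Z]+$'|uniq|sort|uniq finds all the unique keys, which are then used to re-scan the file once per key. The first uniq collapses runs of identical adjacent keys, which reduces the amount of input passed to sort; the final uniq removes the duplicates that remain after sorting.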

    If you really need to do it in a single pass, then you could process each line, and append it immediately to the appropriate output file.
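The single-pass idea above can be sketched in Python as follows. This is only an illustration under some assumptions: the function name split_by_last_column and the out_<key>.txt naming are made up for the example, the key is taken to be the last whitespace-separated field, and every non-blank line is assumed to have one. Handles are cached so that each output file is opened only once.

```python
from pathlib import Path


def split_by_last_column(in_path, out_dir="."):
    """Append each line to out_<key>.txt, where key is the line's last field.

    Hypothetical helper for illustration; names are not from the thread.
    """
    handles = {}  # key -> open file handle, so each output file is opened once
    try:
        with open(in_path) as f_in:
            for line in f_in:
                fields = line.split()  # splits on tabs or spaces
                if not fields:
                    continue  # skip blank lines
                key = fields[-1]
                if key not in handles:
                    # Append mode: lines are added as soon as they are read.
                    handles[key] = open(Path(out_dir) / f"out_{key}.txt", "a")
                handles[key].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling split_by_last_column("input.txt") would return the sorted list of keys it saw. With a very large number of distinct keys, the cache could run into the operating system's limit on open files; closing and reopening handles would be one way around that.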

  • 2021-01-26 06:45

    Here is a small solution in python using groupby and str.rpartition:

    from itertools import groupby
    
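    with open("in_file.txt") as f_in:
        # groupby only merges consecutive lines that share a key.
        # The key is the text after the last space; use "\t" if the columns are tab-separated.
        for name,lines in groupby(f_in.readlines(),key=lambda x:x.rpartition(" ")[2].strip()):
            with open(f"out_{name}.txt","w") as f_out:
                f_out.writelines(lines)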
    
  • 2021-01-26 06:50

    The single-pass, low-memory, line-by-line shell approach:

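    # IFS=" " matches the space-separated sample; use IFS=$'\t' for tab-delimited input
    while IFS=" " read -r value1 value2 value3 value4 value5 remainder
    do
      echo "$value1 $value2 $value3 $value4 $value5${remainder:+ $remainder}" >> "output.${value5}.txt"
    done < "input.txt"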
    

    Because the loop appends with >>, make sure no output files from a previous run exist before you start; otherwise the new lines are added to the old contents. Clearing them first can be done efficiently in a number of ways.
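One possible way to clear leftovers before a run, sketched in Python. The helper name remove_stale_outputs is hypothetical, and the default pattern output.*.txt is an assumption chosen to match the output.<key>.txt names written by the loop above; adjust it if the files are named differently.

```python
from pathlib import Path


def remove_stale_outputs(directory=".", pattern="output.*.txt"):
    """Delete output files left over from an earlier run.

    Hypothetical helper; returns the names of the files it removed.
    """
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():  # leave directories and anything else alone
            path.unlink()
            removed.append(path.name)
    return removed
```

Running remove_stale_outputs() in the working directory before the loop leaves input.txt untouched, since that name does not match the pattern.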

  • 2021-01-26 06:54

    With awk:

    awk '{out = "File" $NF ".txt"; print >> out; close(out)}' file
    

    A more efficient version, which closes the destination file only when the key in the last column changes rather than after every line:

    awk '
        $NF != dest {if (out) close(out); dest = $NF; out = "File" dest ".txt"} 
        {print >> out}
    ' file
    