Remove a set of rows from a file separated by a white line having a specific key word

刺人心 2021-01-17 07:07

I have a file containing the lines given below. I want to delete a set of rows from the file if any line in that set contains the keyword SEDS2-TOP. Each set of rows is

2 Answers
  • 2021-01-17 07:34

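    You can do it in awk using three rules plus the END rule. Each line of a group is stored as an array index, so duplicate lines within a group are printed once and the output order within a group is not guaranteed. It can be written as follows: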

    awk 'NF==0 {              # empty line
        for (i in a)          # for each line in array a
            print i           # output line (index)
        if (i in a)           # if lines exists
            print ""          # output blank line at end
        delete a              # clear a array
        del=0                 # set delete group flag 0
        next                  # get next record
    }
    /SEDS2-TOP/ {             # SEDS2-TOP matched in record
        del=1                 # set delete group flag 1
        delete a              # delete array a
        next                  # get next records
    }
    del==0 {                  # del group flag is zero
        a[$0]++               # add line as index to array a
    }
    END {                     # END rule - process last group of lines
        if (del==0) {         # if del group flag not set
            for (i in a)      # loop over lines in a
                print i       # output line (index)
            print ""          # with newline after
        }
    }' rowsets
    

    Example Use/Output

    Using your data file as input, you can simply select and copy the script above (changing the filename containing the row sets from rowsets to whatever you have), then middle-mouse paste it into your terminal in the directory containing the file, e.g.

    $ awk 'NF==0 {              # empty line
    >     for (i in a)          # for each line in array a
    >         print i           # output line (index)
    >     if (i in a)           # if lines exists
    >         print ""          # output blank line at end
    >     delete a              # clear a array
    >     del=0                 # set delete group flag 0
    >     next                  # get next record
    > }
    > /SEDS2-TOP/ {             # SEDS2-TOP matched in record
    >     del=1                 # set delete group flag 1
    >     delete a              # delete array a
    >     next                  # get next records
    > }
    > del==0 {                  # del group flag is zero
    >     a[$0]++               # add line as index to array a
    > }
    > END {                     # END rule - process last group of lines
    >     if (del==0) {         # if del group flag not set
    >         for (i in a)      # loop over lines in a
    >             print i       # output line (index)
    >         print ""          # with newline after
    >     }
    > }' rowsets
    0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
    
    0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
    

    Preserving Row Order

    If preserving the row-order is needed, then instead of using the line as the index, you can introduce a new counter variable to be used as the index that would correspond to the row number in the array. That allows you to output the rows in their original order, e.g.

    awk -v ndx=1 '
    NF==0 {                   # empty line
        for (i=1; i<ndx; i++) # for each line in array a
            print a[i]        # output line
        if (ndx > 1)          # if lines exists
            print ""          # output blank line at end
        delete a              # clear a array
        del=0                 # set delete group flag 0
        ndx=1                 # reset array index 1
        next                  # get next record
    }
    /SEDS2-TOP/ {             # SEDS2-TOP matched in record
        del=1                 # set delete group flag 1
        delete a              # delete array a
        ndx=1                 # reset array index 1
        next                  # get next records
    }
    del==0 {                  # del group flag is zero
        a[ndx++]=$0           # add line to array a
    }
    END {                     # END rule - process last group of lines
        if (del==0) {         # if del group flag not set
            for (i=1; i<ndx; i++)   # loop over lines in a
                print a[i]    # output line
            print ""          # with newline after
        }
    }' rowsets
    

    In that case, your output would be:

    0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM
    
    0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
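
As a more compact alternative to the indexed array, each group can be accumulated into a single string, which keeps both the original order and any duplicate lines. The following is only a sketch under stated assumptions: the sample rows (r1 to r5) and the variable names buf, sample, and kept are hypothetical and stand in for the real data file.

```shell
# Hypothetical sample: three groups separated by blank lines (not the asker's data).
sample=$(mktemp)
printf 'r1 SEDS1-TOP\nr2 WATER-BOTTOM\nr1 SEDS1-TOP\n\nr3 SEDS2-TOP\nr4 SEDS1-TOP\n\nr5 WATER-BOTTOM\n' > "$sample"

kept=$(awk '
NF == 0 {                         # blank line closes the current group
    if (buf != "" && !del)        # group has lines and no keyword was seen
        printf "%s\n", buf        # emit the group plus one separating blank line
    buf = ""; del = 0             # reset state for the next group
    next
}
/SEDS2-TOP/ { del = 1 }           # mark the whole group for removal
{ buf = buf $0 "\n" }             # append the line, keeping order and duplicates
END {
    if (buf != "" && !del)        # flush a final group with no trailing blank line
        printf "%s", buf
}' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```

With this sample, the second group is dropped and the repeated r1 line in the first group is kept.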
    

    Look things over and let me know if you have further questions.

  • 2021-01-17 07:37

    The phrase "separated by a white line" in the question should lead you to paragraph mode.

    Perl:

    $ perl -00 -ne 'print if !/SEDS2-TOP/' sample.txt
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM
    
    0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
    
    • -00 enable paragraph mode
    • -n don't print by default
    • print if !/SEDS2-TOP/ - print paragraph only if it doesn't match

    AWK variant:

    $ awk -v RS= -v ORS='\n\n' '!/SEDS2-TOP/' sample.txt
    
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM
    
    0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
    0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
    0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
    
    • -v RS= - enable paragraph mode
    • -v ORS='\n\n' - separate output records with a blank line
    • !/SEDS2-TOP/ - print only if the paragraph doesn't match
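
To see what paragraph mode does to record numbering, the sketch below labels each blank-line-separated group instead of printing it. The sample rows and the names sample and report are hypothetical placeholders, not the asker's data.

```shell
# Hypothetical three-group sample file.
sample=$(mktemp)
printf 'r1 SEDS1-TOP\nr2 WATER-BOTTOM\n\nr3 SEDS2-TOP\nr4 SEDS1-TOP\n\nr5 WATER-BOTTOM\n' > "$sample"

# With RS set to the empty string, each group is one record, so NR counts groups
# and the regex is tested against the whole group at once.
report=$(awk -v RS= '{ print NR, (/SEDS2-TOP/ ? "drop" : "keep") }' "$sample")

printf '%s\n' "$report"
rm -f "$sample"
```

For this sample the report reads "1 keep", "2 drop", "3 keep": the match on r3 condemns the whole second group, including r4.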

    A cumbersome approach to "move" the matching records into a new file would be:

    perl -00 -i -ne 'if (!/SEDS2-TOP/) { print } else {print STDERR}' sample.txt 2>sample2.txt
    
    • -i modifies sample.txt in place
    • print STDERR - prints the matching paragraphs to STDERR
    • 2>sample2.txt - saves the STDERR into the new file.

    However, that requires in-place editing, which few text utilities support. The easiest approach is to create two new files, one with the matching records and one with the non-matching ones.

    awk -v RS= -v ORS='\n\n' '!/SEDS2-TOP/' sample.txt >not_matching.txt
    awk -v RS= -v ORS='\n\n' '/SEDS2-TOP/' sample.txt  >matching.txt
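
The two passes above could probably be folded into one by letting awk choose the output file per record. This is a sketch, not part of the original answer: the awk variables hit and miss, the temporary directory workdir, and the sample rows are all assumptions made for illustration.

```shell
# Hypothetical sample in a scratch directory, so no real files are overwritten.
workdir=$(mktemp -d)
printf 'r1 SEDS1-TOP\nr2 WATER-BOTTOM\n\nr3 SEDS2-TOP\nr4 SEDS1-TOP\n\nr5 WATER-BOTTOM\n' > "$workdir/sample.txt"

# One pass in paragraph mode: each group goes to hit or miss depending on the match.
awk -v RS= -v ORS='\n\n' \
    -v hit="$workdir/matching.txt" \
    -v miss="$workdir/not_matching.txt" '
{
    out = (/SEDS2-TOP/ ? hit : miss)   # pick the destination file for this group
    print > out                        # awk keeps both files open until it exits
}' "$workdir/sample.txt"

cat "$workdir/matching.txt"
```

The redirection target is computed into a variable first because an unparenthesized expression after > is parsed differently across awk implementations.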
    