bash: looping and extracting a fragment of a txt file

Backend · Unresolved · 3 answers · 1665 views
长情又很酷 2021-01-23 13:26

I am dealing with the analysis of a big number of dlg text files located within the workdir. Each file has a table (usually located at a different position in the log) whose histogram rows end in a run of # characters; for each file I need to extract the row with the most #.

3 Answers
  •  [愿得一人]
    2021-01-23 13:57

    You can use this one-liner, which should be fast enough. Extra lines in your files, besides the tables, are not a problem.

    grep "#$" *.dlg | sort -rk11 | awk '!seen[$1]++'
    

    grep fetches all the histogram lines, which are then sorted in reverse order by the last field, so the lines with the most # end up on top; finally awk removes the duplicates, keeping only the first line for each file. Note that when grep parses more than one file, it prints the filename at the beginning of each line by default (as with -H), so if you test it on a single file, use grep -H.

    Result should be like this:

    file1.dlg:   3 |     -5.47 |  17 |     -5.44 |   2 |##########
    file2.dlg:   3 |     -5.47 |  17 |     -5.44 |   2 |####
    file3.dlg:   3 |     -5.47 |  17 |     -5.44 |   2 |#######
    
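    To try the pipeline locally, you can first generate a couple of toy .dlg files (the filenames and values below are made up, only the histogram layout matches):

```shell
# create two toy .dlg files with histogram rows (made-up data)
printf '%s\n' \
  '   1 |     -5.10 |  10 |     -5.00 |   1 |####' \
  '   3 |     -5.47 |  17 |     -5.44 |   2 |##########' > file1.dlg
printf '%s\n' \
  '   2 |     -5.30 |  12 |     -5.20 |   5 |######' \
  '   3 |     -5.47 |  17 |     -5.44 |   2 |####' > file2.dlg

# one line per file: the row with the most trailing '#'
grep "#$" *.dlg | sort -rk11 | awk '!seen[$1]++'
```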

    Here is a modification to keep the first appearance when a file contains several equal max lines (GNU sort):

    grep "#$" *.dlg | sort -s -rk11 | awk '!seen[$1]++'
    

    By default, sort breaks ties with a last-resort comparison over the whole line, so equal keys do not necessarily keep their input order. The -s flag disables that and makes the sort stable, so for equal lines the initial order is preserved and awk keeps the first appearance in the file.
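    With GNU sort you can check the tie-break behaviour yourself: -s requests a stable sort, so equal keys keep their input order and the first appearance wins. A toy file with two rows tying for the most # (made-up data):

```shell
# toy file: two rows tie for the most '#' (made-up data)
printf '%s\n' \
  '   1 |     -5.10 |  10 |     -5.00 |   1 |#####' \
  '   2 |     -5.20 |  11 |     -5.10 |   3 |#####' > tie.dlg

# -s disables sort's last-resort whole-line comparison,
# so the first of the two equal rows comes out on top
grep -H "#$" tie.dlg | sort -s -rk11 | awk '!seen[$1]++'
```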


    Second solution

    Here is a second solution using only awk. It keeps, per input file, the row whose last |-separated field (the run of #) compares greatest; since that field contains only # characters, the string comparison $NF > max[FILENAME] effectively compares their lengths:

    awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
               END {for (i in row) print i ":" row[i]}' *.dlg
    
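    A quick run of the awk version on toy files (made-up data and filenames):

```shell
# two toy .dlg files (made-up data)
printf '%s\n' \
  '   1 |     -5.10 |  10 |     -5.00 |   1 |####' \
  '   3 |     -5.47 |  17 |     -5.44 |   2 |##########' > a.dlg
printf '%s\n' \
  '   2 |     -5.30 |  12 |     -5.20 |   5 |######' > b.dlg

# per file, keep the row whose last |-field (the run of '#') compares greatest
awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
           END {for (i in row) print i ":" row[i]}' a.dlg b.dlg
```

    Note that for-in traversal order in awk is unspecified, so the files may be printed in any order.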

    Update: if you execute it from a different directory and want to keep only the basename of every file, removing the path prefix:

    awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
               END {for (i in row) {sub(".*/","",i); print i ":" row[i]}}' workdir/*.dlg
    
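    And a quick check of that basename variant: sub(".*/","",i) strips everything up to the last / from the array index before printing (toy directory and data, names are illustrative):

```shell
# toy file inside a subdirectory (made-up data)
mkdir -p toydir
printf '%s\n' '   1 |     -5.10 |  10 |     -5.00 |   1 |###' > toydir/c.dlg

# the path prefix "toydir/" is removed from the printed filename
awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
           END {for (i in row) {sub(".*/","",i); print i ":" row[i]}}' toydir/*.dlg
```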
