Bash: looping over files and extracting a fragment of a text file

长情又很酷 2021-01-23 13:26

I am analyzing a large number of dlg text files located in the workdir. Each file has a table (usually located at a different position in the log) in the follo

3 answers

[愿得一人] (original poster)

2021-01-23 13:57
You can use this pipeline, which should be fast enough. Extra lines in your files, besides the tables, should not be a problem.
```
grep "#$" *.dlg | sort -rk11 | awk '!seen[$1]++'
```
grep fetches all the histogram lines (the ones ending in #). These are then sorted in reverse order by the last field, so the lines with the most # come first, and finally awk keeps only the first line seen for each file, dropping the rest. Note that when grep is given more than one file, it prints the filename at the beginning of each line by default (as with -H), so if you test it on a single file, use grep -H.

The result should look like this:
```
file1.dlg:   3 |     -5.47 |  17 |     -5.44 |   2 |##########
file2.dlg:   3 |     -5.47 |  17 |     -5.44 |   2 |####
file3.dlg:   3 |     -5.47 |  17 |     -5.44 |   2 |#######
```
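To try the pipeline without real data, a minimal sketch like the following may help. The two sample files, their contents, and the temporary directory are made up for illustration. Setting `LC_ALL=C` is an assumption added here so that sort compares bytes and longer runs of `#` reliably sort higher.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                    # assumption: byte-wise ordering for sort

workdir=$(mktemp -d)               # throwaway stand-in for the real workdir
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up sample logs: table rows mixed with unrelated lines.
cat > a.dlg <<'EOF'
some unrelated log line
   1 |     -5.78 |  11 |     -5.78 |   1 |#
   2 |     -5.53 |  13 |     -5.53 |   3 |###
   3 |     -5.47 |  17 |     -5.44 |   2 |##
EOF

cat > b.dlg <<'EOF'
   1 |     -6.10 |   4 |     -6.02 |   4 |####
   2 |     -5.90 |   9 |     -5.90 |   1 |#
another unrelated line
EOF

# Same pipeline as above: keep, per file, the row with the longest histogram.
result=$(grep "#$" *.dlg | sort -rk11 | awk '!seen[$1]++')
printf '%s\n' "$result"
```

Each file should contribute exactly one line: the `###` row for a.dlg and the `####` row for b.dlg.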
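Here is a modification that picks the first appearance when a file has several rows tied for the maximum:
```
grep "#$" *.dlg | sort -s -rk11 | awk '!seen[$1]++'
```
The -s flag (supported by GNU sort) makes the sort stable: rows whose sort keys compare equal keep their input order, so awk sees the earliest of the tied rows first. Without -s, sort breaks ties by comparing whole lines, and -r reverses that comparison too, so sort -k11 | tac would produce the same output as sort -rk11.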
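To check which row wins on a tie, a small test file is enough. The sketch below uses made-up data in which ranks 1 and 3 have equally long histograms. It assumes GNU sort, whose `-s` flag keeps rows with equal keys in input order. The file name `tie.dlg` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                    # assumption: byte-wise ordering for sort

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Ranks 1 and 3 are tied for the longest histogram (made-up data).
cat > tie.dlg <<'EOF'
   1 |     -5.78 |  11 |     -5.78 |   2 |##
   2 |     -5.53 |  13 |     -5.53 |   1 |#
   3 |     -5.47 |  17 |     -5.44 |   2 |##
EOF

# -H forces the filename prefix even though only one file is given.
# -s keeps tied rows in input order, so the rank 1 row reaches awk first.
first_tied=$(grep -H "#$" tie.dlg | sort -s -rk11 | awk '!seen[$1]++')
printf '%s\n' "$first_tied"
```

With GNU sort this should print the rank 1 row, the first of the two tied rows.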

Second solution

Here using only awk:
```
awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
           END {for (i in row) print i ":" row[i]}' *.dlg
```
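The awk version can be tried the same way. In this sketch the sample files are made up, and the trailing `| sort` is an addition, because the order in which `for (i in row)` visits keys is unspecified. Since `$NF` holds a run of `#` characters, `>` is a string comparison, and the strict `>` means a later row of equal length does not replace an earlier one.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                    # assumption: fixed ordering for the final sort

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up data: in a.dlg, ranks 1 and 3 are tied for the longest histogram.
cat > a.dlg <<'EOF'
header text
   1 |     -5.78 |  11 |     -5.78 |   2 |##
   2 |     -5.53 |  13 |     -5.53 |   1 |#
   3 |     -5.47 |  17 |     -5.44 |   2 |##
EOF

cat > b.dlg <<'EOF'
   1 |     -6.10 |   4 |     -6.02 |   1 |#
   2 |     -5.90 |   9 |     -5.90 |   3 |###
EOF

# Track, per file, the longest histogram seen so far and the row it came from.
awk_result=$(awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
           END {for (i in row) print i ":" row[i]}' *.dlg | sort)
printf '%s\n' "$awk_result"
```

For a.dlg the rank 1 row is kept, and for b.dlg the rank 2 row.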
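Update: if you run it from a different directory and want to keep only the basename of each file (removing the path prefix), copy the key into a separate variable before stripping it, so that the lookup in row still uses the full path. Here path/to is a placeholder for your directory:
```
awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
           END {for (i in row) {f=i; sub(".*/","",f); print f ":" row[i]}}' path/to/*.dlg
```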
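One detail to watch when stripping the path is that the array key must stay unchanged for the `row` lookup, so the sketch below strips a copy of it. The `logs` subdirectory, the file `x.dlg`, and the variable `name` are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/logs"           # hypothetical directory holding the logs

# Made-up sample log stored outside the current directory.
cat > "$workdir/logs/x.dlg" <<'EOF'
   1 |     -6.10 |   4 |     -6.02 |   1 |#
   2 |     -5.90 |   9 |     -5.90 |   3 |###
EOF

# Strip the path from a copy of the key; row[i] still uses the full path.
stripped=$(awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
           END {for (i in row) {name=i; sub(".*/","",name); print name ":" row[i]}}' \
           "$workdir"/logs/*.dlg)
printf '%s\n' "$stripped"
```

The output line starts with `x.dlg:` instead of the full temporary path.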