Advanced `uniq` with “unique part regex”

前端未结


uniq is a tool that enables one to filter lines in a file such that only unique lines are shown. uniq has some support to specify when two lines a


3 Answers

天命终不由人

2021-01-20 02:24
Here's a simple Perl script that will do the work:
```
#!/usr/bin/env perl
use strict;
use warnings;

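# Compile the regex given as the first command-line argument.
my $re = qr($ARGV[0]);

# Keys (captured text) that have already been printed.
my %matches;
while(<STDIN>) {
    next if $_ !~ $re;          # skip lines that do not match
    print if !$matches{$1};     # print only the first line for each key
    $matches{$1} = 1;           # remember the key
}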
```
Usage:
```
$ ./uniq.pl '(!\w+!)' < file.dat
foo!bar!baz
!baz!quix
```
Here, I've used `$1` to key on the first capture group, but you can replace it with `$&` to use the whole pattern match.
This script drops lines that don't match the regex; adjust it if you need different behavior.
情歌与酒

2021-01-20 02:28
This doesn't use uniq, but with GNU awk (gawk) you can get the results you want:
```
awk -v re='![[:alnum:]]+!' 'match($0, re, a) && !(a[0] in p) {p[a[0]]; print}' file
foo!bar!baz
!baz!quix
```
- The regex is passed in through a command-line variable, `-v re=...`
- `match` tests the regex against each line and stores the matched text in the array `a` (`a[0]` is the whole match); the three-argument form is a gawk extension
- When the match succeeds and the matched text is not yet in the associative array `p`, we store it in `p` and print the line
- The effect is uniq with regex support
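Where gawk is not available, the same steps can probably be reproduced with POSIX awk only. The following is a minimal sketch under that assumption: it uses the two-argument `match` together with `RSTART` and `RLENGTH` to recover the matched text. The function name `uniq_by_match` and the sample input are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: same idea as the gawk one-liner, but using only
# POSIX awk features (two-argument match plus RSTART and RLENGTH).
uniq_by_match() {
    # $1 is an extended regular expression; input is read from stdin.
    awk -v re="$1" '
        match($0, re) {
            key = substr($0, RSTART, RLENGTH)  # text matched by re
            if (!(key in seen)) {              # first time this key appears
                seen[key] = 1                  # remember the key
                print                          # print the whole line
            }
        }'
}

# Made-up sample input, not the file from the question.
printf '%s\n' 'foo!bar!baz' 'quix!bar!zot' '!baz!quix' |
    uniq_by_match '![[:alnum:]]+!'
```

Note that awk interprets backslash escapes in `-v` assignments, so a regex containing backslashes may need them doubled.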

无人共我

2021-01-20 02:29

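You can do this with just grep and sort:
```
DATAFILE=file.dat

# -o prints only the matched text; sort -u leaves one copy of each key.
for match in $(grep -oP '(!\w+!)' "$DATAFILE" | sort -u); do
  # -F treats the key as a literal string; -m1 stops at the first hit.
  grep -F -m1 -- "$match" "$DATAFILE"
done
```
Outputs:
```
foo!bar!baz
!baz!quix
```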
