uniq is a tool that enables one to filter lines in a file such that only unique lines are shown. uniq has some support to specify when two lines a
Here's a simple Perl script that will do the work:
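#!/usr/bin/env perl
use strict;
use warnings;

# The pattern is the first argument; it should contain a capture group.
my $re = qr($ARGV[0]);
my %matches;    # captured keys seen so far
while (<STDIN>) {
    next if $_ !~ $re;          # skip lines that do not match
    print if !$matches{$1};     # print only the first line for each key
    $matches{$1} = 1;
}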
Usage:
$ ./uniq.pl '(!\w+!)' < file.dat
foo!bar!baz
!baz!quix
Here, I've used $1 to match on the first captured group, but you can replace it with $& to use the whole pattern match.
This script will filter out lines that don't match the regex, but you can adjust it if you need a different behavior.
This doesn't use uniq, but with GNU awk (gawk) you can get the results you want:
awk -v re='![[:alnum:]]+!' 'match($0, re, a) && !(a[0] in p) {p[a[0]]; print}' file
foo!bar!baz
!baz!quix
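-v re=... sets the awk variable re to the regex.
The match function matches the regex against each line and returns the matched text in the array a.
When match succeeds and the matched text is not yet in the associative array p, we store it there and print the line.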
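The three-argument form of match is a gawk extension. If only a POSIX awk is available, a roughly equivalent sketch can take the matched text from RSTART and RLENGTH instead. The function name uniq_by_match and the array name seen are made up for this example.

```shell
# Hypothetical helper: print the first line for each distinct match of the
# regex given as $1. Reads stdin; lines that do not match are dropped.
uniq_by_match() {
    # Note: awk processes backslash escapes in -v values, so a pattern
    # containing backslashes may need them doubled.
    awk -v re="$1" '
        # match() sets RSTART and RLENGTH to the position and length of the match
        match($0, re) && !seen[substr($0, RSTART, RLENGTH)]++
    '
}
```

It would be called as: uniq_by_match '![[:alnum:]]+!' < file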
You can do this with just grep and sort:
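DATAFILE=file.dat
# -o prints only the matched text; sort -u keeps one copy of each match.
grep -oP '(!\w+!)' "$DATAFILE" | sort -u |
while IFS= read -r match; do
    # -F treats the match as a fixed string; -m1 stops at the first hit.
    grep -F -m1 -- "$match" "$DATAFILE"
done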
Outputs:
foo!bar!baz
!baz!quix
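One thing to be aware of with this approach: the lines come out in the sorted order of the matches, not in file order. If file order matters, a possible variation is to tag each hit with its line number and sort on that. This is only a sketch; the function name first_lines_in_file_order is made up, it takes an extended regex (-E) instead of -P, and it assumes a grep that supports -o and -m (GNU and BSD grep do).

```shell
# Hypothetical helper: first line per distinct match, in file order.
# $1 is an extended regex, $2 is the data file.
first_lines_in_file_order() {
    grep -oE -- "$1" "$2" | sort -u |
    while IFS= read -r match; do
        # -n prefixes the line number, giving output like "12:line text"
        grep -n -F -m1 -- "$match" "$2"
    done |
    sort -t: -k1,1n -u |    # order by line number, one copy per line
    cut -d: -f2-            # strip the line-number prefix again
}
```

It would be called as: first_lines_in_file_order '![[:alnum:]]+!' file.dat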