Advanced `uniq` with “unique part regex”

广开言路 2021-01-20 02:16

`uniq` is a tool that enables one to filter lines in a file so that only unique lines are shown. `uniq` has some support to specify when two lines a
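For context, here is a quick illustration of what stock `uniq` can and cannot key on, using its positional options `-f` (skip fields) and `-s` (skip characters). The sample input is made up for illustration:

```shell
# Stock uniq only collapses *adjacent* duplicate lines, and its key options are
# positional: -f N skips the first N fields, -s N skips the first N characters.

# Non-adjacent duplicates survive: prints a, b, a
printf 'a\nb\na\n' | uniq

# Compare from the second field on: "b 1" is dropped, leaving "a 1" and "c 2"
printf 'a 1\nb 1\nc 2\n' | uniq -f 1

# Skip the first two characters: only "x-foo" is printed
printf 'x-foo\ny-foo\n' | uniq -s 2
```

Neither option can select "the part of the line that matches a regex", which is what the answers below provide. Note also that those answers drop duplicates anywhere in the input, not only adjacent ones.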

3 answers
  • 2021-01-20 02:24

    Here's a simple Perl script that will do the work:

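    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    # The first argument is the regex; its first capture group ($1)
    # is the "unique part" of each line.
    my $pattern = shift @ARGV;
    my $re = qr/$pattern/;
    
    my %seen;
    while (my $line = <STDIN>) {
        # Drop lines that don't match the regex at all
        next unless $line =~ $re;
        # Print a line only the first time its $1 value is seen
        print $line unless $seen{$1}++;
    }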
    

    Usage:

    $ ./uniq.pl '(!\w+!)' < file.dat
    foo!bar!baz
    !baz!quix
    

    Here I've used $1, the first capture group, as the key, so the regex needs at least one capturing group; replace it with $& to key on the whole match instead.
    Lines that don't match the regex are dropped entirely; adjust the script if you need different behavior.

  • 2021-01-20 02:28

    Not using uniq, but with GNU awk (gawk) you can get the result you want:

    awk -v re='![[:alnum:]]+!' 'match($0, re, a) && !(a[0] in p) {p[a[0]]; print}' file
    foo!bar!baz
    !baz!quix
    
    • The regex is passed in as an awk variable with -v re=...
    • match() applies the regex to each line; with gawk's optional third argument, the matched text is stored in array a (a[0] is the whole match)
    • If the match succeeds and a[0] is not yet a key of the associative array p, it is added to p and the line is printed
    • The net effect is uniq keyed on the regex match: only the first line for each distinct match is printed
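The three-argument form of match() used above is a gawk extension. A minimal sketch of the same logic for POSIX awk, using RSTART/RLENGTH and substr() instead; the function name is hypothetical:

```shell
# Hypothetical helper: awk_uniq_by_regex REGEX [FILE...]
# Prints the first line for each distinct text matched by REGEX (an awk ERE);
# lines with no match are dropped. Uses only POSIX awk features.
awk_uniq_by_regex() {
  re=$1
  shift
  awk -v re="$re" '
    match($0, re) {
      # match() sets RSTART/RLENGTH; substr() recovers the matched text
      key = substr($0, RSTART, RLENGTH)
      if (!(key in seen)) {
        seen[key] = 1
        print
      }
    }
  ' "$@"
}

# Usage; [[:alnum:]] is spelled as an explicit range because some older awk
# implementations do not support POSIX bracket classes:
# awk_uniq_by_regex '![A-Za-z0-9]+!' file
```

Values passed with -v have backslash escapes interpreted, so a literal backslash in REGEX must be written as \\.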
  • 2021-01-20 02:29

    You can do this with just grep and sort (the lines come out in sorted key order rather than file order):

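    DATAFILE=file.dat
    
    # List every distinct match (GNU grep -P for \w), then print the first
    # line containing each one; -F makes grep treat the match as a fixed string.
    grep -oP '(!\w+!)' "$DATAFILE" | sort -u | while IFS= read -r match; do
      grep -m1 -F -- "$match" "$DATAFILE"
    done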
    

    Outputs:

    foo!bar!baz
    !baz!quix
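If the keys should be processed in order of first appearance rather than sorted order, `sort -u` can be swapped for the common `awk '!seen[$0]++'` de-duplication idiom. This is a sketch under assumptions: the function name is hypothetical, and it uses `grep -E` with an explicit character class in place of `-P` and `\w`, so a PCRE-enabled grep is not required:

```shell
# Hypothetical helper: grep_uniq_by_regex REGEX FILE
# Same pipeline as the loop above, but keys are de-duplicated in order of
# first appearance (awk '!seen[$0]++') rather than sorted (sort -u).
grep_uniq_by_regex() {
  re=$1
  file=$2
  grep -oE -- "$re" "$file" | awk '!seen[$0]++' | while IFS= read -r key; do
    # -F: match the key as a fixed string; -m1: stop at the first hit
    grep -m1 -F -- "$key" "$file"
  done
}

# Usage:
# grep_uniq_by_regex '![A-Za-z0-9_]+!' file.dat
```

As with the original loop, the file is re-read once per distinct key, so for large inputs the single-pass Perl or awk answers are cheaper.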
    