I'm trying to print the duplicate lines from a filehandle, not remove them or do any of the other things I see asked in other questions. I don't have enough experience with Perl to do this quickly, so I'm asking here. What's the way to do this?
Using the standard Perl shorthands:
my %seen;
while ( <> ) {
print if $seen{$_}++;
}
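If you would rather collect the duplicates than print them as they arrive, the same $seen{$_}++ test can be wrapped in a subroutine. This is a minimal sketch: duplicate_lines is a hypothetical name, and the usage reads from an in-memory filehandle so it runs without input files. Like the loop above, it returns every repeated occurrence, so a line that appears three times is reported twice.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every repeated occurrence of a line read
# from $fh, in input order (the same lines the while (<>) loop prints).
sub duplicate_lines {
    my ($fh) = @_;
    my %seen;
    my @dupes;
    while ( my $line = <$fh> ) {
        # Postfix ++ returns the old count: 0 (false) the first time.
        push @dupes, $line if $seen{$line}++;
    }
    return @dupes;
}

# Usage: an in-memory filehandle stands in for STDIN or a file.
my $text = "a\nb\na\nc\nb\na\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print duplicate_lines($fh);    # prints "a\nb\na\n"
close $fh;
```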
As a "one-liner":
perl -ne 'print if $seen{$_}++'
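More data? This prints <file name>:<line number>:<line> for each duplicate:
perl -ne 'print +( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" if $seen{$_}++; close ARGV if eof'
The + stops perl from treating the parenthesized conditional as print's entire argument list (without it, only the file name is printed), and close ARGV if eof restarts the line counter $. for each file.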
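The same report can be built inside a script. This sketch takes hypothetical (name, filehandle) pairs instead of @ARGV so it can run on in-memory data; report_duplicates and the file names one.txt and two.txt are made up for illustration. It numbers lines separately for each input, and %seen is shared across all inputs, as in the one-liner, so a line counts as a duplicate even if its first copy was in another file.

```perl
use strict;
use warnings;

# Hypothetical helper: for each [name, filehandle] pair, report repeated
# lines as "name:line_number:line". %seen is shared across all inputs.
sub report_duplicates {
    my @inputs = @_;
    my %seen;
    my @report;
    for my $input (@inputs) {
        my ( $name, $fh ) = @$input;
        my $line_no = 0;    # restart numbering for each input
        while ( my $line = <$fh> ) {
            $line_no++;
            push @report, "$name:$line_no:$line" if $seen{$line}++;
        }
    }
    return @report;
}

my $first  = "alpha\nbeta\nalpha\n";
my $second = "beta\ngamma\n";
open my $fh1, '<', \$first  or die $!;
open my $fh2, '<', \$second or die $!;
print report_duplicates( [ 'one.txt', $fh1 ], [ 'two.txt', $fh2 ] );
# prints:
# one.txt:3:alpha
# two.txt:1:beta
```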
Explanation of %seen:
my %seen declares a hash. For each unique line in the input (which is coming from while(<>) in this case), $seen{$_} will have a scalar slot in the hash keyed by the text of the line (this is what $_ is doing in the hash's {} braces).
- Using the postfix increment operator (x++), we take the value for our expression and only increment it after the expression has been evaluated. So, if we haven't "seen" the line yet, $seen{$_} is undefined, but when forced into a numeric "context" like this, it's taken as 0, which is false.
- Then it's incremented to 1.
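A small sketch of those two steps, using a single hypothetical key: postfix ++ hands back the old value (0 for an undefined slot, not undef) and only then stores the incremented count.

```perl
use strict;
use warnings;

my %seen;
my $key = "some line\n";    # hypothetical input line

# First use: the slot does not exist yet. Postfix ++ returns the old value,
# which for an undefined slot is 0 (false), then stores 1.
my $first = $seen{$key}++;

# Second use: returns 1 (true), then stores 2.
my $second = $seen{$key}++;

print "first=$first second=$second count=$seen{$key}\n";
# prints: first=0 second=1 count=2
```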
So, when the while loop begins to run, no line has a count yet (if it helps, you can think of every line as "not %seen"). The first time we see a line, perl takes the undefined value (treated as 0), which fails the if, and increments the count in that line's slot to 1. From then on the count is at least 1 for every later occurrence, so the line passes the if condition and is printed.
Now, as I said above, my %seen declares a hash, but with strict turned off (as in the one-liner, which never declares %seen), a variable can be created on the spot just by using it. So the first time perl sees $seen{$_}, it knows that I'm looking for %seen; it doesn't have it, so it creates it.
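To see that behaviour in action, here is a small sketch: under use strict an undeclared %seen is rejected at compile time (caught here with a string eval so the script keeps running), while inside a no strict 'vars' block the same expression quietly creates the package hash %main::seen. The exact wording of the error message varies between Perl versions.

```perl
use strict;
use warnings;

# Under strict, an undeclared %seen is a compile-time error. A string eval
# compiles the code at run time, so the error lands in $@ instead of aborting.
my $ok  = eval q{ use strict; $seen{"x"}++; 1 };
my $err = $@;
print $ok ? "compiled\n" : "error: $err";

{
    no strict 'vars';
    $seen{"x"}++;    # creates the package hash %main::seen on first use
}
print "package hash has ", scalar( keys %main::seen ), " key(s)\n";
```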
An added neat thing about the %seen hash is that at the end, if you care to use it, you have a count of how many times each line occurred.
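A minimal sketch of reading those counts back out, assuming you keep the hash around after the loop. line_counts is a hypothetical helper and the sample input is made up. The value is the total number of occurrences, so a line that appears once has a count of 1.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times each line occurs in $fh.
sub line_counts {
    my ($fh) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        $seen{$line}++;
    }
    return %seen;
}

my $input = "red\ngreen\nred\nblue\nred\ngreen\n";
open my $in, '<', \$input or die $!;
my %count = line_counts($in);

# Report only the lines that occurred more than once.
for my $line ( sort keys %count ) {
    next unless $count{$line} > 1;
    chomp( my $text = $line );
    print "$text: $count{$line}\n";
}
# prints:
# green: 2
# red: 3
```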
Try this:
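#!/usr/bin/perl
use strict;
use warnings;
my %duplicates;
# Read from the files named on the command line, or from STDIN
while (<>) {
    # A line is a duplicate if it has already been recorded
    print if defined $duplicates{$_};
    $duplicates{$_}++;
}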
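Prints each duplicated line only once (at its second occurrence):
perl -ne 'print if $seen{$_}++ == 1'
(On Windows cmd.exe, use double quotes around the code instead.)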
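The same condition as a subroutine, as a sketch (duplicates_once is a hypothetical name). Because postfix ++ returns the old count, $seen{$_}++ == 1 is true only on the second sighting of a line, so each duplicated line is reported exactly once, in the order its second copy appears.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each duplicated line exactly once.
sub duplicates_once {
    my ($fh) = @_;
    my %seen;
    my @dupes;
    while ( my $line = <$fh> ) {
        # Old count is 1 only on the second occurrence; later copies skip.
        push @dupes, $line if $seen{$line}++ == 1;
    }
    return @dupes;
}

my $data = "a\nb\na\nc\nb\na\n";
open my $h, '<', \$data or die $!;
print duplicates_once($h);    # prints "a\nb\n"
```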
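If you have a Unix-like system, you can use uniq. Note that uniq only compares adjacent lines, so sort the input first unless the duplicates are already next to each other:
sort foo | uniq -d
or
sort foo | uniq -D
uniq -d prints one copy of each duplicated line; uniq -D (available in GNU uniq) prints every copy. More information: man uniq.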
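Because uniq only compares each line with the one immediately before it, duplicates that are not adjacent slip through on unsorted input. The sketch below models that adjacent-line comparison in Perl to make it visible; adjacent_duplicates is a hypothetical name and it models only uniq -d, not uniq's other options. Sorting first makes equal lines adjacent, at the cost of losing the original line order.

```perl
use strict;
use warnings;

# Hypothetical helper imitating `uniq -d`: report one copy of each run of
# two or more identical adjacent lines.
sub adjacent_duplicates {
    my @lines = @_;
    my @dupes;
    my $prev;
    my $run = 0;
    for my $line (@lines) {
        if ( defined $prev && $line eq $prev ) {
            $run++;
            push @dupes, $line if $run == 2;    # report once per run
        }
        else {
            $run = 1;
        }
        $prev = $line;
    }
    return @dupes;
}

my @unsorted = ( "a\n", "b\n", "a\n" );
print scalar adjacent_duplicates(@unsorted), "\n";    # prints 0: the "a" lines are not adjacent
print adjacent_duplicates( sort @unsorted );          # prints "a\n"
```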
Source: https://stackoverflow.com/questions/5884401/perl-find-duplicate-lines-in-file-or-array