I need to see if there are duplicates in an array of strings. What's the most time-efficient way of doing it?
Similar to @Schwern's second solution, but this checks for duplicates a little earlier, from within sort's comparison function:
use strict;
use warnings;
# $/ is the input record separator (a newline by default)
@_ = sort { print "dup = $a$/" if $a eq $b; $a cmp $b } @ARGV;
It won't be as fast as the hashing solutions, but it requires less memory and is pretty darn cute.
Please don't ask about the most time-efficient way to do something unless you have specific requirements, such as "I have to dedupe a list of 100,000 integers in under a second." Otherwise, you're worrying about how long something takes for no reason.
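For what it's worth, checking a concrete requirement like that is easy with the core Benchmark module. A rough sketch, where the 100,000 random integers and the grep-based dedupe are placeholders for your actual data and code:
use strict;
use warnings;
use Benchmark 'timethis';
my @list = map { int rand 50_000 } 1 .. 100_000;    # stand-in data
timethis(10, sub {
    my %seen;
    my @uniq = grep { !$seen{$_}++ } @list;
});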
If you need the uniquified array anyway, it is fastest to use the heavily-optimized library List::MoreUtils, and then compare the result to the original:
use strict;
use warnings;
use List::MoreUtils 'uniq';
my @array = qw(1 1 2 3 fibonacci!);
my @array_uniq = uniq @array;
# parenthesize the full argument, or print() stops at the first closing paren
print( (scalar(@array) == scalar(@array_uniq) ? "no dupes" : "dupes") . " found!\n" );
Or if the list is large and you want to bail as soon as a duplicate entry is found, use a hash:
my %uniq_elements;
foreach my $element (@array)
{
die "dupe found!" if $uniq_elements{$element}++;
}
One of the things I love about Perl is its ability to almost read like English. It just sort of makes sense.
use strict;
use warnings;
my @array = qw/yes no maybe true false false perhaps no/;
my %seen;
foreach my $string (@array) {
    next unless $seen{$string}++;
    print "'$string' is duplicated.\n";
}
'false' is duplicated.
'no' is duplicated.
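If you'd rather collect the duplicated strings into a list than print them, the same %seen idiom works inside grep; each duplicate is reported once, at its second occurrence:
my %counted;
my @dupes = grep { $counted{$_}++ == 1 } @array;
print "@dupes\n";    # prints: false no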
Not a direct answer, but this will return an array without duplicates:
#!/usr/bin/perl
use strict;
use warnings;
my @arr = ('a','a','a','b','b','c');
my %count;
my @arr_no_dups = grep { !$count{$_}++ } @arr;    # keep only first occurrences
print @arr_no_dups, "\n";
Turning the array into a hash is the fastest way, O(n), though it's memory-inefficient. Using a for loop is a bit faster than grep, but I'm not sure why.
#!/usr/bin/perl
use strict;
use warnings;
my @array = qw(a a a b b c);    # example input; the original assumed @array was already populated
my %count;
my %dups;
for (@array) {
    $dups{$_}++ if $count{$_}++;
}
A memory-efficient way is to sort the array in place and iterate through it looking for adjacent, equal entries.
# not exactly sort in place, but Perl does a decent job optimizing it
@array = sort @array;
my $last;
my %dups;
for my $entry (@array) {
    $dups{$entry}++ if defined $last and $entry eq $last;
    $last = $entry;
}
This is O(n log n) because of the sort, but it only needs to store the duplicates rather than a second copy of the data in %count. Worst-case memory usage is still O(n) (when everything is duplicated), but if your array is large and there aren't a lot of duplicates, you'll win.
Theory aside, benchmarking shows the latter starts to lose on large arrays (over a million elements, say) with a high percentage of duplicates.
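If you want to check that on your own data, Benchmark's cmpthese will compare the two approaches directly; the array size and duplication rate below are assumptions you should tune:
use strict;
use warnings;
use Benchmark 'cmpthese';
my @array = map { int rand 1000 } 1 .. 1_000_000;    # assumed: large, heavily duplicated
cmpthese(-3, {
    hash => sub {
        my (%count, %dups);
        for (@array) {
            $dups{$_}++ if $count{$_}++;
        }
    },
    sort => sub {
        my @sorted = sort @array;    # copy, so each run sees the same input
        my ($last, %dups);
        for my $entry (@sorted) {
            $dups{$entry}++ if defined $last and $entry eq $last;
            $last = $entry;
        }
    },
});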