What's the most efficient way to check for duplicates in an array of data using Perl?

花落未央 2020-12-05 14:26

I need to see if there are duplicates in an array of strings; what's the most time-efficient way of doing it?

7 Answers
  • 2020-12-05 14:55

    Similar to @Schwern's second solution, but this checks for duplicates a little earlier, from within sort's comparison function:

    use strict;
    use warnings;
    
    # print a value whenever sort compares it against an equal neighbour
    my @sorted = sort { print "dup = $a$/" if $a eq $b; $a cmp $b } @ARGV;
    

    It won't be as fast as the hashing solutions, but it requires less memory and is pretty darn cute.

  • 2020-12-05 14:58

    Please don't ask about the most time-efficient way to do something unless you have a specific requirement, such as "I have to dedupe a list of 100,000 integers in under a second." Otherwise, you're worrying about how long something takes for no reason.

  • 2020-12-05 15:01

    If you need the uniquified array anyway, it is fastest to use the heavily optimized List::MoreUtils module and then compare the result to the original:

    use strict;
    use warnings;
    use List::MoreUtils 'uniq';
    
    my @array = qw(1 1 2 3 fibonacci!);
    my @array_uniq = uniq @array;
    # wrap everything in print's parens; "print (...) . ..." would discard the trailing string
    print(((scalar(@array) == scalar(@array_uniq)) ? "no dupes" : "dupes") . " found!\n");
    

    Or if the list is large and you want to bail as soon as a duplicate entry is found, use a hash:

    my %uniq_elements;
    foreach my $element (@array)
    {
        die "dupe found!" if $uniq_elements{$element}++;
    }
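
    If dying isn't appropriate, the same early-exit idea can be wrapped in a small helper that returns a boolean instead. This is a rough sketch; has_dupes is a made-up name, not part of any module:

    sub has_dupes {
        my %seen;
        for (@_) {
            return 1 if $seen{$_}++;   # bail out on the first repeated element
        }
        return 0;
    }
    
    my @words = qw(a b c a);
    print has_dupes(@words) ? "dupes found\n" : "no dupes found\n";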
    
  • 2020-12-05 15:04

    One of the things I love about Perl is its ability to almost read like English. It just sort of makes sense.

    use strict;
    use warnings;
    
    my @array = qw/yes no maybe true false false perhaps no/;
    
    my %seen;
    
    foreach my $string (@array) {
    
        next unless $seen{$string}++;
        print "'$string' is duplicated.\n";
    }
    

    Output

    'false' is duplicated.
    'no' is duplicated.

  • 2020-12-05 15:08

    Not a direct answer, but this will return an array without duplicates:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my @arr = ('a','a','a','b','b','c');
    my %count;
    my @arr_no_dups = grep { !$count{$_}++ } @arr;   # keep only the first occurrence of each element
    
    print @arr_no_dups, "\n";
    
  • 2020-12-05 15:15

    Turning the array into a hash is the fastest way (it's O(n)), though it is memory inefficient. Using a for loop is a bit faster than grep, but I'm not sure why; a grep equivalent is sketched after the loop below for comparison.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my @array = qw(a b a c b);   # example data; the original snippet assumes @array already exists
    
    my %count;
    my %dups;
    for (@array) {
        $dups{$_}++ if $count{$_}++;
    }
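
    For comparison, a grep version of the same duplicate detection (a sketch, assuming the same @array) looks like this:

    my %seen;
    my @dup_occurrences = grep { $seen{$_}++ } @array;   # every occurrence after the first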
    

    A memory-efficient way is to sort the array in place and iterate through it looking for equal adjacent entries.

    # not exactly sort in place, but Perl does a decent job optimizing it
    @array = sort @array;
    
    my $last;
    my %dups;
    for my $entry (@array) {
        $dups{$entry}++ if defined $last and $entry eq $last;
        $last = $entry;
    }
    

    This is O(n log n) because of the sort, but it only needs to store the duplicates rather than a second copy of the data in %count. Worst-case memory usage is still O(n) (when everything is duplicated), but if your array is large and there aren't a lot of duplicates, you'll win.

    Theory aside, benchmarking shows the latter starts to lose on large arrays (over a million elements or so) with a high percentage of duplicates.
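
    If you want to reproduce that kind of comparison yourself, a minimal sketch using the core Benchmark module might look like this (the data set size and duplicate rate are made up for illustration):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    
    # roughly a million entries with lots of duplicates
    my @array = map { int rand 1000 } 1 .. 1_000_000;
    
    cmpthese(-3, {
        hash => sub {
            my (%count, %dups);
            for (@array) { $dups{$_}++ if $count{$_}++ }
        },
        sort => sub {
            my @sorted = sort @array;
            my ($last, %dups);
            for my $entry (@sorted) {
                $dups{$entry}++ if defined $last and $entry eq $last;
                $last = $entry;
            }
        },
    });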
