What's the most efficient way to check for duplicates in an array of data using Perl?

花落未央 2020-12-05 14:26

I need to see if there are duplicates in an array of strings. What's the most time-efficient way of doing it?

7 Answers
  • 2020-12-05 15:20

    Create a hash or a set or use a collections.Counter().

    As you encounter each string, check whether it's already a key in the hash. If it is, it's a duplicate (handle those however you want). Otherwise add a value (such as, oh, say, the numeral one) to the hash, using the string as the key.
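
    A minimal sketch of that one-pass check using a plain Python dict as the hash (the names `strings`, `seen`, and `dupes` are just illustrative, not from the original answer):

    ```python
    # One pass: record each string as a hash key, flag repeats.
    strings = ["apple", "pear", "apple", "plum"]  # hypothetical input

    seen = {}
    dupes = []
    for s in strings:
        if s in seen:          # key lookup is near constant time
            dupes.append(s)    # already in the hash: it's a duplicate
        else:
            seen[s] = 1        # first sighting: record it with value 1
    ```

    The same pattern works in Perl with `exists $seen{$s}` on a `%seen` hash.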

    Example (using Python collections.Counter):

    import collections

    mylist = ["a", "b", "a", "c", "b", "a"]   # example input
    counts = collections.Counter(mylist)
    uniq = [i for i, c in counts.items() if c == 1]
    dupes = [i for i, c in counts.items() if c > 1]
    

    These Counters are built around dictionaries (Python's name for hashed mapping collections).

    This is time efficient because hash keys are indexed. In most cases the lookup and insertion time for keys is near constant. (In fact Perl "hashes" are so called because they are implemented using an algorithmic technique called "hashing" --- a function that maps arbitrary keys to table slots, chosen so that collisions are rare and cheap to resolve.)

    If you initialize values to integers, starting with 1, then you can increment the value each time you find its key already in the hash. This is just about the most efficient general-purpose means of counting strings.
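
    That counting pattern, sketched with a plain dict instead of `Counter` (`words` is a hypothetical input; `dict.get` supplies the starting value of 0 so the first sighting stores 1):

    ```python
    # Count occurrences by incrementing a per-key value in the hash.
    words = ["a", "b", "a", "c", "a"]  # hypothetical input

    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1  # 1 on first sighting, +1 on repeats

    dupes = [w for w, c in counts.items() if c > 1]
    ```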
