What's the most efficient way to check for duplicates in an array of data using Perl?

花落未央 2020-12-05 14:26

I need to see if there are duplicates in an array of strings. What's the most time-efficient way of doing it?

7 Answers
  • 2020-12-05 15:20

    Create a hash or a set or use a collections.Counter().

    As you encounter each string, check whether it's already a key in the hash. If it is, it's a duplicate (handle those however you want). Otherwise add a value (such as, oh, say, the numeral one) to the hash, using the string as the key.
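
    A minimal sketch of that one-pass check using a plain Python dict as the hash (the names `strings`, `seen`, and `dupes` are just illustrative, not from the original answer):

    ```python
    # One pass: record each string as a hash key, flag repeats.
    strings = ["apple", "pear", "apple", "plum"]  # hypothetical input

    seen = {}
    dupes = []
    for s in strings:
        if s in seen:          # key lookup is near constant time
            dupes.append(s)    # already in the hash: it's a duplicate
        else:
            seen[s] = 1        # first sighting: record it with value 1
    ```

    The same pattern works in Perl with `exists $seen{$s}` on a `%seen` hash.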

    Example (using Python collections.Counter):

    import collections

    mylist = ["a", "b", "a", "c", "b", "a"]   # example input
    counts = collections.Counter(mylist)
    uniq = [i for i, c in counts.items() if c == 1]
    dupes = [i for i, c in counts.items() if c > 1]
    

    These Counters are built around dictionaries (Python's name for hashed mapping collections).

    This is time efficient because hash keys are indexed. In most cases the lookup and insertion time for keys is near constant. (In fact Perl "hashes" are so called because they are implemented using an algorithmic technique called "hashing" --- a function that maps arbitrary keys to table slots, chosen so that collisions are rare and cheap to resolve.)

    If you initialize values to integers, starting with 1, then you can increment the value each time you find its key already in the hash. This is just about the most efficient general-purpose means of counting strings.
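
    That counting pattern, sketched with a plain dict instead of `Counter` (`words` is a hypothetical input; `dict.get` supplies the starting value of 0 so the first sighting stores 1):

    ```python
    # Count occurrences by incrementing a per-key value in the hash.
    words = ["a", "b", "a", "c", "a"]  # hypothetical input

    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1  # 1 on first sighting, +1 on repeats

    dupes = [w for w, c in counts.items() if c > 1]
    ```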
