Why is the following two duplicate finder algorithms have different time complexity?

问题

I was reading this question. The selected answer contains the following two algorithms. I couldn't understand why the first one's time complexity is O(ln(n)). At the worst case, if the array don't contain any duplicates it will loop n times so does the second one. Am I wrong or am I missing something? Thank you

1) A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(ln(n)) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}

2)Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodelist) {    
    final int MAXZIP = 99999;    
    boolean[] bitmap = new boolean[MAXZIP+1];    
    java.util.Arrays.fill(bitmap, false);    

    for (int item : zipcodeList)
        if (!bitmap[item]) bitmap[item] = true;
        else return true;    
    }

    return false; 
}

回答1:

The first solution should have expected complexity of O(n), since the whole zip code list must be traversed, and processing each zip code is O(1) expected time complexity.

Even taking into consideration that insertion into HashMap may trigger a re-hash, the complexity is still O(1). This is a bit of non sequitur, since there may be no relation between Java HashMap and the assumption in the link, but it is there to show that it is possible.

From HashSet documentation:

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

It's the same for the second solution, which is correctly analyzed: O(n).

(Just an off-topic note, BitSet is faster than array, as seen in the original post, since 8 booleans are packed into 1 byte, which uses less memory).

来源：https://stackoverflow.com/questions/11166247/why-is-the-following-two-duplicate-finder-algorithms-have-different-time-complex

标签

algorithm

time

time-complexity

duplicate-removal