Why is the following two duplicate finder algorithms have different time complexity?

余生颓废 提交于 2019-12-11 03:48:31

问题


I was reading this question. The selected answer contains the following two algorithms. I couldn't understand why the first one's time complexity is O(ln(n)). At the worst case, if the array don't contain any duplicates it will loop n times so does the second one. Am I wrong or am I missing something? Thank you

1) A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(ln(n)) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}

2)Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodelist) {    
    final int MAXZIP = 99999;    
    boolean[] bitmap = new boolean[MAXZIP+1];    
    java.util.Arrays.fill(bitmap, false);    

    for (int item : zipcodeList)
        if (!bitmap[item]) bitmap[item] = true;
        else return true;    
    }

    return false; 
}

回答1:


The first solution should have expected complexity of O(n), since the whole zip code list must be traversed, and processing each zip code is O(1) expected time complexity.

Even taking into consideration that insertion into HashMap may trigger a re-hash, the complexity is still O(1). This is a bit of non sequitur, since there may be no relation between Java HashMap and the assumption in the link, but it is there to show that it is possible.

From HashSet documentation:

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

It's the same for the second solution, which is correctly analyzed: O(n).

(Just an off-topic note, BitSet is faster than array, as seen in the original post, since 8 booleans are packed into 1 byte, which uses less memory).



来源:https://stackoverflow.com/questions/11166247/why-is-the-following-two-duplicate-finder-algorithms-have-different-time-complex

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!