Algorithm Time Complexity

Backend · Open · 1 answer · 575 views
自闭症患者 2021-01-27 01:24

I am currently having trouble identifying and understanding the time complexity of the following algorithm.

Background: There is a list of files, each containing a list

1 Answer
  •  执笔经年
    2021-01-27 01:47

    I'm just repeating what amit said, so please give him the upvote if his answer is clear to you - I found that explanation a bit confusing.

    Your average complexity is O(n), where n is the total number of candidates (across all files). So if you have a files, each with b candidates, then the time taken is proportional to a * b.

    This is because the simplest way to solve your problem is to loop through all the data, adding each value to a set. The set discards duplicates as necessary.

    Looping over all values takes time proportional to the number of values (that is the O(n) part). Adding a value to a hash set takes constant time on average, i.e. O(1). Since that is constant time per entry, your overall time remains O(n).
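    A minimal sketch of that approach, assuming each "file" is represented as a list of candidate strings (the names and data below are made up for illustration):

    ```python
    def unique_candidates(files):
        """Collect the distinct candidates from every file in a single pass.

        Each candidate is visited exactly once (O(n) total), and each set
        insertion is O(1) on average, so the whole pass is O(n) on average.
        """
        seen = set()
        for candidates in files:
            for candidate in candidates:
                seen.add(candidate)  # duplicates are discarded automatically
        return seen

    files = [["alice", "bob"], ["bob", "carol"], ["alice", "dave"]]
    print(sorted(unique_candidates(files)))  # → ['alice', 'bob', 'carol', 'dave']
    ```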

    However, hash sets have strange worst-case behaviour - in some (unusual) cases an insertion takes time proportional to the number of entries already in the set. So in the very worst case, each time you add a value it requires O(m) work, where m is the number of entries in the set.

    Now m is (approximately - it starts at zero and grows up to) the number of distinct values. So we have two common cases:

    • If the number of distinct candidates keeps increasing as we read more (so, for example, 90% of the candidates are always new), then m is proportional to n. That means the work of adding each candidate grows proportionally to n. So the total work is proportional to n^2 (for each candidate we do work proportional to n, and there are n candidates), and the worst case is O(n^2).

    • If the number of distinct candidates is actually fixed, then as you read more and more files they tend to be full of already-known candidates. In that case the extra work for inserting into the set is bounded by a constant (you only hit the strange behaviour a fixed number of times, once per unique candidate - it doesn't depend on n). The performance of the set then does not keep degrading as n grows, so the worst-case complexity remains O(n).
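    The difference between the two regimes is just how the set size m behaves as n grows. A small sketch with made-up data, tracking m after each insertion:

    ```python
    import random

    def set_sizes(candidates):
        """Return the set size m after each of the n insertions."""
        seen = set()
        sizes = []
        for c in candidates:
            seen.add(c)
            sizes.append(len(seen))
        return sizes

    # Case 1: every candidate is new -> m grows in proportion to n.
    growing = set_sizes(range(1000))
    print(growing[-1])  # m == n == 1000

    # Case 2: a fixed pool of 50 distinct candidates -> m plateaus at a
    # constant no matter how many files (values) you read.
    random.seed(0)
    fixed_pool = [random.randrange(50) for _ in range(1000)]
    plateau = set_sizes(fixed_pool)
    print(plateau[-1])  # m stays <= 50 even though n == 1000
    ```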
