How to count distinct values in a list in linear time?

瘦欲@ posted on 2019-11-29 14:01:50

Update: distinct vs. unique


If you are looking for "unique" values (as in: if you see an element "JASON" more than once, then it is no longer unique and should not be counted), you can do that in linear time by using a HashMap ;)

(The general, language-agnostic idea is a hash table.)

Each entry of a HashMap / hash table is a <KEY, VALUE> pair in which the keys are unique (but there is no restriction on the corresponding values).

Step 1:

Iterate through all elements in the list once: O(n)

  • For each element seen in the list, check whether it is already in the HashMap: O(1), amortized
    • If not, add it to the HashMap with the element as the KEY and 1 (the number of times you've seen it so far) as the VALUE: O(1)
    • If so, increment the VALUE, i.e. the number of times you've seen this KEY so far: O(1)

Step 2:

Iterate through the HashMap and count the KEYS whose VALUE is exactly 1 (these are the unique values): O(n)

Analysis:

  • Runtime: O(n) expected, since each hash table operation is O(1) on average (amortized over resizes)
  • Space: O(U), where U is the number of distinct values.
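
The two steps above can be sketched in Java roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name `UniqueCounter` and the method name `countUnique` are made up for this example, and it assumes the elements have sensible `equals`/`hashCode` implementations.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
final class UniqueCounter {

    // Returns how many values occur exactly once in the list.
    static <T> int countUnique(List<T> items) {
        // Step 1: map each value (KEY) to the number of times it has been seen (VALUE).
        Map<T, Integer> seen = new HashMap<>();
        for (T item : items) {
            // Stores 1 if the key is absent, otherwise adds 1 to the stored count.
            seen.merge(item, 1, Integer::sum);
        }

        // Step 2: count the keys whose value is exactly 1.
        int unique = 0;
        for (int count : seen.values()) {
            if (count == 1) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("JASON", "MARY", "JASON", "BOB");
        // Only MARY and BOB occur exactly once.
        System.out.println(countUnique(names)); // prints 2
    }
}
```

`Map.merge` folds the "check, then insert or increment" branches of Step 1 into a single call; an explicit `containsKey` check followed by `put` would behave the same way.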

If, however, you are looking for "distinct" values (as in: you want to count how many different elements there are), use a HashSet instead of a HashMap / hash table: add every element to it, and then simply query the size of the HashSet.
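
For the "distinct" case, a minimal Java sketch might look like this. The names `DistinctCounter` and `countDistinct` are invented for the example, and it again assumes the elements have sensible `equals`/`hashCode` implementations.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named for illustration only.
final class DistinctCounter {

    // Returns how many different values the list contains.
    static <T> int countDistinct(List<T> items) {
        Set<T> distinct = new HashSet<>();
        for (T item : items) {
            // add() leaves the set unchanged if the value is already present.
            distinct.add(item);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("JASON", "MARY", "JASON", "BOB");
        // JASON, MARY and BOB are the distinct values.
        System.out.println(countDistinct(names)); // prints 3
    }
}
```

The loop could also be replaced by `new HashSet<>(items).size()`, which builds the set from the list in one call.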

j_random_hacker

You can adapt this extremely cool O(n)-time and O(1)-space in-place algorithm for removing duplicates to the task of counting distinct values -- simply count the number of values equal to the sentinel value in a final O(n) pass, and subtract that from the size of the list.
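
The in-place duplicate-removal algorithm itself is not reproduced here, so the following Java sketch covers only the final counting pass. It assumes the array has already been processed and that every duplicate slot now holds a sentinel value that never occurs as real data. The names `SentinelCount`, `SENTINEL` and `countDistinctAfterDedup`, and the choice of `Integer.MIN_VALUE` as the sentinel, are assumptions made for illustration.

```java
// Hypothetical helper class, named for illustration only.
final class SentinelCount {

    // Assumed sentinel: the value the de-duplication pass wrote over duplicates.
    // It must never appear as a real element of the list.
    static final int SENTINEL = Integer.MIN_VALUE;

    // Final O(n) pass: distinct count = array size minus number of sentinel slots.
    static int countDistinctAfterDedup(int[] processed) {
        int sentinels = 0;
        for (int value : processed) {
            if (value == SENTINEL) {
                sentinels++;
            }
        }
        return processed.length - sentinels;
    }

    public static void main(String[] args) {
        // Stand-in for an array after the in-place pass: two duplicate slots were overwritten.
        int[] processed = {3, SENTINEL, 7, SENTINEL, 5};
        System.out.println(countDistinctAfterDedup(processed)); // prints 3
    }
}
```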

Add every element of the list to a HashSet and then check the size (cardinality) of the HashSet, which is the number of distinct values in the list.
