Counting unique elements in a large array

执念已碎 2021-02-03 15:33

One of my colleagues was asked this question in an interview.

Given a huge array which stores unsigned ints, with a length of 100000000, find an effective way to count the number of unique elements present in the array.

8 Answers
  • 2021-02-03 15:49

    How about using a Bloom filter implementation such as http://code.google.com/p/java-bloomfilter/: for each element, first do a bloom.contains(element); if it returns true, continue; if false, do bloom.add(element).

    At the end, count the number of elements that were added. A Bloom filter needs roughly 125 MB of memory to store 100000000 elements at 10 bits per element.

    The problem is that false positives are possible in Bloom filters, and they can only be reduced by increasing the number of bits per element. This could be addressed by using two Bloom filters with different hash functions that must both agree.
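
    Below is a minimal C++ sketch of the same idea (it does not use the java-bloomfilter API linked above; the bit-array size, the number of hash functions, and the splitmix64-style mixer are illustrative assumptions). Because of false positives, the result can slightly undercount.

    #include <cstdint>
    #include <vector>
    
    struct BloomFilter {
        std::vector<bool> bits;
        int k;                                   // number of hash functions
        BloomFilter(size_t m, int hashes) : bits(m, false), k(hashes) {}
    
        // splitmix64-style mixer used to derive hash values.
        static uint64_t mix(uint64_t x) {
            x += 0x9e3779b97f4a7c15ULL;
            x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
            x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
            return x ^ (x >> 31);
        }
        bool contains(uint32_t v) const {
            uint64_t h1 = mix(v), h2 = mix(h1);
            for (int i = 0; i < k; ++i)
                if (!bits[(h1 + i * h2) % bits.size()]) return false;
            return true;
        }
        void add(uint32_t v) {
            uint64_t h1 = mix(v), h2 = mix(h1);
            for (int i = 0; i < k; ++i)
                bits[(h1 + i * h2) % bits.size()] = true;
        }
    };
    
    // Count elements that were (probably) not seen before.
    uint64_t countDistinctApprox(const std::vector<uint32_t>& a) {
        BloomFilter bloom(10ULL * a.size(), 7);  // ~10 bits per element, 7 hashes
        uint64_t count = 0;
        for (uint32_t x : a)
            if (!bloom.contains(x)) { bloom.add(x); ++count; }
        return count;
    }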

  • 2021-02-03 15:53

    Sort it, then scan it from the beginning, counting a new unique element whenever a value differs from its predecessor.

    This approach requires no additional storage, and can be done in O(n log n) time (for the sort).
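
    A minimal sketch in C++, assuming the data is in a std::vector<uint32_t> named a that may be reordered in place:

    #include <algorithm>
    #include <cstdint>
    #include <vector>
    
    uint64_t countDistinctBySorting(std::vector<uint32_t>& a) {
        if (a.empty()) return 0;
        std::sort(a.begin(), a.end());           // O(n log n)
        uint64_t distinct = 1;                   // the first element starts a run
        for (size_t i = 1; i < a.size(); ++i)
            if (a[i] != a[i - 1]) ++distinct;    // each run boundary is a new value
        return distinct;
    }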

  • 2021-02-03 15:57

    Hashing in this case is not inefficient. The cost is approximately O(N): O(N) to iterate over the array and roughly O(N) to iterate over the hash table afterwards. Since you need O(N) just to examine each element, this complexity is as good as it gets.
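
    A minimal sketch of the hashing approach in C++, assuming the input is a std::vector<uint32_t> named a (memory grows with the number of distinct values rather than with the array length):

    #include <cstdint>
    #include <unordered_set>
    #include <vector>
    
    uint64_t countDistinctByHashing(const std::vector<uint32_t>& a) {
        std::unordered_set<uint32_t> seen;
        seen.reserve(a.size());              // avoid repeated rehashing (may be large)
        for (uint32_t x : a)
            seen.insert(x);                  // duplicates are ignored
        return seen.size();
    }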

  • 2021-02-03 16:02

    Many other posters have suggested sorting the data and then counting adjacent values that differ, but no one has yet mentioned using radix sort to get the runtime down to O(n lg U) (where U is the maximum value in the array) instead of O(n lg n). Since lg U = O(lg n) when the integers fit in one machine word, this approach is never asymptotically slower than heapsort, and for fixed-width values such as 32-bit ints lg U is a constant, so the whole count runs in O(n).

    Non-comparison sorts are always fun in interviews. :-)
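
    A minimal sketch, assuming 32-bit values: an LSD radix sort over four 8-bit digits, followed by the same adjacent-difference count used in the comparison-sort answer.

    #include <array>
    #include <cstdint>
    #include <vector>
    
    void radixSort(std::vector<uint32_t>& a) {
        std::vector<uint32_t> tmp(a.size());
        for (int shift = 0; shift < 32; shift += 8) {
            std::array<size_t, 257> count{};      // stable counting sort on one byte
            for (uint32_t x : a) ++count[((x >> shift) & 0xFF) + 1];
            for (int i = 0; i < 256; ++i) count[i + 1] += count[i];
            for (uint32_t x : a) tmp[count[(x >> shift) & 0xFF]++] = x;
            a.swap(tmp);                          // output of this pass feeds the next
        }
    }
    
    uint64_t countDistinctByRadixSort(std::vector<uint32_t>& a) {
        if (a.empty()) return 0;
        radixSort(a);                             // O(n) per pass, O(n) extra space
        uint64_t distinct = 1;
        for (size_t i = 1; i < a.size(); ++i)
            if (a[i] != a[i - 1]) ++distinct;
        return distinct;
    }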

  • 2021-02-03 16:05

    Here is a variation of the problem that might help: the program below maintains the number of distinct elements in an array while point updates are applied, using a hash map of value counts.

    #include <bits/stdc++.h>
    using namespace std;
    
    typedef long long ll;
    
    // Redirect stdin/stdout to files when not running on an online judge.
    void file_i_o()
    {
        ios_base::sync_with_stdio(false);
        cin.tie(NULL);
        cout.tie(NULL);
    #ifndef ONLINE_JUDGE
        freopen("input.txt", "r", stdin);
        freopen("output.txt", "w", stdout);
    #endif
    }
    
    int main() {
        file_i_o();
        ll t;
        cin >> t;                         // number of test cases
        while (t--)
        {
            int n, q;
            cin >> n >> q;                // array size and number of updates
            unordered_map<int, int> num;  // value -> number of occurrences
            vector<int> arr(n + 1);       // 1-indexed array
            int x, a, b;
            for (int i = 1; i <= n; i++)
            {
                cin >> x;
                arr[i] = x;
                num[x]++;                 // count the initial occurrences
            }
            // Each query "a b" sets arr[a] = b and prints the number of
            // distinct values currently in the array (num.size()).
            for (int i = 0; i < q; i++)
            {
                cin >> a >> b;
                num[arr[a]]--;
                if (num[arr[a]] == 0)
                    num.erase(arr[a]);    // old value no longer present
                arr[a] = b;
                num[b]++;
                cout << num.size() << "\n";
            }
        }
        return 0;
    }
    
  • 2021-02-03 16:07

    If the range of the int values is limited, then you may allocate an array which serves to count the occurrences of each possible value. Then you just iterate through your huge array and increment the counters; the number of unique elements is the number of counters that end up nonzero.

    foreach x in huge_array {
       counter[x]++;
    }
    

    Thus you find the solution in linear time (O(n)), but at the expense of memory consumption. That is, if your ints span the whole range allowed by 32-bit ints, you would need to allocate an array of 2^32 counters (16 GB at 4 bytes each), which is impractical...
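
    A minimal sketch, assuming the values are known to lie in [0, max_value] for some manageable max_value (the parameter name is illustrative):

    #include <cstdint>
    #include <vector>
    
    uint64_t countDistinctByCounting(const std::vector<uint32_t>& a, uint32_t max_value) {
        std::vector<uint32_t> counter(static_cast<size_t>(max_value) + 1, 0);
        for (uint32_t x : a)
            ++counter[x];                     // tally occurrences of each value
        uint64_t distinct = 0;
        for (uint32_t c : counter)
            if (c != 0) ++distinct;           // each nonzero counter is one unique value
        return distinct;
    }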
