Counting unique elements in a large array

执念已碎 2021-02-03 15:33

One of my colleagues was asked this question in an interview.

Given a huge array that stores unsigned ints. The length of the array is 100000000. Find the effective

8 Answers
  • 2021-02-03 15:49

    How about using a Bloom filter implementation such as http://code.google.com/p/java-bloomfilter/? For each element, first call bloom.contains(element): if it returns true, continue; if it returns false, call bloom.add(element).

    At the end, count the number of elements added. At 10 bits per element, a Bloom filter for 100000000 elements needs 10^9 bits, approx. 125 MB of memory.

    The problem is that false positives are possible in Bloom filters, so some distinct elements get skipped and the count can come out slightly low. The false-positive rate can only be reduced by increasing the number of bits per element. It could be reduced further by using two Bloom filters with different hash functions that both need to agree.
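
    As a rough illustration of the idea, here is a minimal C++ sketch. The BloomFilter class, test_and_add, approx_distinct, the splitmix64-style mixing function, and the choice of 7 hash probes are all my own assumptions for illustration; none of this is the java-bloomfilter API linked above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal Bloom filter for 32-bit keys (illustrative only).
// Probe positions come from double hashing: index_i = h1 + i * h2.
class BloomFilter {
public:
    BloomFilter(std::size_t num_bits, unsigned num_hashes)
        : bits_((num_bits + 63) / 64, 0),
          num_bits_(num_bits),
          num_hashes_(num_hashes) {}

    // Returns true if x was possibly seen before; sets x's bits either way.
    bool test_and_add(std::uint32_t x) {
        const std::uint64_t h1 = mix(x);
        const std::uint64_t h2 = mix(x ^ 0x9e3779b9u) | 1;  // odd step size
        bool present = true;
        for (unsigned i = 0; i < num_hashes_; ++i) {
            const std::uint64_t idx = (h1 + i * h2) % num_bits_;
            const std::uint64_t mask = 1ull << (idx % 64);
            if (!(bits_[idx / 64] & mask)) {
                present = false;          // at least one bit was unset: x is new
                bits_[idx / 64] |= mask;
            }
        }
        return present;
    }

private:
    // splitmix64-style finalizer, used here as a cheap hash (an assumption).
    static std::uint64_t mix(std::uint64_t z) {
        z += 0x9e3779b97f4a7c15ull;
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
        return z ^ (z >> 31);
    }

    std::vector<std::uint64_t> bits_;
    std::size_t num_bits_;
    unsigned num_hashes_;
};

// Counts elements the filter had not seen yet. False positives can only
// cause undercounting, so the result never exceeds the true distinct count.
std::size_t approx_distinct(const std::vector<std::uint32_t>& data,
                            std::size_t bits_per_element = 10,
                            unsigned num_hashes = 7) {
    BloomFilter bloom(data.size() * bits_per_element + 1, num_hashes);
    std::size_t count = 0;
    for (std::uint32_t x : data) {
        if (!bloom.test_and_add(x)) ++count;
    }
    return count;
}
```

    Calling approx_distinct(data) on the full array would size the filter at roughly 10 bits per element, matching the estimate above; the result is a lower bound on the true count, not an exact answer.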

  • 2021-02-03 15:53

    Sort it, then scan it from the beginning to determine the counts for each item.

    This approach requires no additional storage if the sort is done in place (e.g. heapsort), and runs in O(n log n) time, dominated by the sort.
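
    A minimal C++ sketch of sort-then-scan, assuming only the number of distinct values is needed. The function name count_distinct_sorted is hypothetical. Note that std::sort is typically an introsort, so it uses a small amount of stack space rather than strictly zero extra storage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts the array in place, then counts the positions where a value
// differs from its predecessor. Each such position starts a new run.
std::size_t count_distinct_sorted(std::vector<std::uint32_t>& data) {
    if (data.empty()) return 0;
    std::sort(data.begin(), data.end());
    std::size_t distinct = 1;  // the first element always starts a run
    for (std::size_t i = 1; i < data.size(); ++i) {
        if (data[i] != data[i - 1]) ++distinct;
    }
    return distinct;
}
```

    The same scan can accumulate a per-value count by tracking the length of each run instead of only the number of runs.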

  • 2021-02-03 15:57

    Hashing is not inefficient in this case. The expected cost is approximately O(N): O(N) for iterating over the array and ~O(N) for iterating over the hash table. Since any solution needs O(N) just to look at each element, the complexity is good.
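
    A minimal C++ sketch of the hashing approach, using std::unordered_set. The function name count_distinct_hashed is hypothetical, and the reserve call assumes there is enough memory for a table sized to the whole input.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Inserts every element into a hash set; duplicates are ignored by the
// set, so its final size is the number of distinct values.
std::size_t count_distinct_hashed(const std::vector<std::uint32_t>& data) {
    std::unordered_set<std::uint32_t> seen;
    seen.reserve(data.size());  // avoids rehashing; costs memory up front
    for (std::uint32_t x : data) {
        seen.insert(x);
    }
    return seen.size();
}
```

    Here the size of the set gives the answer directly, so the second pass over the hash table is only needed when per-value counts are wanted.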

  • 2021-02-03 16:02

    Many other posters have suggested sorting the data and then counting the runs of equal adjacent values, but no one has mentioned using radix sort yet to get the runtime to be O(n lg U) (where U is the maximum value in the array) instead of O(n lg n). Since lg U = O(lg n), assuming that integers take up one machine word, this approach is asymptotically no slower than heapsort, and for fixed-width 32-bit integers lg U is at most 32, so the sort is effectively linear in n.

    Non-comparison sorts are always fun in interviews. :-)
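
    For illustration, a minimal C++ sketch of an LSD radix sort on 32-bit values followed by the adjacent-value scan. The function names and the choice of 8-bit digits (four passes) are my assumptions; unlike an in-place sort, this variant needs a second buffer of n elements.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort: four stable counting-sort passes, one per byte,
// starting from the least significant byte.
void radix_sort(std::vector<std::uint32_t>& data) {
    std::vector<std::uint32_t> buffer(data.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // offsets[d + 1] first holds the number of elements with digit d.
        std::array<std::size_t, 257> offsets{};
        for (std::uint32_t x : data) ++offsets[((x >> shift) & 0xFF) + 1];
        // Prefix sums turn counts into the start position of each digit.
        for (int d = 0; d < 256; ++d) offsets[d + 1] += offsets[d];
        // Stable scatter into the buffer, then make it the current array.
        for (std::uint32_t x : data) buffer[offsets[(x >> shift) & 0xFF]++] = x;
        data.swap(buffer);
    }
}

// Sorts with radix_sort, then counts runs of equal adjacent values.
std::size_t count_distinct_radix(std::vector<std::uint32_t>& data) {
    if (data.empty()) return 0;
    radix_sort(data);
    std::size_t distinct = 1;
    for (std::size_t i = 1; i < data.size(); ++i) {
        if (data[i] != data[i - 1]) ++distinct;
    }
    return distinct;
}
```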

  • 2021-02-03 16:05

    Here is a variation of the problem that might help you find the number of distinct elements: it keeps the distinct count up to date while array elements are overwritten.

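    #include <cstdio>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Fast I/O; outside an online judge, redirect stdin/stdout to local files.
    void file_i_o()
    {
        std::ios_base::sync_with_stdio(false);
        std::cin.tie(nullptr);
    #ifndef ONLINE_JUDGE
        std::freopen("input.txt", "r", stdin);
        std::freopen("output.txt", "w", stdout);
    #endif
    }

    int main()
    {
        file_i_o();
        long long t;  // number of test cases
        std::cin >> t;
        while (t--) {
            int n, q;  // array length and number of update queries
            std::cin >> n >> q;
            std::unordered_map<int, int> num;  // value -> number of occurrences
            std::vector<int> arr(n + 1);       // 1-based; replaces the non-standard VLA
            for (int i = 1; i <= n; i++) {
                std::cin >> arr[i];
                num[arr[i]]++;
            }
            for (int i = 0; i < q; i++) {
                int a, b;  // query: set arr[a] = b
                std::cin >> a >> b;
                // Drop one occurrence of the old value; erase it when none remain.
                if (--num[arr[a]] == 0) {
                    num.erase(arr[a]);
                }
                arr[a] = b;
                num[b]++;
                // The number of keys is the current number of distinct values.
                std::cout << num.size() << "\n";
            }
        }
        return 0;
    }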
    
  • 2021-02-03 16:07

    If the range of the int values is limited, then you may allocate an array, which serves to count the occurrences for each possible value. Then you just iterate through your huge array and increment the counters.

    foreach x in huge_array {
       counter[x]++;
    }
    

    Thus you find the solution in linear time (O(n)), but at the expense of memory consumption. That is, if your ints span the whole range allowed by 32-bit ints, you would need to allocate an array of 4G ints, which is impractical...
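
    If only the number of distinct values is needed rather than a count per value, one bit per possible value is enough. For the full 32-bit range that is 2^32 bits, i.e. 512 MiB, instead of 4G counters. Below is a minimal C++ sketch of that variant; the function name count_distinct_bitmap and the universe parameter (the number of possible values, which every element must be smaller than) are assumptions of mine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per possible value. Precondition: every element < universe.
// With the default universe of 2^32 the bitmap takes 512 MiB.
std::size_t count_distinct_bitmap(const std::vector<std::uint32_t>& data,
                                  std::uint64_t universe = 1ull << 32) {
    std::vector<std::uint64_t> seen((universe + 63) / 64, 0);
    std::size_t distinct = 0;
    for (std::uint32_t x : data) {
        std::uint64_t& word = seen[x / 64];
        const std::uint64_t mask = 1ull << (x % 64);
        if (!(word & mask)) {  // first time this value appears
            word |= mask;
            ++distinct;
        }
    }
    return distinct;
}
```

    For example, count_distinct_bitmap(data, 1024) handles values known to lie in [0, 1024) with a 128-byte bitmap.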
