Algorithm to tell if two arrays have identical members

渐次进展 2020-11-29 06:52

What's the best algorithm for comparing two arrays to see if they have the same members?

Assume there are no duplicates, the members can be in any order, and that n

16 Answers
  • 2020-11-29 06:59

    When keys collide, a hashmap lookup can degrade to O(n), because many implementations store colliding entries in a linked list. There are better collision-handling schemes, though, and collisions should be rare anyway: a hashmap that collided often would be useless. In all regular cases a lookup is simply O(1). It is also unlikely that more than a handful of keys collide in any one bucket, so performance does not suffer much; you can safely call it O(1), or close to it, because the number of colliding keys is small enough to be ignored.
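
To make the collision cost concrete, here is a minimal Python sketch that counts equality comparisons while building a set. `Key` and `count_comparisons` are hypothetical names introduced for illustration, and the hash value is injected by hand so that collisions can be forced. CPython's `set` resolves collisions by open addressing rather than a linked list, but the degradation has the same cause: every key that shares a hash has to be compared.

```python
class Key:
    """Illustrative key whose hash is chosen by the caller (an assumption for the demo)."""
    eq_calls = 0  # class-level counter of __eq__ invocations

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def count_comparisons(keys):
    """Build a set from keys and return how many __eq__ calls that took."""
    Key.eq_calls = 0
    set(keys)
    return Key.eq_calls


n = 200
well_spread = count_comparisons([Key(i, i) for i in range(n)])  # distinct hashes
all_collide = count_comparisons([Key(i, 0) for i in range(n)])  # one shared hash
print(well_spread, all_collide)
```

With distinct hashes the comparison count stays small, while forcing every key onto the same hash makes each insertion compare against the keys already stored, so the total grows much faster than n.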

  • 2020-11-29 07:02

    You can use a signature (a commutative operation over the array members) to further optimize this in the case where the arrays are usually different, saving the O(n log n) sort or the memory allocation. A signature can take the form of one or more Bloom filters, or even a simple commutative operation like addition or XOR.

    A simple example (assuming a long as the signature size and GetHashCode as a good object identifier; if the objects are, say, ints, then their value is a better identifier; and some signatures will be larger than a long):

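    public bool MatchArrays(object[] array1, object[] array2)
    {
       if (array1.Length != array2.Length)
          return false;
       long signature1 = 0;
       long signature2 = 0;
       // Fold each element's hash code into an order-independent signature.
       for (int i = 0; i < array1.Length; i++) {
           signature1 = CommutativeOperation(signature1, array1[i].GetHashCode());
           signature2 = CommutativeOperation(signature2, array2[i].GetHashCode());
       }
    
       // Different signatures prove the arrays differ.
       if (signature1 != signature2)
           return false;
    
       // Equal signatures can still be a false positive, so do the full comparison.
       return MatchArraysTheLongWay(array1, array2);
    }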
    

    where (using an addition operation; use a different commutative operation if desired, e.g. bloom filters)

    public long CommutativeOperation(long oldValue, long newElement) {
        return oldValue + newElement;
    }
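
The same idea can be sketched in Python. This is only an illustration under stated assumptions: `signature` and `match_arrays` are hypothetical names, XOR over the built-in `hash` stands in for the commutative operation, and sorting stands in for the full comparison (so the elements are assumed to be orderable).

```python
from functools import reduce
from operator import xor


def signature(items):
    """Order-independent signature: XOR of the element hashes."""
    return reduce(xor, (hash(x) for x in items), 0)


def match_arrays(a, b):
    if len(a) != len(b):
        return False
    # Cheap rejection: different signatures prove the arrays differ.
    if signature(a) != signature(b):
        return False
    # Equal signatures may be a false positive, so fall back to a full check.
    return sorted(a) == sorted(b)


print(match_arrays([3, 1, 2], [2, 3, 1]))
print(match_arrays([1, 2, 3], [1, 2, 4]))
print(match_arrays([1, 2, 3], [5, 6, 3]))
```

The last call shows why the full comparison is still needed: 1 ^ 2 ^ 3 and 5 ^ 6 ^ 3 are both 0, so the signatures agree even though the members differ.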
    
  • 2020-11-29 07:03

    Pseudocode:

    A:array
    B:array
    C:hashtable
    
    if A.length != B.length then return false;
    
    foreach objA in A
    {
        H = objA;
        if H is not found in C.Keys then
            C.add(H as key,1 as initial value);
        else
            C.Val[H as key]++;
    }
    
    foreach objB in B
    {
        H = objB;
        if H is not found in C.Keys then
            return false;
        else
            C.Val[H as key]--;
    }
    
    if(C contains non-zero value)
        return false;
    else
        return true;
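
A runnable take on the pseudocode above, as a sketch in Python. The name `same_members` is an assumption, and `collections.Counter` plays the role of the hashtable `C`. One small variation: it returns as soon as a count would go negative, which makes the final scan for non-zero values unnecessary once the lengths are known to be equal.

```python
from collections import Counter


def same_members(a, b):
    if len(a) != len(b):
        return False
    counts = Counter(a)  # element -> number of occurrences in a
    for item in b:
        # Counter returns 0 for a missing key without inserting it.
        if counts[item] == 0:
            return False
        counts[item] -= 1
    # Lengths match and no count went negative, so every count is now zero.
    return True


print(same_members([1, 2, 2], [2, 1, 2]))
print(same_members([1, 2, 2], [1, 1, 2]))
```

Because it counts occurrences, this version also handles arrays that contain duplicates.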
    
  • 2020-11-29 07:04

    Assuming you don't want to disturb the original arrays and space is a consideration, another O(n.log(n)) solution that uses less space than sorting both arrays is:

    1. Return FALSE if arrays differ in size
    2. Sort a copy of the first array -- O(n.log(n)) time, extra space required is the size of one array
    3. For each element in the 2nd array, check if it's in the sorted copy of the first array using a binary search -- O(n.log(n)) time

    If you use this approach, please use a library routine to do the binary search. Binary search is surprisingly error-prone to hand-code.
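
The steps above can be sketched in Python with the standard-library `bisect` module doing the binary search. The function name `same_members_sorted` is an assumption, and the sketch relies on the question's no-duplicates condition and on the elements being orderable.

```python
from bisect import bisect_left


def same_members_sorted(a, b):
    # Step 1: arrays of different size cannot match.
    if len(a) != len(b):
        return False
    # Step 2: sort a copy, leaving the original arrays untouched.
    sorted_a = sorted(a)
    # Step 3: binary-search the sorted copy for each element of b.
    for item in b:
        i = bisect_left(sorted_a, item)
        if i == len(sorted_a) or sorted_a[i] != item:
            return False
    return True


print(same_members_sorted([4, 1, 3], [3, 4, 1]))
print(same_members_sorted([4, 1, 3], [3, 4, 2]))
```

If duplicates were allowed, this check would accept [1, 1, 2] against [1, 2, 2], so it should only be used under the stated assumption.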

    [Added after reviewing solutions suggesting dictionary/set/hash lookups:]

    In practice I'd use a hash. Several people have asserted O(1) behaviour for hashes, leading them to conclude a hash-based solution is O(N). Typical inserts/lookups may be close to O(1), and some hashing schemes guarantee worst-case O(1) lookup, but worst-case insertion -- in constructing the hash -- isn't O(1). Given any particular hashing data structure, there would be some set of inputs which would produce pathological behaviour. I suspect there exist hashing data structures with a combined worst case for [insert-N-elements then lookup-N-elements] of O(N.log(N)) time and O(N) space.

  • 2020-11-29 07:04

    The best way is probably to use hashmaps. Since insertion into a hashmap is O(1) on average, building a hashmap from one array should take O(n). You then have n lookups, each of which takes O(1) on average, so that is another O(n) operation. All in all, it's O(n) expected time.

    In Python:

    def comparray(a, b): 
        sa = set(a)
        return len(sa)==len(b) and all(el in sa for el in b)
    
  • 2020-11-29 07:09

    Ignoring the built-in ways to do this in C#, you could do something like this:

    It's O(1) in the best case, O(N) (per list) in the worst case.

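    // Requires: using System.Collections.Generic;
    public bool MatchArrays(object[] array1, object[] array2)
    {
       if (array1.Length != array2.Length)
          return false;
    
       bool retValue = true;
    
       // HashSet stores keys only; Hashtable.Add would also need a value argument.
       HashSet<object> ht = new HashSet<object>();
    
       for (int i = 0; i < array1.Length; i++)
       {
          ht.Add(array1[i]);
       }
    
       for (int i = 0; i < array2.Length; i++)
       {
          // A member of array2 that is missing from array1 means the arrays differ.
          if (!ht.Contains(array2[i]))
          {
             retValue = false;
             break;
          }
       }
    
       return retValue;
    }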
    