Algorithm to tell if two arrays have identical members


渐次进展

What's the best algorithm for comparing two arrays to see if they have the same members?

Assume there are no duplicates, the members can be in any order, and that n


16 answers

滥情空心

2020-11-29 06:59

When keys collide, a hashmap lookup degrades toward O(n), because most implementations store colliding entries in a linked list. There are better collision-handling schemes, though, and you should rarely see many collisions anyway: a hashmap that collided constantly would be useless. In all regular cases a lookup is simply O(1). Besides, a single hashmap is unlikely to have more than a handful of entries colliding with each other, so performance won't suffer much; you can safely call it O(1), or close to it, because that n is small enough to be ignored.
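
To make the collision argument concrete, here is a minimal Python sketch. `CountingKey` and `count_eq_calls` are hypothetical names invented for this illustration; the class counts how often `__eq__` runs while a set is built. With a sensible hash the set rarely needs to compare keys, while a deliberately constant hash forces each insert to compare against the keys already stored.

```python
class CountingKey:
    """Hypothetical key type that counts equality checks."""

    eq_calls = 0  # shared counter, reset before each measurement

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # A constant hash forces every key to collide with every other key.
        return 0 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return self.value == other.value


def count_eq_calls(n, constant_hash):
    """Build a set of n distinct keys and report how many __eq__ calls it took."""
    CountingKey.eq_calls = 0
    keys = {CountingKey(i, constant_hash) for i in range(n)}
    assert len(keys) == n  # the set stays correct either way
    return CountingKey.eq_calls
```

Comparing `count_eq_calls(200, False)` with `count_eq_calls(200, True)` should show the second number to be far larger, which is the degradation described above. The exact counts depend on the interpreter's set implementation.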

被撕碎了的回忆

2020-11-29 07:02
You can use a signature (a commutative operation over the array members) to further optimize this when the arrays are usually different, saving the O(n log n) sort or the memory allocation. A signature can take the form of one or more Bloom filters, or even a simple commutative operation such as addition or XOR.

A simple example (assuming a long as the signature size and GetHashCode as a good object identifier; if the objects are, say, ints, then their value is a better identifier; and some signatures will be larger than a long):
```
public bool MatchArrays(object[] array1, object[] array2)
{
   if (array1.Length != array2.Length)
      return false;
   long signature1 = 0;
   long signature2 = 0;
   for (int i = 0; i < array1.Length; i++) {
       signature1 = CommutativeOperation(signature1, array1[i].GetHashCode());
       signature2 = CommutativeOperation(signature2, array2[i].GetHashCode());
   }

   // Different signatures prove the arrays differ; equal signatures prove nothing.
   if (signature1 != signature2)
       return false;

   // Signatures match, so fall back to a full comparison.
   return MatchArraysTheLongWay(array1, array2);
}
```
where (using an addition operation; use a different commutative operation if desired, e.g. bloom filters)
```
public long CommutativeOperation(long oldValue, long newElement) {
    return oldValue + newElement;
}
```
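
For comparison, the same idea can be sketched in Python. This is only an illustration under stated assumptions: `signature` and `match_arrays` are made-up names, the members are assumed hashable, and the "long way" is stood in for by a set comparison, which relies on the question's assumption that there are no duplicates.

```python
def signature(items):
    """Order-independent signature: the sum of the members' hashes."""
    return sum(hash(x) for x in items)


def match_arrays(a, b):
    if len(a) != len(b):
        return False
    # Cheap rejection: different signatures prove the arrays differ.
    if signature(a) != signature(b):
        return False
    # Equal signatures prove nothing (e.g. 1 + 4 == 2 + 3), so confirm
    # with a full comparison. Assumes no duplicates in either input.
    return set(a) == set(b)
```

The signature only ever rules arrays out. `match_arrays([1, 4], [2, 3])` passes the signature check and is rejected by the full comparison.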

爱一瞬间的悲伤

2020-11-29 07:03

Pseudocode:

A: array
B: array
C: hashtable

if A.length != B.length then return false;

foreach objA in A
{
    H = objA;
    if H is not found in C.Keys then
        C.add(H as key, 1 as initial value);
    else
        C.Val[H as key]++;
}

foreach objB in B
{
    H = objB;
    if H is not found in C.Keys then
        return false;
    else
        C.Val[H as key]--;
}

if (C contains non-zero value)
    return false;
else
    return true;
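
A runnable version of the pseudocode above might look like this in Python. It is a sketch, not the author's code: the function name `same_members` is made up, and a plain `dict` plays the role of the hashtable `C`.

```python
def same_members(a, b):
    if len(a) != len(b):
        return False

    # Count how many times each member occurs in a.
    counts = {}
    for item in a:
        counts[item] = counts.get(item, 0) + 1

    # Cancel each member of b against those counts.
    for item in b:
        if item not in counts:
            return False
        counts[item] -= 1

    # Any non-zero count means the two arrays differ.
    return all(v == 0 for v in counts.values())
```

Because it counts occurrences rather than just recording presence, this version also works when duplicates are allowed.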


挽巷

2020-11-29 07:04
Assuming you don't want to disturb the original arrays and space is a consideration, another O(n.log(n)) solution that uses less space than sorting both arrays is:
1. Return FALSE if the arrays differ in size
2. Sort a copy of the first array -- O(n.log(n)) time, extra space required is the size of one array
3. For each element in the 2nd array, check if it's in the sorted copy of the first array using a binary search, returning FALSE as soon as one is missing -- O(n.log(n)) time
If you use this approach, please use a library routine to do the binary search. Binary search is surprisingly error-prone to hand-code.
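
As a sketch of the steps above in Python, using the standard library's `bisect` module for the binary search instead of hand-coding it. The function name `same_members` is made up, and the code leans on the question's assumption that there are no duplicates.

```python
from bisect import bisect_left


def same_members(a, b):
    # Step 1: arrays of different size cannot match.
    if len(a) != len(b):
        return False

    # Step 2: sort a copy, leaving the original array untouched.
    sorted_a = sorted(a)

    # Step 3: binary-search the sorted copy for each member of b.
    for item in b:
        i = bisect_left(sorted_a, item)
        if i == len(sorted_a) or sorted_a[i] != item:
            return False
    return True
```

With duplicates allowed this check would not be enough, since `[1, 1, 2]` and `[1, 2, 2]` would both pass every lookup.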

[Added after reviewing solutions suggesting dictionary/set/hash lookups:]

In practice I'd use a hash. Several people have asserted O(1) behaviour for hashes, leading them to conclude a hash-based solution is O(N). Typical inserts/lookups may be close to O(1), and some hashing schemes guarantee worst-case O(1) lookup, but worst-case insertion -- in constructing the hash -- isn't O(1). Given any particular hashing data structure, there would be some set of inputs which would produce pathological behaviour. I suspect there exist hashing data structures with the combined worst-case to [insert-N-elements then lookup-N-elements] of O(N.log(N)) time and O(N) space.
有刺的猬

2020-11-29 07:04
The best way is probably to use hashmaps. Since insertion into a hashmap is O(1), building a hashmap from one array should take O(n). You then have n lookups, which each take O(1), so another O(n) operation. All in all, it's O(n).

In Python:
```
def comparray(a, b):
    # Build a set from a, then check that every element of b is in it.
    sa = set(a)
    return len(sa)==len(b) and all(el in sa for el in b)
```

我寻月下人不归

2020-11-29 07:09

Ignoring the built-in ways to do this in C#, you could do something like this:

It's O(1) in the best case (when the lengths differ), and O(N) per list in the worst case.

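public bool MatchArrays(object[] array1, object[] array2)
{
   if (array1.Length != array2.Length)
      return false;

   bool retValue = true;

   // Requires: using System.Collections;
   Hashtable ht = new Hashtable();

   for (int i = 0; i < array1.Length; i++)
   {
      // Only the keys matter here; Add throws on a repeated key,
      // which the no-duplicates assumption rules out.
      ht.Add(array1[i], null);
   }

   for (int i = 0; i < array2.Length; i++)
   {
      // A member of array2 that is missing from array1 means the arrays differ.
      if (!ht.Contains(array2[i]))
      {
         retValue = false;
         break;
      }
   }

    return retValue;
}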
