One of my friends was asked this question in an interview:
Here is a mathematical approach, inspired by Kevin's answer and its comments.
Let's call the arrays A and B and let their unique elements be a and b, respectively. First, take the sums of both arrays and subtract one from the other; since everything else cancels, sum(A) - sum(B) = a - b = s. Then, multiply the elements of both arrays and divide one by the other. Again, things cancel, so mult(A) / mult(B) = a / b = r. Now, from these, we get a = rb, so rb - b = s, or b = s / (r - 1), and then a = rs / (r - 1).
I'm calling this mathematical because multiplying things out might not be a reasonable thing to do in a real program. The key is to have two different operations that both individually allow the canceling behavior and so that one distributes over the other. This latter property is used when going from rb - b = s to b = s / (r - 1), and that won't work, say, with addition and XOR, which was my first attempt.
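As a rough illustration of the derivation above, here is a minimal Python sketch. The name find_unique_pair is made up for this example. It assumes integer arrays that contain no zeros and that a != b, since otherwise r is undefined or r - 1 is zero. It uses exact fractions instead of floats so the division loses no precision.

```python
from fractions import Fraction
from math import prod  # available in Python 3.8+


def find_unique_pair(A, B):
    """Return (a, b): the element only in A and the element only in B.

    Assumption for this sketch: integers only, no zeros, and a != b.
    """
    s = sum(A) - sum(B)             # s = a - b, everything else cancels
    r = Fraction(prod(A), prod(B))  # r = a / b, everything else cancels
    b = s / (r - 1)                 # from rb - b = s
    a = r * b                       # from a = rb
    return int(a), int(b)


print(find_unique_pair([3, 7, 1, 4], [1, 4, 9, 3]))
```

The products grow quickly with the array length, which is the practical objection raised above; Python's arbitrary-precision integers hide that cost but do not remove it.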
Here's another possibility. Unlike my previous answer it doesn't modify the arrays passed in, and should have a lower big-O bound (O(n) instead of O(n^2) - assuming constant time hashtable lookups), but will take up significantly more memory.
function findUnique(a:Array, b:Array):Array {
    var aHash:Hashtable = buildHash(a);
    var bHash:Hashtable = buildHash(b);
    var uniqueFromA:int;
    var uniqueFromB:int;
    var value:int;
    // Find the element of a that never appears in b.
    for each (value in a) {
        if (!bHash.contains(value)) {
            uniqueFromA = value;
            break;
        }
    }
    // Find the element of b that never appears in a.
    for each (value in b) {
        if (!aHash.contains(value)) {
            uniqueFromB = value;
            break;
        }
    }
    return [uniqueFromA, uniqueFromB];
}
function buildHash(a:Array):Hashtable {
    var h:Hashtable = new Hashtable();
    for each (var value:int in a) {
        h[value] = true;
    }
    return h;
}
In LINQ:
var unique1 = (from a in arrayA where !arrayB.Contains(a) select a).First();
var unique2 = (from b in arrayB where !arrayA.Contains(b) select b).First();
return new Pair(unique1, unique2);
...
public sealed class Pair<T0, T1>
{
    public T0 Item1 { get; set; }
    public T1 Item2 { get; set; }
    public Pair(T0 item1, T1 item2)
    {
        Item1 = item1;
        Item2 = item2;
    }
    //plus GetHashCode, equality etc.
}
The logic behind almost all of the previous answers is always the same: use set operations from mathematics to solve the problem.
A set in mathematics can contain each element only once. So the following list can’t be a set in the mathematical sense, since it contains one number (3) twice:
{ 1, 2, 3, 4, 3, 5 }
Since set operations, in particular checking whether an element already exists in a set, are common operations, most languages have data structures that implement these set operations efficiently. So we can simply fall back to this in our solution:
// Construct set from first list:
Set uniques = Set.from(list1);
// Iterate over second list, check each item’s existence in set.
for each (item in list2)
    if (not uniques.Contains(item))
        return item;
Different implementations of sets yield different performance, but for large lists this performance will always be superior to the naive solution. In particular, two implementations are common: balanced search trees, with O(log n) lookups, and hash tables, with expected O(1) lookups.
In every case, the usage remains the same and the above pseudo-code gives a textbook solution to your problem. A Java implementation might look as follows:
// Construct set from first list:
Set<Integer> uniques = new HashSet<Integer>(list1);
// Iterate over second list, check each item’s existence in set.
for (int item : list2)
    if (!uniques.contains(item))
        return item;
Notice how this looks almost exactly like the pseudo-code. Solutions in C#, C++ or other languages wouldn’t be much different.
EDIT Oops, I’ve just noticed that the requested return value is the pair of mismatching elements. However, this requirement doesn’t change the reasoning and almost doesn’t change the pseudo-code (do the same thing, with interchanged lists).
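To return the pair rather than a single element, the same lookup can be run in both directions. The following is only a sketch in Python; the function name find_mismatched_pair is invented here, and it assumes each list contains exactly one value that is missing from the other.

```python
def find_mismatched_pair(list1, list2):
    """Return (only_in_1, only_in_2) using set membership tests.

    Assumption for this sketch: each list has exactly one value
    that does not occur in the other list.
    """
    set1, set2 = set(list1), set(list2)
    # First item of list1 that is absent from list2, and vice versa.
    only_in_1 = next(x for x in list1 if x not in set2)
    only_in_2 = next(x for x in list2 if x not in set1)
    return only_in_1, only_in_2


print(find_mismatched_pair([1, 2, 3, 4], [2, 3, 5, 1]))
```

Python's built-in set is hash-based, so each membership test takes expected constant time and the whole function is linear in the length of the lists.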
Given two arrays, say A1 of size 'n' and A2 of size 'n-1', both arrays have the same elements except for one extra element in A1, which we have to find.
Note: elements in A1 can be repeated.
Example:
A1:{2,5,5,3}
A2:{2,5,3}
Output: 5
A1:{1,2,3,3,3}
A2:{2,3,1,3}
Output: 3
public static void main(String args[])
{
    int[] a = {1, 2, 3, 3, 3};
    int[] b = {2, 3, 1, 3};
    int flag = 1;
    int num = 0;
    List<Integer> lst = new ArrayList<>(b.length);
    for (int i : b)
        lst.add(Integer.valueOf(i));
    for (int i = 0; i < a.length; i++)
    {
        flag = 1;
        for (int j = 0; j < lst.size(); j++)
        {
            if (a[i] == lst.get(j)) {
                lst.remove(j);
                flag = 0;
                break;
            }
        }
        if (flag == 1)
            num = a[i];
    }
    System.out.println(num);
}
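The nested loops above take O(n^2) time in the worst case. One possible linear-time alternative, sketched here in Python, counts occurrences instead of removing matches one by one. The function name find_extra is made up for this example, and it assumes A1 really is A2 plus exactly one extra element.

```python
from collections import Counter


def find_extra(a1, a2):
    """Return the one element that a1 has in excess of a2.

    Assumption for this sketch: a1 equals a2 plus exactly one
    extra element, in any order, with repeats allowed.
    """
    # Counter subtraction keeps only the positive counts, so what
    # remains is the single surplus element.
    surplus = Counter(a1) - Counter(a2)
    return next(iter(surplus.elements()))


print(find_extra([2, 5, 5, 3], [2, 5, 3]))
print(find_extra([1, 2, 3, 3, 3], [2, 3, 1, 3]))
```

For numeric arrays, sum(a1) - sum(a2) would give the same answer with no extra memory; the counting version also works for elements that cannot be summed, as long as they are hashable.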
This can be solved quickly from just the sum and the sum of the squares of the two sequences. And calculating these sums will certainly be faster than the hashes that are suggested, and doesn't involve any comparisons between the sequence items.
Here's how to do it: if the two sequences are {a_i} and {b_i}, call their sums A and B and the sums of their squares A2 and B2, i.e. A2 = Sum({a_i^2}). For convenience, let D = A - B and D2 = A2 - B2. Then D = a - b and D2 = a^2 - b^2, where a and b are the two elements that are different, and from this we see
a = (D2 + D^2)/(2*D)
b = a - D
This works out because, from algebra, a^2 - b^2 = (a+b)(a-b), or D2 = (a+b)D, so a + b = D2/D, and since we also know a - b = D, we can find a and b.
An example in Python may be more convincing:
a, b = 5, 22  # the initial unmatched terms
x, y = list(range(15)), list(range(15))
y[a] = b
print("x =", x)  # x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print("y =", y)  # y = [0, 1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12, 13, 14]
D = sum(x) - sum(y)
D2 = sum(i**2 for i in x) - sum(i**2 for i in y)  # element-wise squaring
a = (D2 + D*D) // (2*D)  # the division is exact, so integer division is safe
b = a - D
print("a=%i, b=%i" % (a, b))
# prints a=5, b=22 which is correct
(Of course, this is somewhat similar to jk's answer, except it doesn't require the multiplication of all the terms and the huge numbers that would result, but thanks to jk for the idea of a mathematical approach.)