Given an array of integers, find the number of all ordered pairs of elements in the array whose sum lies in a given range [a, b].
Here is an O(n^2) solution for the same
Counting the pairs that work can be done in sort time + O(N). This is faster than the solution that Ani gives, which is sort time + O(N log N). The idea goes like this: first you sort, then you run nearly the same single-pass algorithm twice, and finally you combine the results of the two passes to calculate the answer.
The first time we run the single-pass algorithm, we create a new array that lists, for each index, the smallest index it can partner with to give a sum of at least a. Example:
a = 6
array = [-20, 1, 3, 4, 8, 11]
output = [6, 4, 2, 2, 1, 1]
So, the number at array index 1 is 1 (0-based indexing). The smallest number it can pair with to reach at least 6 is the 8, which is at index 4. Hence output[1] = 4. -20 can't pair with anything, so output[0] = 6 (out of bounds). Another example: output[4] = 1, because 8 (index 4) can pair with the 1 (index 1) or any number after it to sum to at least 6.
What you need to do now is convince yourself that this is O(N). It is. The code is:
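output = [0] * len(array)
i, j = 0, len(array) - 1
while i <= j:
    if array[i] + array[j] >= a:
        # array[i] is the smallest partner that lifts array[j] to at least a
        output[j] = i
        j -= 1
    else:
        # array[j] is too small for array[i]; its first valid partner is at j + 1
        output[i] = j + 1
        i += 1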
Just think of two pointers starting at the edges and working inwards. Each step moves one pointer, so it's O(N). You now do the same thing again, just with the condition sum <= b:
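output2 = [0] * len(array)
i, j = 0, len(array) - 1
while i <= j:
    if array[i] + array[j] <= b:
        # array[j] is the largest partner that keeps array[i] at or below b
        output2[i] = j
        i += 1
    else:
        # array[i] is too large for array[j]; its last valid partner is at i - 1
        output2[j] = i - 1
        j -= 1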
In our example, this code gives you (array and b for reference):
b = 9
array = [-20, 1, 3, 4, 8, 11]
output2 = [5, 4, 3, 3, 1, 0]
Now output and output2 contain all the information we need, because together they give the range of valid partner indices for each position: output[i] is the smallest index that i can be paired with, and output2[i] is the largest. The difference + 1 is the number of pairings for that position. So for the first position (corresponding to -20), there are 5 - 6 + 1 = 0 pairings. For 1, there is 4 - 4 + 1 = 1 pairing, with the number at index 4, which is 8. One subtlety: this algorithm counts self-pairings, so if you don't want them, you have to subtract them. E.g. 3 seems to have 3 - 2 + 1 = 2 pairings, one at index 2 and one at index 3. Of course, 3 itself is at index 2, so one of those is the self-pairing and the other is the pairing with 4. You just need to subtract one whenever the range of indices from output to output2 contains the index you're looking at. In code, you can write:
answer = [o2 - o + 1 - (o <= i <= o2) for i, (o, o2) in enumerate(zip(output, output2))]
Which yields:
answer = [0, 1, 1, 1, 1, 0]
Which sums to 4, corresponding to (1,8), (3,4), (4,3), (8, 1)
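As a sanity check on the worked example, the four pairs can be confirmed with a quadratic brute force. This is only a minimal sketch for verification, not part of the algorithm above; brute_force_pairs is a hypothetical helper name.

```python
from itertools import permutations

def brute_force_pairs(values, a, b):
    """List every ordered pair taken from two distinct positions whose sum is in [a, b]."""
    # permutations() treats elements as distinct by position, so an element
    # is never paired with itself, and (x, y) and (y, x) both appear.
    return [(x, y) for x, y in permutations(values, 2) if a <= x + y <= b]

pairs = brute_force_pairs([-20, 1, 3, 4, 8, 11], 6, 9)
print(pairs)       # [(1, 8), (3, 4), (4, 3), (8, 1)]
print(len(pairs))  # 4
```

This is O(N^2), so it is only useful for checking the faster version on small inputs.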
Anyhow, as you can see, this is sort + O(N), which is optimal.
Edit: a full implementation was requested, so for reference, here is the full code:
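def count_ranged_pairs(x, a, b):
    x.sort()  # note: sorts the caller's list in place
    output = [0] * len(x)
    output2 = [0] * len(x)
    # Pass 1: output[k] = smallest index whose value sums with x[k] to at least a
    i, j = 0, len(x) - 1
    while i <= j:
        if x[i] + x[j] >= a:
            output[j] = i
            j -= 1
        else:
            output[i] = j + 1
            i += 1
    # Pass 2: output2[k] = largest index whose value sums with x[k] to at most b
    i, j = 0, len(x) - 1
    while i <= j:
        if x[i] + x[j] <= b:
            output2[i] = j
            i += 1
        else:
            output2[j] = i - 1
            j -= 1
    # Size of each index range, minus one when the range contains the index itself
    answer = [o2 - o + 1 - (o <= i <= o2) for i, (o, o2) in enumerate(zip(output, output2))]
    # sum(answer) counts ordered pairs; halving gives unordered pairs.
    # Use integer division so the result stays an int in Python 3.
    return sum(answer) // 2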
The time complexity is of course output-sensitive, but this is still superior to the existing algo: O(n log n) + O(k), where k is the number of pairs that satisfy the condition.
Note: If you only need to count the number of pairs, you can do it in O(n log n). Modify the above algorithm so [b - x] (or the next smaller element) is also searched for. This way, you can count the number of 'matches' each element has in O(log n) simply from the indices of the first and last match. Then it's just a question of summing those up to get the final count. This way, the initial O(n log n) sorting step is dominant.
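The algorithm this note builds on is not reproduced in the thread, so the following is only a sketch of the counting idea, assuming a sorted copy of the input and Python's standard bisect module. The name count_pairs_bisect is hypothetical. For each element x it finds the first index holding a value >= a - x and the index just past the last value <= b - x; the distance between the two is the number of matches.

```python
from bisect import bisect_left, bisect_right

def count_pairs_bisect(values, a, b):
    """Count ordered pairs from distinct positions whose sum lies in [a, b]."""
    xs = sorted(values)  # O(n log n); the input list is left untouched
    total = 0
    for i, x in enumerate(xs):
        first = bisect_left(xs, a - x)   # first index with xs[k] >= a - x
        last = bisect_right(xs, b - x)   # one past the last index with xs[k] <= b - x
        matches = last - first
        if first <= i < last:
            matches -= 1                 # do not pair an element with itself
        total += matches
    return total

print(count_pairs_bisect([-20, 1, 3, 4, 8, 11], 6, 9))  # 4
```

Each element costs two binary searches, so the loop is O(n log n), the same order as the sort. Halve the result if unordered pairs are wanted.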
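from itertools import combinations

def countpairs2(array, a, b):
    # True when the pair's sum lies in the inclusive range [a, b]
    def pair_in_range(pair):
        return a <= sum(pair) <= b
    # The built-in filter() is lazy in Python 3 (it replaces Python 2's itertools.ifilter)
    filtered = filter(pair_in_range, combinations(array, 2))
    # Each unordered pair stands for two ordered pairs
    return sum(2 for _ in filtered)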
I think the itertools module comes in quite handy here. I also noticed you counted pairs twice; for example, you counted (1, 3) and (3, 1) as two different combinations. If you don't want that, just change the 2 in the last line to a 1.
Note: The last line could be changed to return len(list(filtered)) * 2. This CAN be faster, but at the expense of using more RAM.
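To make the memory trade-off concrete, here is a small sketch of both counting styles side by side. The function names are hypothetical, and both return the number of unordered pairs (multiply by 2 for ordered pairs).

```python
from itertools import combinations

def count_lazy(array, a, b):
    # Consumes the matches one at a time; only a running total is kept in memory.
    return sum(1 for pair in combinations(array, 2) if a <= sum(pair) <= b)

def count_eager(array, a, b):
    # Materialises every matching pair first, so memory grows with the number of matches.
    matches = [pair for pair in combinations(array, 2) if a <= sum(pair) <= b]
    return len(matches)

print(count_lazy([-20, 1, 3, 4, 8, 11], 6, 9))   # 2
print(count_eager([-20, 1, 3, 4, 8, 11], 6, 9))  # 2
```

Which one is faster in practice depends on the input, so it is worth timing both before committing to either.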
Rather than using the relational operators, we can simply check whether the sum of array elements i and j is in the specified range. This works for integer sums and counts each unordered pair once.
def get_numOfPairs(array, start, stop):
    num_of_pairs = 0
    array_length = len(array)
    for i in range(array_length):
        for j in range(i + 1, array_length):
            # range() excludes its end point, so use stop + 1 to make [start, stop] inclusive
            if array[i] + array[j] in range(start, stop + 1):
                num_of_pairs += 1
    return num_of_pairs
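One detail worth spelling out when testing membership with range(): the end point is excluded, and only integers can ever be members. The short sketch below illustrates both points; in_inclusive_range is a hypothetical helper name.

```python
def in_inclusive_range(total, start, stop):
    """Return True when the integer `total` lies in [start, stop]."""
    return total in range(start, stop + 1)

print(in_inclusive_range(9, 6, 9))  # True
print(9 in range(6, 9))             # False: the end point is excluded
print(7.5 in range(6, 10))          # False: a range only contains integers
print(6 <= 7.5 <= 9)                # True: chained comparison handles floats
```

For integer arguments, membership in a Python 3 range is computed arithmetically rather than by scanning, so the check itself is cheap; the chained comparison is the safer choice if the array may hold floats.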
I have a solution (actually 2 solutions ;-)). Writing it in Python:
def find_count(input_list, low, high):
    count = 0
    for i in range(len(input_list)):
        # An element paired with itself is counted once
        if low <= input_list[i] * 2 <= high:
            count += 1
        for j in range(i + 1, len(input_list)):
            input_sum = input_list[i] + input_list[j]
            # (i, j) and (j, i) are both ordered pairs, so count two
            if low <= input_sum <= high:
                count += 2
    return count
This runs at most nCr(n, 2) = n(n-1)/2 inner iterations and gives you the required count. It avoids sorting the list first, although the loop itself is still quadratic. If many elements cannot be part of any valid pair and all the numbers are positive integers, we can improve on this a little by adding a condition that skips such elements early.
Something like this:
# list_maximum is the largest value in the list, i.e. max(input_list), if already known
def find_count(input_list, low, high, list_maximum):
    count = 0
    for i in range(len(input_list)):
        # Valid for positive values only: skip elements that are too large on
        # their own, or too small even when paired with the largest element
        if input_list[i] > high or input_list[i] + list_maximum < low:
            continue
        if low <= input_list[i] * 2 <= high:
            count += 1
        for j in range(i + 1, len(input_list)):
            input_sum = input_list[i] + input_list[j]
            if low <= input_sum <= high:
                count += 2
    return count
I will also be happy to learn of any better solution than this :-) If I come across one, I will update this answer.
With some constraints on the data we can solve the problem in linear time (sorry for the Java, I'm not very proficient with Python):
public class Program {
    public static void main(String[] args) {
        test(new int[]{-2, -1, 0, 1, 3, -3}, -1, 2);
        test(new int[]{100, 200, 300}, 300, 300);
        test(new int[]{100}, 1, 1000);
        test(new int[]{-1, 0, 0, 0, 1, 1, 1000}, -1, 2);
    }

    public static int countPairs(int[] input, int a, int b) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int el : input) {
            max = Math.max(max, el);
            min = Math.min(min, el);
        }
        int d = max - min + 1; // "Diameter" of the array
        // Build a naive hash map of the input: map all elements to the range [0; d - 1]
        int[] lookup = new int[d];
        for (int el : input) {
            lookup[el - min]++;
        }
        // a and b also need to be adjusted
        int a1 = a - min;
        int b1 = b - min;
        int[] counts = lookup; // Just rename
        // The i-th element contains the count of lookup elements in the range [0; i]
        for (int i = 1; i < counts.length; ++i) {
            counts[i] += counts[i - 1];
        }
        int res = 0;
        for (int el : input) {
            int lo = a1 - el; // el2 >= lo
            int hi = b1 - el; // el2 <= hi
            lo = Math.max(lo, 0);
            hi = Math.min(hi, d - 1);
            if (lo <= hi) {
                res += counts[hi];
                if (lo > 0) {
                    res -= counts[lo - 1];
                }
            }
            // Exclude the pair of an element with itself
            if (a <= 2 * el && 2 * el <= b) {
                --res;
            }
        }
        // Calculated pairs are ordered, divide by 2
        return res / 2;
    }

    public static int naive(int[] ar, int a, int b) {
        int res = 0;
        for (int i = 0; i < ar.length; ++i) {
            for (int j = i + 1; j < ar.length; ++j) {
                int sum = ar[i] + ar[j];
                if (a <= sum && sum <= b) {
                    ++res;
                }
            }
        }
        return res;
    }

    private static void test(int[] input, int a, int b) {
        int naiveSol = naive(input, a, b);
        int optimizedSol = countPairs(input, a, b);
        if (naiveSol != optimizedSol) {
            System.out.println("Problem!!!");
        }
    }
}
For each element of the array we know the range in which the second element of the pair can lie. The core of this algorithm is getting the count of elements in a value range [lo; hi] in O(1) time, which the cumulative counts array provides.
The resulting complexity is O(max(N, D)), where D is the difference between the max and min elements of the array. If D is of the same order as N, the complexity is O(N).
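For readers more comfortable with Python, the same idea can be sketched roughly as follows. This is an illustrative translation under the same assumptions (integer values, a small spread D), not a tuned implementation; count_pairs_prefix is a hypothetical name. It uses a prefix array with one extra leading zero so that no special case is needed at the lower end.

```python
from itertools import accumulate

def count_pairs_prefix(values, a, b):
    """Count unordered pairs of distinct positions whose sum lies in [a, b]."""
    if not values:
        return 0
    lo_val, hi_val = min(values), max(values)
    freq = [0] * (hi_val - lo_val + 1)       # O(D) extra memory
    for v in values:
        freq[v - lo_val] += 1
    # prefix[k] = number of elements whose value is below lo_val + k
    prefix = [0] + list(accumulate(freq))
    total = 0
    for v in values:
        lo = max(a - v, lo_val)              # smallest usable partner value
        hi = min(b - v, hi_val)              # largest usable partner value
        if lo <= hi:
            total += prefix[hi - lo_val + 1] - prefix[lo - lo_val]
        if a <= 2 * v <= b:
            total -= 1                       # the element was counted with itself
    return total // 2                        # ordered count -> unordered count

print(count_pairs_prefix([-2, -1, 0, 1, 3, -3], -1, 2))  # 7
```

The inputs used in the Java main method make convenient test cases for this version as well.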
Notes:
The check if (a <= 2*el && 2*el <= b) is required because the algorithm always counts the pairs (a[i], a[i]).
Another linear algorithm would be radix sort + linear pair counting.
EDIT. This algorithm can be really good if D is considerably smaller than N and you are not allowed to modify the input array. An alternative for this case would be a slightly modified counting sort that allocates the counts array (additional O(D) memory) but does not write the sorted elements back to the input array. It's possible to adapt the pair counting to use the counts array instead of the full sorted array.