Google Interview: Find all contiguous subsequences in a given array of integers whose sum falls in a given range. Can we do better than O(n^2)?

误落风尘 2021-01-30 02:35

Given an array of integers and a range (low, high), find all contiguous subsequences of the array whose sum lies in the range.

Is there a solution b

7 Answers
  • 2021-01-30 03:05

    If all integers are non-negative, then it can be done in O(max(size-of-input,size-of-output)) time. This is optimal.

    Here's the algorithm in C.

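    #include <stdio.h>

    /* Prints every half-open interval [i top) of a[0..N) whose sum lies in [lo, hi].
       Assumes all elements are non-negative. */
    void interview_question (int* a, int N, int lo, int hi)
    {
      /* sum_bottom_low  is the sum of a[bottom_low..top),
         sum_bottom_high is the sum of a[bottom_high..top). */
      int sum_bottom_low = 0, sum_bottom_high = 0,
          bottom_low = 0, bottom_high = 0,
          top = 0;
      int i;
    
      if (lo == 0) printf ("[0 0) ");  /* the empty interval before the first element */
      while (top < N)
      {
        sum_bottom_low += a[top];
        sum_bottom_high += a[top];
        top++;
        /* advance bottom_high to the first start whose sum drops below lo */
        while (sum_bottom_high >= lo && bottom_high <= top)
        {
          /* bottom_high can step one past top when lo <= 0; do not read a[N] then */
          if (bottom_high < N) sum_bottom_high -= a[bottom_high];
          bottom_high++;
        }
        /* advance bottom_low to the first start whose sum is at most hi */
        while (sum_bottom_low > hi && bottom_low <= bottom_high)
        {
          sum_bottom_low -= a[bottom_low++];
        }
        // print output: starts in [bottom_low, bottom_high) have a sum in [lo, hi]
        for (i = bottom_low; i < bottom_high; ++i)
          printf ("[%d %d) ", i, top);
      }
      printf("\n");
    }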
    

    Except for the last loop marked "print output", each operation is executed O(N) times; the last loop is executed once for each interval printed. If we only need to count the intervals and not print them, the entire algorithm becomes O(N).

    If negative numbers are allowed, then O(N^2) is hard to beat (might be impossible).
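The count-only variant mentioned above could look roughly like the following Python sketch. It is an illustration, not a translation of the C code: the name `count_in_range` is made up here, the range is assumed to be inclusive, only non-empty intervals are counted, and all elements are assumed to be non-negative.

```python
def count_in_range(a, lo, hi):
    """Count non-empty subarrays of a non-negative list with lo <= sum <= hi."""
    count = 0
    sum_low = sum_high = 0        # sums of a[bottom_low:top] and a[bottom_high:top]
    bottom_low = bottom_high = 0
    for top in range(1, len(a) + 1):
        sum_low += a[top - 1]
        sum_high += a[top - 1]
        # bottom_high: first start whose sum is below lo (or top if there is none)
        while bottom_high < top and sum_high >= lo:
            sum_high -= a[bottom_high]
            bottom_high += 1
        # bottom_low: first start whose sum is at most hi (or top if there is none)
        while bottom_low < top and sum_low > hi:
            sum_low -= a[bottom_low]
            bottom_low += 1
        # starts in [bottom_low, bottom_high) give a sum in [lo, hi]
        count += max(0, bottom_high - bottom_low)
    return count


print(count_in_range([2, 5, 1, 1, 2, 2, 3, 4, 8, 2], 3, 6))  # 10
```

Each pointer only moves forward, so the whole loop does O(N) work.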

  • 2021-01-30 03:07

    O(n) time solution:

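    You can extend the 'two pointer' idea from the 'exact' version of the problem. For each start index i we maintain variables a and b such that exactly the intervals of the form xs[i,a), xs[i,a+1), ..., xs[i,b-1) have a sum in the sought-after range [lo, hi]. This relies on the elements being non-negative, so that the sum never decreases as the interval grows.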

    a, b = 0, 0
    for i in range(n):
        while a != (n+1) and sum(xs[i:a]) < lo:
            a += 1
        while b != (n+1) and sum(xs[i:b]) <= hi:
            b += 1
        for j in range(a, b):
            print(xs[i:j])
    

    This is actually O(n^2) because of the sum, but we can easily fix that by first calculating the prefix sums ps such that ps[i] = sum(xs[:i]). Then sum(xs[i:j]) is simply ps[j]-ps[i].

    Here is an example of running the above code on [2, 5, 1, 1, 2, 2, 3, 4, 8, 2] with [lo, hi] = [3, 6]:

    [5]
    [5, 1]
    [1, 1, 2]
    [1, 1, 2, 2]
    [1, 2]
    [1, 2, 2]
    [2, 2]
    [2, 3]
    [3]
    [4]
    

    This runs in time O(n + t), where t is the size of the output. As some have noticed, the output can be as large as t = n^2, namely if all contiguous subsequences are matched.

    If we allow writing the output in a compressed format (for each start index i, output the pair a, b, meaning that every xs[i:j] with a <= j < b matches), we get a pure O(n) time algorithm.
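A minimal sketch of the prefix-sum version with compressed output, under the same assumption of non-negative elements. The names `matching_ranges` and `expand` are invented for this illustration, and unlike the snippet above it only considers non-empty slices.

```python
from itertools import accumulate


def matching_ranges(xs, lo, hi):
    """Return triples (i, a, b): xs[i:j] has a sum in [lo, hi] for every a <= j < b."""
    n = len(xs)
    ps = [0] + list(accumulate(xs))   # ps[k] == sum(xs[:k])
    out = []
    a = b = 0
    for i in range(n):
        a = max(a, i + 1)             # only look at non-empty slices
        b = max(b, i + 1)
        while a <= n and ps[a] - ps[i] < lo:
            a += 1
        while b <= n and ps[b] - ps[i] <= hi:
            b += 1
        if a < b:
            out.append((i, a, b))
    return out


def expand(xs, ranges):
    """Turn the compressed triples back into the explicit list of slices."""
    return [xs[i:j] for i, a, b in ranges for j in range(a, b)]


xs = [2, 5, 1, 1, 2, 2, 3, 4, 8, 2]
ranges = matching_ranges(xs, 3, 6)
print(ranges)
print(expand(xs, ranges))
```

Producing the triples takes O(n) time because a and b never move backwards; expanding them costs time proportional to the size of the output.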

  • 2021-01-30 03:08

    You can use prefix sums (a simple form of dynamic programming) together with binary search. To find the count:

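        from bisect import bisect_left, bisect_right
    
        def solve(A, start, end):
            """
            O(n lg n) binary search over prefix sums.
            Bound:
            f[i] - f[j] = start
            f[i] - f[j'] = end
            start < end
            f[j] > f[j']
    
            :param A: an integer array (assumed non-negative, see the note on sorting)
            :param start: lower bound (inclusive)
            :param end: upper bound (inclusive)
            :return: number of subarrays whose sum lies in [start, end]
            """
            n = len(A)
            cnt = 0
            f = [0] * (n + 1)
    
            for i in range(1, n+1):
                f[i] = f[i-1]+A[i-1]  # sum from left
    
            # No-op when all elements are non-negative (f is already non-decreasing).
            # With negative elements, sorting loses the index order and the count is wrong.
            f.sort()
            for i in range(n+1):
                lo = bisect_left(f, f[i]-end, 0, i)     # first j with f[j] >= f[i]-end
                hi = bisect_right(f, f[i]-start, 0, i)  # first j with f[j] > f[i]-start
                cnt += hi-lo
    
            return cnt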
    

    https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

    To find the results rather than the count, you just need another hash table that maps each original (not sorted) f[i] to the list of indexes where it occurs.

    Cheers.

  • 2021-01-30 03:21

    Here is a way to get O(n log n) if there are only positive numbers:

    1. Compute the cumulative sums of the array
    2. For each i, count the sum[j] with j > i that lie in (sum[i]+low, sum[i]+high), using binary search
    3. Total = Total + count
    4. Repeat steps 2 and 3 for all i
    

    Time complexity:

    Cumulative sum is O(N)
    Finding the sums in range is O(logN) per index using binary search
    Total time complexity is O(NlogN)
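The steps above could be sketched in Python as follows. This is only an illustration: the name `count_with_bisect` is made up, the range is treated as inclusive (which the answer does not spell out), and the elements are assumed to be non-negative so that the cumulative sums are already sorted.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def count_with_bisect(a, low, high):
    """Count non-empty subarrays of a non-negative list with low <= sum <= high."""
    ps = [0] + list(accumulate(a))    # ps[k] is the sum of the first k elements
    total = 0
    for i in range(len(a)):
        # a subarray starting at i and ending before j has sum ps[j] - ps[i], with j > i
        left = bisect_left(ps, ps[i] + low, i + 1)
        right = bisect_right(ps, ps[i] + high, i + 1)
        total += max(0, right - left)
    return total


print(count_with_bisect([2, 5, 1, 1, 2, 2, 3, 4, 8, 2], 3, 6))  # 10
```

The binary searches are only valid because `ps` is non-decreasing; with negative elements that no longer holds.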
    
  • 2021-01-30 03:22

    O(NlogN) with simple data structures is sufficient.

    I take "contiguous subsequences" to mean subarrays.

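    We maintain a prefix sum list, prefix[i] = sum of the first i elements, and keep the prefix sums seen so far in sorted order. How do we check whether there is a range sum within [low, high]? We can use binary search. So,

    sorted = {0}   // sorted multiset of earlier prefix sums; 0 stands for the empty prefix
    prefix[0] = 0
    for i in range(1, N + 1)
      prefix[i] = array[i-1] + prefix[i-1];   // sum of the first i elements
      // a subarray ending at element i has sum prefix[i] - p for an earlier prefix sum p,
      // so p must satisfy prefix[i] - high <= p <= prefix[i] - low
      idx1 = lowerBound(sorted, prefix[i] - high);  // first position with value >= the key
      idx2 = upperBound(sorted, prefix[i] - low);   // first position with value > the key
      // every p at sorted positions [idx1, idx2) gives a subarray with sum in [low, high]
      insert(sorted, prefix[i])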
    

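    The one thing to take care of is that we also need to insert new values, so a plain array or linked list is NOT okay. A balanced search tree such as an AVL tree supports both binary search and insertion in O(logN). To count the matches, the tree must also report ranks (for example by storing subtree sizes) and keep duplicate prefix sums; a plain TreeSet does neither.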

  • 2021-01-30 03:24

    Starting from this problem: find all contiguous sub-sequences that sum to x. What we need is something similar.

    For every index i, we can calculate the sum of the segment from 0 to i; call it x. The problem then becomes: among the prefix sums for indices 0 to i - 1, how many lie between (x - high) and (x - low)? Each such query should be faster than O(n). Several data structures can answer it in O(log n), such as a Fenwick tree or an interval tree.

    So what we need to do is:

    • Iterate through all indexes from 0 to n (n is the size of the array).

    • At index i, calculate the sum x of the elements from 0 to i, then query the tree for the number of stored values that fall in the range (x - high, x - low).

    • Add x to the tree.

    So the time complexity will be O(n log n)
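One way the Fenwick tree variant could be put together is sketched below. Several details are assumptions of this sketch rather than part of the answer: the name `count_range_sums`, the inclusive range, and the coordinate compression step (sorting the distinct prefix sums up front so they can serve as tree indexes, which makes the approach offline). It works with negative numbers as well.

```python
from bisect import bisect_left, bisect_right


def count_range_sums(a, low, high):
    """Count non-empty subarrays with low <= sum <= high; negative elements allowed."""
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    values = sorted(set(prefix))        # coordinate compression of the prefix sums
    tree = [0] * (len(values) + 1)      # 1-based Fenwick tree holding counts

    def add(pos):
        """Record one occurrence of values[pos]."""
        pos += 1
        while pos < len(tree):
            tree[pos] += 1
            pos += pos & -pos

    def count_first(k):
        """Number of recorded prefix sums among values[:k]."""
        total = 0
        while k > 0:
            total += tree[k]
            k -= k & -k
        return total

    result = 0
    for x in prefix:
        # an earlier prefix sum p matches when x - high <= p <= x - low
        left = bisect_left(values, x - high)
        right = bisect_right(values, x - low)
        if right > left:
            result += count_first(right) - count_first(left)
        add(bisect_left(values, x))     # make x available to later prefixes
    return result


print(count_range_sums([-2, 5, -1], -2, 2))  # 3
```

Every prefix sum costs two binary searches plus a constant number of O(log n) tree operations, which gives O(n log n) overall.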
