Algorithm to determine indices i..j of array A containing all the elements of another array B

野性不改 2021-02-04 14:15

I came across this question on an interview questions thread. Here is the question:

Given two integer arrays A [1..n] and B[1..m], find the smallest<

4 Answers
  • 2021-02-04 14:41

    Here is the solution I thought of (but it's not very neat).

    I am going to illustrate it using the example in the question.

    Let A = [1,2,5,11,2,6,8,24,101,17,8] and B = [5,2,11,8,17].

    1. Sort B. (So B = [2,5,8,11,17]). This step takes O(m log m).

    2. Allocate an array C of the same size as A. Iterate through the elements of A and binary search for each one in the sorted B. If it is found, enter its "index in sorted B + 1" in C; if it is not found, enter -1. After this step,

    A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).

    C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]

    Time: O(n log m), Space: O(n).
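
    Steps 1 and 2 could be sketched roughly as follows. The function name rank_in_sorted_b is made up for illustration, and the sketch assumes B has no duplicate values, as in the example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns C, where C[k] is the 1-based position of A[k]
// in sorted B, or -1 if A[k] does not occur in B. Assumes B has no duplicates.
std::vector<int> rank_in_sorted_b(const std::vector<int>& A, std::vector<int> B) {
    std::sort(B.begin(), B.end());                  // step 1: O(m log m)
    std::vector<int> C;
    C.reserve(A.size());
    for (int a : A) {                               // step 2: n binary searches
        auto it = std::lower_bound(B.begin(), B.end(), a);
        if (it != B.end() && *it == a) {
            C.push_back(static_cast<int>(it - B.begin()) + 1);
        } else {
            C.push_back(-1);
        }
    }
    return C;
}
```

    For the A and B above, this returns the array C shown.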

    3. Find the smallest window in C that contains all the numbers from 1 to m. For finding the window, I can think of two general directions:

       a. A bit-oriented approach, wherein I set the bit corresponding to each position and finally check by some kind of ANDing.

       b. Create another array D of size m, go through C, and when I encounter p in C, increment D[p]. Use this for finding the window.

    Please leave comments regarding the general approach as such, as well as for 3a and 3b.
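
    One possible way to flesh out direction 3b is a two-pointer scan over C that uses D as a count of each value inside the current window. This is only a sketch under assumptions: the name smallest_window is invented here, D is given m + 1 slots so it can be indexed by the 1-based values in C, and the result uses 0-based inclusive indices.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: shortest window of C containing every value 1..m at
// least once. Returns 0-based inclusive (start, end), or (-1, -1) if none.
std::pair<int, int> smallest_window(const std::vector<int>& C, int m) {
    std::pair<int, int> best(-1, -1);
    if (m <= 0) return best;
    std::vector<int> D(static_cast<std::size_t>(m) + 1, 0);  // D[p]: copies of p in window
    int covered = 0;                                         // values p with D[p] > 0
    std::size_t left = 0;
    for (std::size_t right = 0; right < C.size(); ++right) {
        int p = C[right];
        if (p >= 1 && p <= m && D[p]++ == 0) ++covered;
        // While the window covers 1..m, record it and shrink from the left.
        while (covered == m) {
            int len = static_cast<int>(right - left);
            if (best.first < 0 || len < best.second - best.first) {
                best = {static_cast<int>(left), static_cast<int>(right)};
            }
            int q = C[left++];
            if (q >= 1 && q <= m && --D[q] == 0) --covered;
        }
    }
    return best;
}
```

    On the C from the example with m = 5, this yields (2, 9), i.e. positions 3..10 of A in the 1-based numbering used above. Each pointer moves right at most n times, so the scan itself is O(n).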

  • 2021-02-04 14:49
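    #include <cstddef>
    #include <map>
    #include <utility>

    struct Pair {
        int i;
        int j;
    };

    // Helper assumed by the answer (not shown there): smallest index stored
    // as a value in the map, or -1 if the map is empty.
    static int findSmallestVal(const std::map<int, int> &map)
    {
        int best = -1;
        for (const auto &kv : map) {
            if (best == -1 || kv.second < best) {
                best = kv.second;
            }
        }
        return best;
    }

    // Helper assumed by the answer (not shown there): largest index stored
    // as a value in the map, or -1 if the map is empty.
    static int findLargestVal(const std::map<int, int> &map)
    {
        int best = -1;
        for (const auto &kv : map) {
            if (kv.second > best) {
                best = kv.second;
            }
        }
        return best;
    }

    // Returns 0-based indices (i, j) of a window of A covering every value of
    // B, or (-1, -1) if some value of B never occurs in A.
    // Note: the greedy "revert" rule is a heuristic and can miss the shortest
    // window, e.g. A = {1, 0, 2, 0, 0, 0, 1, 2}, B = {1, 2} yields (0, 2).
    Pair
    find_smallest_subarray_window(int *A, std::size_t n, int *B, std::size_t m)
    {
        Pair p;

        p.i = -1;
        p.j = -1;

        // key is array value, value is the index in A currently chosen for it
        std::map<int, int> map;

        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < m; ++j) {
                if (A[i] == B[j]) {
                    int idx = static_cast<int>(i);
                    if (map.find(A[i]) == map.end()) {
                        map.insert(std::pair<int, int>(A[i], idx));
                    } else {
                        int start = findSmallestVal(map);
                        int end = findLargestVal(map);
                        int oldLength = end - start;
                        int oldIndex = map[A[i]];

                        // tentatively move this value to the newer index
                        map[A[i]] = idx;
                        int _start = findSmallestVal(map);
                        int _end = findLargestVal(map);
                        int newLength = _end - _start;
                        if (newLength > oldLength) {
                            // revert back
                            map[A[i]] = oldIndex;
                        }
                    }
                }
            }
        }

        // every value of B must have been seen at least once
        for (std::size_t j = 0; j < m; ++j) {
            if (map.find(B[j]) == map.end()) {
                return p;
            }
        }

        p.i = findSmallestVal(map);
        p.j = findLargestVal(map);

        return p;
    }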
    
  • 2021-02-04 14:52

    Complexity

    Time: O((m+n)log m)

    Space: O(m)

    The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be got rid of, and so it's optimal.)

    Variant 1 is just a special case of variant 2 with all the multiplicities being 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m ≤ n, because if m > n no window can exist and we can just return -1, in constant time.

    For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.

    Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k] ≥ 1 in variant 1, or c[k] ≥ <the required multiplicity> in variant 2).

    So, for i=1, starting with j=1, increment c[k] for the index k with B[k] = A[j] (if A[j] is an element of B), check whether c[k] is now "right", and add k to or remove it from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) j's, and each set operation takes O(log m) time.)

    Now for computing successive s[i]s, do the following. Decrement the count for A[i] (the element leaving the window), increment i, update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i,s[i]) for which s[i]-i was smallest.

    Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.
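
    A minimal sketch of this two-pointer idea, under a few assumptions of mine: the name min_window is invented, the counts are kept in std::map (O(log m) per operation), S is tracked only by its size, a value counts as "right" once the window holds at least as many copies as B does, and indices are 0-based. The loop advances j and then shrinks from the left, which visits the same (i, s[i]) pairs because both pointers only ever move right.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: shortest window of A containing every value of B with
// at least its multiplicity in B. Returns 0-based inclusive (i, j), or
// (-1, -1) if no such window exists.
std::pair<int, int> min_window(const std::vector<int>& A, const std::vector<int>& B) {
    std::pair<int, int> best(-1, -1);
    if (B.empty() || B.size() > A.size()) return best;

    std::map<int, int> need;               // value -> required multiplicity
    for (int b : B) ++need[b];
    std::map<int, int> have;               // value -> count inside A[i..j]
    std::size_t satisfied = 0;             // size of the set S in the text

    std::size_t i = 0;
    for (std::size_t j = 0; j < A.size(); ++j) {
        auto in = need.find(A[j]);
        if (in != need.end() && ++have[A[j]] == in->second) ++satisfied;
        // Window is valid: record it, then drop A[i] and try a shorter one.
        while (satisfied == need.size()) {
            if (best.first < 0 || static_cast<int>(j - i) < best.second - best.first) {
                best = {static_cast<int>(i), static_cast<int>(j)};
            }
            auto out = need.find(A[i]);
            if (out != need.end() && have[A[i]]-- == out->second) --satisfied;
            ++i;
        }
    }
    return best;
}
```

    With A = {1, 2, 2, 3, 2} and B = {2, 2, 3} this returns (1, 3); ties between equally short windows are resolved in favour of the leftmost one.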

    There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.

  • 2021-02-04 14:53

    My solution:

    a. Create a hash table H with m keys, one for each value in B. Each key in H maps to a dynamic array holding, in sorted order, the indices j in A with A[j] equal to that key. This takes O(n) time: we go through each index j in A, and if key A[j] exists in H (an O(1) check), we append j to the list that H[A[j]] maps to.

    At this point we have 'binned' n elements into m bins. However, total storage is just O(n).
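
    The binning in step a might look like the following sketch. The function name bin_indices is hypothetical, and indices are stored 1-based to match the notation used in this answer.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: for each value v of B, H[v] lists (in ascending order)
// the 1-based indices j with A[j] == v. Values of A not in B are skipped.
std::unordered_map<int, std::vector<int>>
bin_indices(const std::vector<int>& A, const std::vector<int>& B) {
    std::unordered_map<int, std::vector<int>> H;
    for (int b : B) {
        H.emplace(b, std::vector<int>{});   // one (initially empty) bin per key
    }
    for (std::size_t j = 0; j < A.size(); ++j) {
        auto it = H.find(A[j]);             // expected O(1) lookup
        if (it != H.end()) {
            it->second.push_back(static_cast<int>(j) + 1);
        }
    }
    return H;
}
```

    Because A is scanned left to right, each list comes out sorted without an explicit sort.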

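    b. The 2nd part of the algorithm involves maintaining a ‘left’ index and a ‘right’ index for each list in H. Let's create two arrays of size m called L and R that contain these values. Initially, L[i] points to the first entry of list i and R[i] to the last; in the worked example at the end of this answer, L = {1, 1} and R = {3, 1}.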

    We also keep track of the “best” minimum window.

    We then iterate over the following actions on L and R, which are inherently greedy:

    i. In each iteration, we compute the minimum and maximum values in L and R. For L, Lmax - Lmin is the window and for R, Rmax - Rmin is the window. We update the best window if one of these windows is better than the current best window. We use a min heap to keep track of the minimum element in L and a max heap to keep track of the largest element in R. These take O(m*log(m)) time to build.

    ii. From a ‘greedy’ perspective, we want to take the action that will minimize the window size in each of L and R. For L it intuitively makes sense to increment the minimum index, and for R, it makes sense to decrement the maximum index.

    We want to increment the array position for the minimum value until it is larger than the 2nd smallest element in L, and similarly, we want to decrement the array position for the largest value in R until it is smaller than the 2nd largest element in R.

    Next, we make a key observation:

    If L[i] is the minimum value in L and R[i] is less than the 2nd smallest element in L, i.e., if R[i] would still be the minimum value in L were L[i] replaced with R[i], then we are done with list i. We now have the “best” index in list i that can contribute to the minimum window. Also, all the other elements in R cannot contribute to the best window since their L values are all larger than L[i]. Similarly, if R[j] is the maximum element in R and L[j] is greater than the 2nd largest value in R, we are also done by setting R[j] = L[j]. Any other index in list j to the left of L[j] has already been accounted for, as have all indices to the right of R[j], and all indices between L[j] and R[j] will perform worse than L[j].

    Otherwise, we simply increment the array position L[i] until it is larger than the 2nd smallest element in L and decrement array position R[j] (where R[j] is the max in R) until it is smaller than the 2nd largest element in R. We compute the windows and update the best window if one of the L or R windows is smaller than the best window. We can do a Fibonacci search to optimally do the increment / decrement: we keep incrementing L[i] using Fibonacci increments until we are larger than the 2nd smallest element in L, and then perform a binary search to get the smallest element L[i] that is larger than the 2nd smallest element in L. The same applies, mirrored, to the set R. After the increment / decrement, we pop the largest element from the max heap for R and the minimum element from the min heap for L, and insert the new values of L[i] and R[j] into the heaps. This is an O(log(m)) operation.

    Step ii. would terminate when Lmin can’t move any more to the right or Rmax can’t move any more to the left (as the R/L values are the same). Note that we can have scenarios in which L[i] = R[i] but if it is not the minimum element in L or the maximum element in R, the algorithm would still continue.

    Runtime analysis:

    a. Creation of the hash table takes O(n) time and O(n) space.

    b. Creation of heaps: O(m*log(m)) time and O(m) space.

    c. The greedy iterative algorithm is a little harder to analyze. Its runtime is really bounded by the distribution of elements. Worst case, we cover all the elements in each array in the hash table. For each element, we perform an O(log(m)) heap update.

    Worst case runtime is hence O(n*log(m)) for the iterative greedy algorithm. In the best case, we discover very fast that L[i] = R[i] for the minimum element in L or the maximum element in R, and the run time is O(log(m)) for the greedy algorithm!

    Average case seems really hard to analyze. What is the average “convergence” of this algorithm to the minimum window. If we were to assume that the Fibonacci increments / binary search were to help, we could say we only look at m*log(n/m) elements (every list has n/m elements) in the average case. In that case, the running time of the greedy algorithm would be m*log(n/m)*log(m).

    Total running time:

    Best case: O(n + m*log(m) + log(m)) time = O(n) assuming m << n.

    Average case: O(n + m*log(m) + m*log(n/m)*log(m)) time = O(n) assuming m << n.

    Worst case: O(n + n*log(m) + m*log(m)) = O(n*log(m)) assuming m << n.

    Space: O(n + m) (hashtable and heaps) always.
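
    As a point of comparison, here is a sketch of the left-pointer half only, driven by a min heap. It is a simplification and not the full L/R procedure described above: it drops the R array and the Fibonacci search, and just advances the list holding the current minimum one entry at a time while tracking the running maximum. The name smallest_cover is invented, and positions are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: lists[k] holds the ascending positions in A of the
// k-th value of B. Returns the (lo, hi) positions of the shortest window that
// contains one entry from every list, or (-1, -1) if some list is empty.
std::pair<int, int> smallest_cover(const std::vector<std::vector<int>>& lists) {
    using Entry = std::pair<int, std::size_t>;       // (position, list id)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> L(lists.size(), 0);     // left pointer per list
    int hi = -1;                                     // current Lmax
    for (std::size_t k = 0; k < lists.size(); ++k) {
        if (lists[k].empty()) return {-1, -1};
        heap.push({lists[k][0], k});
        hi = std::max(hi, lists[k][0]);
    }
    std::pair<int, int> best(-1, -1);
    while (!heap.empty()) {
        Entry e = heap.top();                        // current Lmin
        heap.pop();
        int lo = e.first;
        std::size_t k = e.second;
        if (best.first < 0 || hi - lo < best.second - best.first) {
            best = {lo, hi};
        }
        if (++L[k] == lists[k].size()) break;        // Lmin cannot move right
        int next = lists[k][L[k]];
        heap.push({next, k});
        hi = std::max(hi, next);
    }
    return best;
}
```

    Each heap operation costs O(log(m)) and every list entry is pushed at most once, so this variant runs in O(n*log(m)) after the binning step.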

    Edit: Here is a worked out example:

    A = [5, 1, 1, 5, 6, 1, 1, 5], B = [5, 6]

    H: { 5 => {1, 4, 8}, 6 => {5} }

    Greedy Algorithm:

    L => {1, 1}, R => {3, 1}

    Iteration 1: a. Lmin = 1 (since H{5}[1] < H{6}[1]), Lmax = 5. Window: 5 - 1 + 1 = 5. Increment the Lmin pointer; it now becomes 2.

    L => {2, 1}

    Rmin = H{6}[1] = 5, Rmax = H{5}[3] = 8. Window = 8 - 5 + 1 = 4. Best window so far = 4 (less than 5 computed above). We also note the indices in A (5, 8) for the best window.

    Decrement the Rmax pointer; it now becomes 2 and the value it points to is 4.

    R => {2, 1}

    b. Now, Lmin = 4 (H{5}[2]) and the index i in L is 1. Lmax = 5 (H{6}[1]) and the index in L is 2. We can't increment Lmin since L[1] = R[1] = 2. Thus we just compute the window now.

    The window = Lmax - Lmin + 1 = 2 which is the best window so far.

    Thus, the best window in A = (4, 5).
