Finding a substring, with some additional conditions

后端未结

关注

 3  1243

I\'m given a string which looks like this:

1011010100

And my task is to find the length of a substring which number of nulls is always <= nu

相关标签:

3条回答

没有蜡笔的小新

2021-02-03 14:30

(Almost correct, i.e. subtly wrong) linear time solution

with two recodings of the problem (one removed later...), and a sliding window.

Encoding A

You can compress the input to yield the number of subsequent zeros or ones:

+1 -1 +2 -1 +1 -1 +1 -2

This yields encoding A and needs O(n) time.

Encoding B

Now, in encoding A, whenever you encounter two consecutive numbers that sum up to > 0, you compress further. In encoding B, the number in parentheses denotes the length of the substring:

+2(4) -1 +1 -1 +1 -2 ==> +2(6) -1 +1 -2 ==> +2(8) -2

This requires O(n), too. Here, we have the solution immediately: A string of length 8 with two more 1's than 0's. Let's try a more complicated instance (given in encoding A):

+5 -8 +4

Here, the transformation to encoding B doesn't help:

+5(5) -8 +4(4)

Consider also the following instance (encoding B):

+5(9) -6 +4(4) -6 +5(7) -6 +4(6) -6 +5(9)

This sequence will be used to demonstrate the...

Sliding window

First, determine the best solution that starts at the left:

+5 -6 +4 -6 +5 > 0 ==> 9+6+4+6+7=32

Now, extend this to find the best solution that starts at the third position (+4(4)):

+4 -6 +5 -6 +4 > 0 ==> 4+6+7+6+6=29

This solution is not better than the first we have found. Move on:

+5 -6 +4 -6 +5 > 0 ==> 7+6+6+6+9=34

This is the best solution. The algorithm can be implemented in O(n), since head and tail move only forward.

The brief description above doesn't cover all subtleties (negative number at the left in encoding B, head and tail meet, ...). Also, perhaps the recodings are unnecessary and the sliding window can be implemented directly on the 0-1 representation. However, I was able to fully understand the problem only after recoding it.

Getting rid of encoding B

Actually as kindly noted by Millie Smith, "encoding B" might be lossy, meaning that it might lead to inferior solutions in certain (yet to be identified) corner cases. But the sliding window algorithm works just as well on encoding A, so it might be even necessary to skip the conversion to encoding B. (Too lazy to rewrite the explanation of the algorithm...)

0 讨论(0)
发布评论:

提交评论
- 加载中...
心在旅途

2021-02-03 14:38
Here goes my algorithm:

Start from right side:
```
 1. if you find 0 increment the value of count
 2. if you find 1 decrement the count
```
Store these values in an array i.e. v[]. e.g.
```
a[] = {1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1}

v[] = {0, 1, 0,-1, 0, 1, 0, 1, 2, 1, 2, 1, 0, -1}
```
Now the problem reduces to find indexes from V i.e. i, j such that v[i] < v[j] and i<j.

proof:

if you see here i=0 and j=11 is the possible answer and values are v[i]=0, v[j]=1. This means that till j we have one 0 extra in the string and as the v[i]=0 that means from i to j window size the extra 0 is cancelled by putting extra 1. Hence the answer.

Hope it helps. please let me know if you have doubt. Thanks.
0 讨论(0)
发布评论:

提交评论
- 加载中...
旧时难觅i

2021-02-03 14:45
The O(n) solution is quite simple actually, by building the "height array", representing the number of 1's relative to number of 0's. So a height of 2 means there are 2 more 1's than 0's. The we iterate over the height array once performing some maximality checking with some conditions.

Crucial Observation

Note that a subarray fulfilling the conditions must have its height minimum at the beginning, and maximum at the end (as relative to the subarray, not the whole array).

The height array for the sample in the question can be plotted like this, with marked answer:
```
       v
   /\/\/\
/\/      \
^
```
Proof:

Suppose the height is not minimum at the beginning, that means there is a point inside the subarray where the height is lower than the beginning. At this point, the number of 0 should be larger than the number of 1. Contradiction.

Suppose the height is not maximum at the end, that means there is a point in the subarray where the height is larger than the end, say at index j. Then at index j to the end there are more 0 than 1 (since the height decreases), and so when we "scan" the subarray from right to left we will find more 0 than 1 at index j. Contradiction.

Algorithm

Now the problem can be interpreted as finding the longest subarray which ends with the highest height in the subarray, while keeping the minimum to not exceed the height at the beginning. This is very similar to maximum subarray problem like mentioned by klrmlr ("contiguous subsequence of an array" is better said as "subarray"). And the idea is not keeping an O(n) state, but rather keeping the "maximum so far" and "maximum at this point".

Following that algorithm, below is the pseudocode, traversing the array once:

Procedure Balance_Left_Right
1. Record the lowest and highest point so far
2. If the height at this point is lower than the lowest point so far, then change the starting point to the index after this point
3. If the height at this point is higher or equal to the highest point so far, then this is a valid subarray, record the length (and start and end indices, if you like)
However we will soon see a problem (as pointed by Adam Jackson through personal communication) for this test case: 1100101, visualized as follows:
```
 /\
/  \/\/
```
The correct answer is 3 (the last 101), but the above algorithm will get 2 (the first 11). This is because our answer is apparently hidden behind a "high mountain" (i.e., the lowest point in the answer is not lower than the mountain, and the highest point in the answer is not higher than the mountain).

And so we need to ensure that when we run the Procedure Balance_Left_Right (above), there is no "high mountain" hiding the answer. And so the solution is to traverse the array once from the right, try to partition the array into multiple sections where in each section, this property holds: "the number of 1's is always >= the number of 0's, as traversed from the right", and also for each section, it can't be extended to the left anymore.

Then, in each section, when traversed from the left, will have the maximum height at the end of the section, and this is the maximum. And it can be proven that with this property, the method balance_left_right will find the correct answer for this section. So, we just call our balance_left_right method on each section, and then take the maximum answer among those.

Now, you may ask, why it's sufficient to run Balance_Left_Right on each section? This is because the answer requires the property to hold from the left and from the right, and so it must lies inside one of the sections, since each of the section satisfies half of the property.

The algorithm is still O(n) because we only visit each element twice, once from the right, and once from the left.

The last test case will be partitioned as follows:
```
 /|\ |
/ | \|/\/
**    ***
```
where only the sections marked with asterisk (*) are taken.

So the new algorithm is as follows:

Procedure Max_Balance_Left_Right
1. Partition the input where the number of 1 >= number of 0 from the right (Using Balance_Left from the right, or can call it Balance_right)
2. Run Balance_Left_Right on each partition
3. Take the maximum
Here's the code in Python:
```
def balance_left_right(arr):
    lower = 0
    upper = -2**32
    lower_idx = 0 # Optional
    upper_idx = -1 # Optional
    result = (0,0,0)
    height = 0
    length = 0
    for idx, num in enumerate(arr):
        length += 1
        height += 1 if num==1 else -1
        if height<lower:
            lower = height # Reset the lowest
            upper = height # Reset the highest
            lower_idx = idx+1 # Optional, record the starting point
            length = 0 # Reset the answer
        if height>=upper:
            upper = height
            upper_idx = idx # Optional, record the end point
            if length > result[0]: # Take maximum length
                result = (length, lower_idx, upper_idx)
    return result

def max_balance_left_right(arr):
    all_partitions = []
    start = 0
    end = len(arr)
    right_partitions = balance_left(reversed(arr[start:end]))
    for right_start, right_end in right_partitions:
        all_partitions.append((end-right_end, end-right_start))
    result = (0,0,0)
    for start, end in all_partitions:
        candidate = balance_left_right(arr[start:end])
        if result[0] < candidate[0]:
            result = (candidate[0], candidate[1]+start, candidate[2]+start)
    return result

def balance_left(arr):
    lower = 0
    start_idx = 0
    end_idx = -1
    height = 0
    result = []
    for idx, num in enumerate(arr):
        height += 1 if num==1 else -1
        if height < lower:
            if end_idx != -1:
                result.append((start_idx,end_idx))
            lower = height
            start_idx = idx+1
            end_idx = -1
        else:
            end_idx = idx+1
    if end_idx != -1:
        result.append((start_idx, end_idx))
    return result

test_cases = [
        [1,0,1,1,0,1,0,1,0,0],
        [0,0,1,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1],
        [1,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,1,0],
        [1,1,0,0,1,0,1,0,1,1,0,0,1,0,0],
        [1,1,0,0,1,0,1],
        [1,1,1,1,1,0,0,0,1,0,1,0,1,1,0,0,1,0,0,1,0,1,1]
        ]
for test_case in test_cases:
    print 'Balance left right:'
    print test_case
    print balance_left_right(test_case)
    print 'Max balance left right:'
    print test_case
    print max_balance_left_right(test_case)
    print
```
which will print:
```
Balance left right:
[1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
(8, 0, 7)
Max balance left right:
[1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
(8, 0, 7)

Balance left right:
[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
(6, 12, 17)
Max balance left right:
[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
(6, 12, 17)

Balance left right:
[1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
(8, 9, 16)
Max balance left right:
[1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
(8, 9, 16)

Balance left right:
[1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
(10, 0, 9)
Max balance left right:
[1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
(10, 0, 9)

Balance left right:
[1, 1, 0, 0, 1, 0, 1]
(2, 0, 1)
Max balance left right:
[1, 1, 0, 0, 1, 0, 1]
(3, 4, 6)

Balance left right:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
(5, 0, 4)
Max balance left right:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
(6, 8, 13)
```
For your eyes enjoyment, the height array for the test cases:
```
First:
       v
   /\/\/\
/\/      \
^

Second:
\
 \/\/\           v
      \/\/\  /\/\/\
           \/      \/
            ^

Third:
                v
  /\            /\
 /  \        /\/
/    \/\  /\/
        \/
         ^

Fourth:
         v
 /\      /\
/  \/\/\/  \/\
^             \

Fifth:
 /\   v
/  \/\/
    ^

Sixth:
    /\       v
   /  \      /\
  /    \/\/\/  \/\    /
 /      ^         \/\/
/
```
Clarification Regarding the Question

As some of the readers are confused on what exactly OP wants, although it's already stated clearly in the question, let me explain the question by some examples.

First, the task from the question:

And my task is to find the length of a substring which number of nulls is always <= number of ones. And this should always happen while 'scanning' substring from right to left and from left to right

This refers to something like "Catalan Number Ballot Problem" or "Available Change Problem". In the Wiki you can check the "monotonic path" problem, where you can map "move right" as "1" and "move up" as "0".

The problem is to find a subarray of the original array, such that, when the subarray is traversed from left-to-right and right-to-left, this property holds:

The number of 0's seen so far should not exceed the number of 1's seen so far.

For example, the string 1010 holds the property from left-to-right, because if we scan the array from left-to-right, there will always be more 1's than 0's. But the property doesn't hold from right-to-left, since the first character encountered from the right is 0, and so at the beginning we have more 0's (there is one) than 1's (there is none).

For example given by OP, we see that the answer for the string 1011010100 is the first eight characters, namely: 10110101. Why?

Ok, so when we traverse the subarray from left to right, we see that there is always more 1's than 0's. Let's check the number of 1's and 0's as we traverse the array from left-to-right:
```
1: num(0) = 0, num(1) = 1
0: num(0) = 1, num(1) = 1
1: num(0) = 1, num(1) = 2
1: num(0) = 1, num(1) = 3
0: num(0) = 2, num(1) = 3
1: num(0) = 2, num(1) = 4
0: num(0) = 3, num(1) = 4
1: num(0) = 3, num(1) = 5
```
We can see that at any point of time the number of 0's is always less than or equal to the number of 1's. That's why the property holds from left-to-right. And the same check can be done from right-to-left.

So why isn't 1011010100 and answer?

Let's see when we traverse the string right-to-left:
```
0: num(0) = 1, num(1) = 0
0: num(0) = 2, num(1) = 0
1: num(0) = 2, num(1) = 1
...
```
I didn't put the full traversal because the property has already been violated since the first step, since we have num(0) > num(1). That's why the string 1011010100 doesn't satisfy the constraints of the problem.

You can see also that my "height array" is actually the difference between the number of 1's and the number of 0's, namely: num(1) - num(0). So in order to have the property, we must have the [relative] height positive. That, can be visualized by having the height not less than the initial height.
0 讨论(0)
发布评论:

提交评论
- 加载中...

Finding a substring, with some additional conditions

(Almost correct, i.e. subtly wrong) linear time solution

Encoding A

Encoding B

Sliding window

Getting rid of encoding B