given an array of 0s and 1s, find maximum subarray such that number of zeros and 1s are equal. This needs to be done in O(n) time and O(1) space.
I have an algo whic
linear time, constant space. Let me know if there is any bug I missed.
tested in python3.
def longestBalancedSubarray(A):
lo,hi = 0,len(A)-1
ones = sum(A);zeros = len(A) - ones
while lo < hi:
if ones == zeros: break
else:
if ones > zeros:
if A[lo] == 1: lo+=1; ones-=1
elif A[hi] == 1: hi+=1; ones-=1
else: lo+=1; zeros -=1
else:
if A[lo] == 0: lo+=1; zeros-=1
elif A[hi] == 0: hi+=1; zeros-=1
else: lo+=1; ones -=1
return(A[lo:hi+1])
Here's an actionscript solution that looked like it was scaling O(n). Though it might be more like O(n log n). It definitely uses only O(1) memory.
Warning I haven't checked how complete it is. I could be missing some cases.
protected function findLongest(array:Array, start:int = 0, end:int = -1):int {
if (end < start) {
end = array.length-1;
}
var startDiff:int = 0;
var endDiff:int = 0;
var diff:int = 0;
var length:int = end-start;
for (var i:int = 0; i <= length; i++) {
if (array[i+start] == '1') {
startDiff++;
} else {
startDiff--;
}
if (array[end-i] == '1') {
endDiff++;
} else {
endDiff--;
}
//We can stop when there's no chance of equalizing anymore.
if (Math.abs(startDiff) > length - i) {
diff = endDiff;
start = end - i;
break;
} else if (Math.abs(endDiff) > length - i) {
diff = startDiff;
end = i+start;
break;
}
}
var bit:String = diff > 0 ? '1': '0';
var diffAdjustment:int = diff > 0 ? -1: 1;
//Strip off the bad vars off the ends.
while (diff != 0 && array[start] == bit) {
start++;
diff += diffAdjustment;
}
while(diff != 0 && array[end] == bit) {
end--;
diff += diffAdjustment;
}
//If we have equalized end. Otherwise recurse within the sub-array.
if (diff == 0)
return end-start+1;
else
return findLongest(array, start, end);
}
This algorithm is O(n) time and O(1) space. It may modify the source array, but it restores all the information back. So it is not working with const arrays. If this puzzle has several solutions, this algorithm picks the solution nearest to the array beginning. Or it might be modified to provide all solutions.
Algorithm
Variables:
p1
- subarray startp2
- subarray endd
- difference of 1s and 0s in the subarray
d
, if d==0
, stop. If d<0
, invert the array and after balanced subarray is found invert it back.d > 0
advance p2
: if the array element is 1, just decrement both p2
and d
. Otherwise p2
should pass subarray of the form 11*0
, where *
is some balanced subarray. To make backtracking possible, 11*0?
is changed to 0?*00
(where ?
is the value next to the subarray). Then d
is decremented.p1
and p2
.p2
: if the array element is 1, just increment p2
. Otherwise we found element, changed on step 2. Revert the changes and pass subarray of the form 11*0
.p1
: if the array element is 1, just increment p1
. Otherwise p1
should pass subarray of the form 0*11
.p1
and p2
, if p2 - p1
improved.p2
is at the end of the array, stop. Otherwise continue with step 4.How does it work
Algorithm iterates through all possible positions of the balanced subarray in the input array. For each subarray position p1
and p2
are kept as far from each other as possible, providing locally longest subarray. Subarray with maximum length is chosen between all these subarrays.
To determine the next best position for p1
, it is advanced to the first position where the balance between 1s and 0s is changed by one. (Step 5).
To determine the next best position for p2
, it is advanced to the last position where the balance between 1s and 0s is changed by one. To make it possible, step 2 detects all such positions (starting from the array's end) and modifies the array in such a way, that it is possible to iterate through these positions with linear search. (Step 4).
While performing step 2, two possible conditions may be met. Simple one: when value '1' is found; pointer p2
is just advanced to the next value, no special treatment needed. But when value '0' is found, balance is going in wrong direction, it is necessary to pass through several bits until correct balance is found. All these bits are of no interest to the algorithm, stopping p2
there will give either a balanced subarray, which is too short, or a disbalanced subarray. As a result, p2
should pass subarray of the form 11*0
(from right to left, *
means any balanced subarray). There is no chance to go the same way in other direction. But it is possible to temporary use some bits from the pattern 11*0
to allow backtracking. If we change first '1' to '0', second '1' to the value next to the rightmost '0', and clear the value next to the rightmost '0': 11*0? -> 0?*00
, then we get the possibility to (first) notice the pattern on the way back, since it starts with '0', and (second) find the next good position for p2
.
C++ code:
#include <cstddef>
#include <bitset>
static const size_t N = 270;
void findLargestBalanced(std::bitset<N>& a, size_t& p1s, size_t& p2s)
{
// Step 1
size_t p1 = 0;
size_t p2 = N;
int d = 2 * a.count() - N;
bool flip = false;
if (d == 0) {
p1s = 0;
p2s = N;
return;
}
if (d < 0) {
flip = true;
d = -d;
a.flip();
}
// Step 2
bool next = true;
while (d > 0) {
if (p2 < N) {
next = a[p2];
}
--d;
--p2;
if (a[p2] == false) {
if (p2+1 < N) {
a[p2+1] = false;
}
int dd = 2;
while (dd > 0) {
dd += (a[--p2]? -1: 1);
}
a[p2+1] = next;
a[p2] = false;
}
}
// Step 3
p2s = p2;
p1s = p1;
do {
// Step 4
if (a[p2] == false) {
a[p2++] = true;
bool nextToRestore = a[p2];
a[p2++] = true;
int dd = 2;
while (dd > 0 && p2 < N) {
dd += (a[p2++]? 1: -1);
}
if (dd == 0) {
a[--p2] = nextToRestore;
}
}
else {
++p2;
}
// Step 5
if (a[p1++] == false) {
int dd = 2;
while (dd > 0) {
dd += (a[p1++]? -1: 1);
}
}
// Step 6
if (p2 - p1 > p2s - p1s) {
p2s = p2;
p1s = p1;
}
} while (p2 < N);
if (flip) {
a.flip();
}
}
Sum all elements in the array, then diff = (array.length - sum) will be the difference in number of 0s and 1s.
For cases 2 & 3, initialize two pointers, start & end pointing to beginning and end of array. If we have more 1s, then move the pointers inward (start++ or end--) based on whether array[start] = 1 or array[end] = 1, and update sum accordingly. At each step check if sum = (end - start) / 2. If this condition is true, then start and end represent the bounds of your maximum subarray.
Here we end up doing two passes of the array, once to calculate sum, and once which moving the pointers inward. And we are using constant space as we just need to store sum and two index values.
If anyone wants to knock up some pseudocode, you're more than welcome :)
If indeed your algorithm is valid in all cases (see my comment to your question noting some corrections to it), notice that the prefix array is the only obstruction to your constant memory goal.
Examining the find
function reveals that this array can be replaced with two integers, thereby eliminating the dependence on the length of the input and solving your problem. Consider the following:
find
function. These are a[start - 1]
and a[end]
. Yes, start
and end
change, but does this merit the array?start
is incremented or end
is decremented only by one.a[start - 1]
by an integer, how would you update its value? Put another way, for each transition in the loop that changes the value of start
, what could you do to update the integer accordingly to reflect the new value of a[start - 1]
?a[end]
?a[start - 1]
and a[end]
can be reflected with two integers, doesn't the whole prefix array no longer serve a purpose? Can't it therefore be removed?With no need for the prefix array and all storage dependencies on the length of the input removed, your algorithm will use a constant amount of memory to achieve its goal, thereby making it O(n) time and O(1) space.
I would prefer you solve this yourself based on the insights above, as this is homework. Nevertheless, I have included a solution below for reference:
#include <iostream>
using namespace std;
void find( int *data, int &start, int &end )
{
// reflects the prefix sum until start - 1
int sumStart = 0;
// reflects the prefix sum until end
int sumEnd = 0;
for( int i = start; i <= end; i++ )
sumEnd += data[i];
while( start < end )
{
int length = end - start + 1;
int sum = 2 * ( sumEnd - sumStart );
if( sum == length )
break;
else if( sum < length )
{
// sum needs to increase; get rid of the lower endpoint
if( data[ start ] == 0 && data[ end ] == 1 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
else
{
// sum needs to decrease; get rid of the higher endpoint
if( data[ start ] == 1 && data[ end ] == 0 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
}
}
int main() {
int length;
cin >> length;
// get the data
int data[length];
for( int i = 0; i < length; i++ )
cin >> data[i];
// solve and print the solution
int start = 0, end = length - 1;
find( data, start, end );
if( start == end )
puts( "No soln" );
else
printf( "%d %d\n", start, end );
return 0;
}
Now my algorithm is O(n) time and O(Dn) space where Dn is the total imblance in the list.
This solution doesn't modify the list.
let D be the difference of 1s and 0s found in the list.
First, let's step linearily through the list and calculate D, just to see how it works:
I'm gonna use this list as an example : l=1100111100001110
Element D
null 0
1 1
1 2 <-
0 1
0 0
1 1
1 2
1 3
1 4
0 3
0 2
0 1
0 0
1 1
1 2
1 3
0 2 <-
Finding the longest balanced subarray is equivalent to finding 2 equal elements in D that are the more far appart. (in this example the 2 2s marked with arrows.)
The longest balanced subarray is between first occurence of element +1 and last occurence of element. (first arrow +1 and last arrow : 00111100001110)
Remark:
The longest subarray will always be between 2 elements of D that are between [0,Dn] where Dn is the last element of D. (Dn = 2 in the previous example) Dn is the total imbalance between 1s and 0s in the list. (or [Dn,0] if Dn is negative)
In this example it means that I don't need to "look" at 3s or 4s
Proof:
Let Dn > 0 .
If there is a subarray delimited by P (P > Dn). Since 0 < Dn < P, before reaching the first element of D which is equal to P we reach one element equal to Dn. Thus, since the last element of the list is equal to Dn, there is a longest subarray delimited by Dns than the one delimited by Ps.And therefore we don't need to look at Ps
P cannot be less than 0 for the same reasons
the proof is the same for Dn <0
Now let's work on D, D isn't random, the difference between 2 consecutive element is always 1 or -1. Ans there is an easy bijection between D and the initial list. Therefore I have 2 solutions for this problem:
For the time being I cannot find a better approach than the first one:
First calculate Dn (in O(n)) . Dn=2
Second instead of creating D, create a dictionnary where the keys are the value of D (between [0 and Dn]) and the value of each keys is a couple (a,b) where a is the first occurence of the key and b the last.
Element D DICTIONNARY
null 0 {0:(0,0)}
1 1 {0:(0,0) 1:(1,1)}
1 2 {0:(0,0) 1:(1,1) 2:(2,2)}
0 1 {0:(0,0) 1:(1,3) 2:(2,2)}
0 0 {0:(0,4) 1:(1,3) 2:(2,2)}
1 1 {0:(0,4) 1:(1,5) 2:(2,2)}
1 2 {0:(0,4) 1:(1,5) 2:(2,6)}
1 3 { 0:(0,4) 1:(1,5) 2:(2,6)}
1 4 {0:(0,4) 1:(1,5) 2:(2,6)}
0 3{0:(0,4) 1:(1,5) 2:(2,6) }
0 2 {0:(0,4) 1:(1,5) 2:(2,9) }
0 1 {0:(0,4) 1:(1,10) 2:(2,9) }
0 0 {0:(0,11) 1:(1,10) 2:(2,9) }
1 1 {0:(0,11) 1:(1,12) 2:(2,9) }
1 2 {0:(0,11) 1:(1,12) 2:(2,13)}
1 3 {0:(0,11) 1:(1,12) 2:(2,13)}
0 2 {0:(0,11) 1:(1,12) 2:(2,15)}
and you chose the element with the largest difference : 2:(2,15) and is l[3:15]=00111100001110 (with l=1100111100001110).
Time complexity :
2 passes, the first one to caclulate Dn, the second one to build the dictionnary. find the max in the dictionnary.
Total is O(n)
Space complexity:
the current element in D : O(1) the dictionnary O(Dn)
I don't take 3 and 4 in the dictionnary because of the remark
The complexity is O(n) time and O(Dn) space (in average case Dn << n).
I guess there is may be a better way than a dictionnary for this approach.
Any suggestion is welcome.
Hope it helps
The second way to proceed would be to transform your list into D. (since it's easy to go back from D to the list it's ok). (O(n) time and O(1) space, since I transform the list in place, even though it might not be a "valid" O(1) )
Then from D you need to find the 2 equal element that are the more far appart.
it looks like finding the longest cycle in a linked list, A modification of Richard Brent algorithm might return the longest cycle but I don't know how to do it, and it would take O(n) time and O(1) space.
Once you find the longest cycle, go back to the first list and print it.
This algorithm would take O(n) time and O(1) space complexity.