Given an unsorted array of positive integers, find the length of the longest subarray whose elements when sorted are continuous. Can you think of an O(n) solution?
Examp
Don't get your hopes up, this is only a partial answer.
I'm quite confident that the problem is not solvable in O(n). Unfortunately, I can't prove it.
If there is a way to solve it in less than O(n^2), I'd suspect that the solution is based on the following strategy:
1. Decide in O(n) (or maybe O(n log n)) whether there exists a continuous subarray as you describe it with at least i elements. Let's call this predicate E(i).
2. Use bisection to find the maximum i for which E(i) holds.
The total running time of this algorithm would then be O(n log n) (or O(n log^2 n)).
This is the only way I could come up with to reduce the problem to another problem that at least has the potential of being simpler than the original formulation. However, I couldn't find a way to compute E(i) in less than O(n^2), so I may be completely off...
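For reference, the O(n^2) baseline that the strategy above tries to beat can be sketched as follows. This is only an illustration, not part of the proposed strategy: the function name longestContinuousSubarray is made up, and the sketch assumes positive integers as in the question, so that max - min cannot overflow. It relies on the fact that a window of distinct values is continuous exactly when max - min equals the window length minus one.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: length of the longest contiguous subarray whose
// elements, once sorted, form a run of consecutive integers.
// Expected O(n^2) time, O(n) extra space.
std::size_t longestContinuousSubarray(const std::vector<int>& a) {
    std::size_t best = 0;
    for (std::size_t start = 0; start < a.size(); ++start) {
        std::unordered_set<int> seen;  // values currently in a[start..end]
        int lo = a[start];
        int hi = a[start];
        for (std::size_t end = start; end < a.size(); ++end) {
            // A repeated value rules out this window and every longer
            // window with the same start.
            if (!seen.insert(a[end]).second) break;
            lo = std::min(lo, a[end]);
            hi = std::max(hi, a[end]);
            // Distinct values spanning exactly (end - start) are consecutive.
            if (static_cast<std::size_t>(hi - lo) == end - start) {
                best = std::max(best, end - start + 1);
            }
        }
    }
    return best;
}
```

Calling it on {10, 12, 11} returns 3, since the whole array sorts to 10, 11, 12.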
UPD2: The following solution is for the variant of the problem where the subarray is not required to be contiguous. I misunderstood the problem statement. Not deleting this, as somebody may have an idea based on mine that will work for the actual problem.
Here's what I've come up with:
Create an instance of a dictionary (which is implemented as a hash table, giving O(1) in normal situations). Keys are integers, values are hash sets of integers (also O(1)): var D = new Dictionary<int, HashSet<int>>().
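Iterate through the array A and for each integer n with index i do:
1. Check whether keys n-1 and n+1 are contained in D.
2. If neither key exists, do D.Add(n, new HashSet<int>())
3. If only one of the keys exists, e.g. n-1, do D.Add(n, D[n-1])
4. If both keys exist, do D[n-1].UnionWith(D[n+1]); D[n+1] = D[n] = D[n-1];
5. D[n].Add(n)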
Now go through each key in D and find the hash set with the greatest length (finding length is O(1)). The greatest length will be the answer.
To my understanding, the worst case complexity will be O(n*log(n)), only because of the UnionWith operation. I don't know how to calculate the average complexity, but it should be close to O(n). Please correct me if I am wrong.
UPD: To speak code, here's a test implementation in C# that gives the correct result in both of the OP's examples:
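using System;
using System.Collections.Generic;

var A = new int[] {4, 5, 1, 5, 7, 6, 8, 4, 1};
var D = new Dictionary<int, HashSet<int>>();
foreach (int n in A)
{
    if (D.ContainsKey(n - 1) && D.ContainsKey(n + 1))
    {
        // Both neighbours exist: merge their sets and share the result.
        // Caveat: only the keys n-1, n and n+1 are re-pointed here; other
        // members of the old D[n+1] set still reference the previous HashSet.
        D[n - 1].UnionWith(D[n + 1]);
        D[n + 1] = D[n] = D[n - 1];
    }
    else if (D.ContainsKey(n - 1))
    {
        // Only the lower neighbour exists: join its set.
        D[n] = D[n - 1];
    }
    else if (D.ContainsKey(n + 1))
    {
        // Only the upper neighbour exists: join its set.
        D[n] = D[n + 1];
    }
    else if (!D.ContainsKey(n))
    {
        // No neighbour and not seen before: start a new set.
        D.Add(n, new HashSet<int>());
    }
    D[n].Add(n);
}

// The largest set is the answer.
int result = int.MinValue;
foreach (HashSet<int> H in D.Values)
{
    if (H.Count > result)
    {
        result = H.Count;
    }
}
Console.WriteLine(result);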
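For comparison, here is a sketch of a different, commonly used hash-set technique for the same non-contiguous variant. It is not the dictionary-of-sets approach above: it puts all values into one set and only counts upwards from values that have no predecessor, so each value is visited a constant number of times on average. The function name longestConsecutiveRun is made up, and the sketch assumes values stay well inside the int range so that v - 1 and next + 1 do not overflow.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper for the non-contiguous variant: size of the largest
// group of array values that forms a run of consecutive integers.
// Expected O(n) time, O(n) space.
std::size_t longestConsecutiveRun(const std::vector<int>& a) {
    std::unordered_set<int> values(a.begin(), a.end());  // duplicates collapse
    std::size_t best = 0;
    for (int v : values) {
        // Only start counting at the smallest value of a run.
        if (values.count(v - 1) != 0) continue;
        std::size_t length = 1;
        int next = v;
        while (values.count(next + 1) != 0) {
            ++next;
            ++length;
        }
        best = std::max(best, length);
    }
    return best;
}
```

On {1, 3, 4, 2, 5} it returns 5, because the values 1 to 5 are all present regardless of their order.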
Here's another way to think of your problem: suppose you have an array composed only of 1s and 0s, and you want to find the longest consecutive run of 1s. This can be done in linear time by run-length encoding the 1s (ignore the 0s). In order to transform your original problem into this new run-length encoding problem, you compute a new array b[i] = (a[i] < a[i+1]). This doesn't have to be done explicitly; you can just do it implicitly to achieve an algorithm with constant memory requirement and linear complexity.
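A minimal sketch of the implicit version described above, assuming the comparison is exactly b[i] = (a[i] < a[i+1]) as written. The function name longestIncreasingPairRun is made up; it returns the length of the longest run of 1s in b without ever storing b.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: longest run of 1s in b[i] = (a[i] < a[i+1]),
// computed in one pass with O(1) extra memory.
std::size_t longestIncreasingPairRun(const std::vector<int>& a) {
    std::size_t best = 0;
    std::size_t current = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        if (a[i] < a[i + 1]) {
            ++current;  // b[i] is 1: extend the current run
            best = std::max(best, current);
        } else {
            current = 0;  // b[i] is 0: the run ends here
        }
    }
    return best;
}
```

For {1, 2, 3, 1, 2} the implicit b is 1, 1, 0, 1, so the function returns 2.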
See the array S in its mathematical set definition:
$$S = \bigcup_{j=0}^{k} I_j$$
where the I_j are disjoint integer segments. You can design a specific interval tree (based on a Red-Black tree or any self-balancing tree that you like :) ) to store the array in this mathematical definition. The node and tree structures should look like these:
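struct node {
    int d, u;                       /* lower and upper bound of the segment */
    int count;                      /* inserted elements, duplicates included */
    struct node *n_left, *n_right;  /* left and right children */
};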
Here, d is the lower bound of the integer segment and u the upper bound. count is added to take care of possible duplicates in the array: when trying to insert an already existing element in the tree, instead of doing nothing, we will increment the count value of the node in which it is found.
struct root {
    struct node *root;  /* top of the interval tree */
};
The tree will only store disjoint nodes, so the insertion is a bit more complex than a classical Red-Black tree insertion. When inserting intervals, you must scan for potential overlaps with already existing intervals. In your case, since you will only insert singletons, this should not add too much overhead.
Given three nodes P, L and R, with L the left child of P and R the right child of P, you must enforce L.u < P.d and P.u < R.d (and for each node, d <= u, of course).
When inserting an integer segment [x,y], you must find "overlapping" segments, that is to say, intervals [d,u] that satisfy both of the following inequalities:
y >= d - 1
AND
x <= u + 1
If the inserted interval is a singleton x, then you can only find up to 2 overlapping interval nodes N1 and N2 such that N1.d == x + 1 and N2.u == x - 1. Then you have to merge the two intervals and update count, which leaves you with N3 such that N3.d = N2.d, N3.u = N1.u and N3.count = N1.count + N2.count + 1. Since the delta between N1.d and N2.u is the minimal delta for two segments to be disjoint, then you must have one of the following :
So the insertion will still be in O(log(n)) in the worst case.
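The singleton insertion with merging can be sketched as follows. To keep the example short it swaps the custom tree for std::map (typically a red-black tree), keyed by the lower bound d; Segment, SegmentMap and insertSingleton are hypothetical names, and the sketch assumes x - 1 and x + 1 do not overflow.

```cpp
#include <iterator>
#include <map>

// Hypothetical stand-in for the node above: the map key plays the role of d.
struct Segment {
    int u;      // upper bound of the segment
    int count;  // number of inserted elements, duplicates included
};

using SegmentMap = std::map<int, Segment>;

// Insert the singleton x, merging with adjacent segments. O(log n) per call.
void insertSingleton(SegmentMap& segs, int x) {
    // First segment whose lower bound is strictly greater than x.
    auto right = segs.upper_bound(x);
    // The segment just before it is the only one that can contain x
    // or end at x - 1.
    auto left = (right == segs.begin()) ? segs.end() : std::prev(right);

    if (left != segs.end() && x <= left->second.u) {
        ++left->second.count;  // x is already covered: count the duplicate
        return;
    }
    const bool joinLeft = left != segs.end() && left->second.u == x - 1;
    const bool joinRight = right != segs.end() && right->first == x + 1;

    if (joinLeft && joinRight) {
        // x bridges two segments: N3 = [left.d, right.u].
        left->second.u = right->second.u;
        left->second.count += right->second.count + 1;
        segs.erase(right);
    } else if (joinLeft) {
        left->second.u = x;
        ++left->second.count;
    } else if (joinRight) {
        // The lower bound is the key, so the segment must be re-inserted.
        const Segment merged{right->second.u, right->second.count + 1};
        segs.erase(right);
        segs.emplace(x, merged);
    } else {
        segs.emplace(x, Segment{x, 1});
    }
}
```

Inserting 3, 1 and then 2 leaves a single segment [1,3] with count 3, which matches the observation that a perfect integer segment collapses into one node.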
From here, I can't figure out how to handle the order in the initial sequence but here is a result that might be interesting : if the input array defines a perfect integer segment, then the tree only has one node.
The first is O(n log(n)) in time and O(n) in space, the second is O(n) in time and O(n) in space, and the third is O(n) in time and O(1) in space.
Build a binary search tree, then traverse it in order. Keep 2 pointers, one for the start of the max subset and one for the end. Keep the max_size value while iterating the tree. It is O(n*log(n)) time and O(n) space complexity.
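A rough sketch of the tree-based idea, assuming std::set as the binary search tree (it is usually a balanced tree, and iterating it visits values in ascending order). The function name longestRunViaTree is made up, and instead of two pointers it only tracks the current and best run lengths.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: longest run of consecutive values found by an
// in-order walk over an ordered tree. O(n log n) time, O(n) space.
std::size_t longestRunViaTree(const std::vector<int>& a) {
    std::set<int> tree(a.begin(), a.end());  // duplicates collapse
    std::size_t best = 0;
    std::size_t current = 0;
    bool havePrev = false;
    int prev = 0;
    for (int v : tree) {  // ascending order, i.e. in-order traversal
        // Extend the run if v directly follows the previous value.
        current = (havePrev && v == prev + 1) ? current + 1 : 1;
        best = std::max(best, current);
        prev = v;
        havePrev = true;
    }
    return best;
}
```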
You can always sort the number set using counting sort in linear time and run through the array, which means O(n) time and space complexity.
Assuming there isn't overflow or a big integer data type, and assuming the array is a mathematical set (no duplicate values), you can do it in O(1) of memory:
O(n) time complexity.
This will require two passes over the data. First create a hash map, mapping ints to bools. I updated my algorithm to not use map, from the STL, which I'm positive uses sorting internally. This algorithm uses hashing, and can be easily updated for any maximum or minimum combination, even potentially all possible values an integer can obtain.
#include <iostream>
using namespace std;

// MINIMUM and MAXIMUM must bracket every input value (inclusive).
const int MINIMUM = -100;
const int MAXIMUM = 100;
const unsigned int ARRAY_SIZE = MAXIMUM - MINIMUM + 1;

int main() {
    bool* hashOfIntegers = new bool[ARRAY_SIZE];
    //const int someArrayOfIntegers[] = {10, 9, 8, 6, 5, 3, 1, 4, 2, 8, 7};
    //const int someArrayOfIntegers[] = {10, 6, 5, 3, 1, 4, 2, 8, 7};
    const int someArrayOfIntegers[] = {-2, -3, 8, 6, 12, 14, 4, 0, 16, 18, 20};
    const int SIZE_OF_ARRAY = 11;

    //Initialize hashOfIntegers values to false (new bool[] leaves them uninitialized).
    for (unsigned int i = 0; i < ARRAY_SIZE; i++) {
        hashOfIntegers[i] = false;
    }

    //Change appropriate values to true.
    for (int i = 0; i < SIZE_OF_ARRAY; i++) {
        //We subtract the MINIMUM value to normalize the MINIMUM value to a zero index for negative numbers.
        hashOfIntegers[someArrayOfIntegers[i] - MINIMUM] = true;
    }

    int sequence = 0;
    int maxSequence = 0;
    //Find the maximum sequence in the values
    for (unsigned int i = 0; i < ARRAY_SIZE; i++) {
        if (hashOfIntegers[i]) sequence++;
        else sequence = 0;
        if (sequence > maxSequence) maxSequence = sequence;
    }

    cout << "MAX SEQUENCE: " << maxSequence << endl;
    delete[] hashOfIntegers;
    return 0;
}
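The basic idea is to use the hash map as a bucket sort, so that you only have to do two passes: one over the data and one over the bucket array. This algorithm is O(2n), which in turn is O(n), as long as the value range (MAXIMUM - MINIMUM) is proportional to n.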