There are two classes, let\'s call them X and O. A number of elements belonging to these classes are spread out in the xy-plane. Here is an example where the two classes are
Computationally the most effective way to decide whether two sets of points are linearly separable is by applying linear programming. GLTK is perfect for that purpose and pretty much every highlevel language offers an interface for it - R, Python, Octave, Julia, etc.
Let's say you have a set of points A and B:
Then you have to minimize the 0 for the following conditions:
(The A below is a matrix, not the set of points from above)
"Minimizing 0" effectively means that you don't need to actually optimize an objective function because this is not necessary to find out if the sets are linearly separable.
In the end () is defining the separating plane.
In case you are interested in a working example in R or the math details, then check this out.
Linear perceptron is guaranteed to find such separation if one exists.
See: http://en.wikipedia.org/wiki/Perceptron .
If you found the convex hull for both the X
points and the O
points separately (i.e. you have two separate convex hulls at this stage) you would then just need to check whether any segments of the hulls intersected or whether either hull was enclosed by the other.
If the two hulls were found to be totally disjoint the two data-sets would be geometrically separable.
Since the hulls are convex by definition, any separator would be a straight line.
There are efficient algorithms that can be used both to find the convex hull (the qhull algorithm is based on an O(nlog(n))
quickhull approach I think), and to perform line-line intersection tests for a set of segments (sweepline at O(nlog(n))
), so overall it seems that an efficient O(nlog(n))
algorithm should be possible.
This type of approach should also generalise to general k-way
separation tests (where you have k
groups of objects) by forming the convex hull and performing the intersection tests for each group.
It should also work in higher dimensions, although the intersection tests would start to become more challenging...
Hope this helps.
In general that problem is NP-hard but there are good approximate solutions like K-means clustering.
You can probably apply linear programming to this problem. I'm not sure of its computational complexity in formal terms, but the technique is successfully applied to very large problems covering a wide range of domains.
Here is a naïve algorithm that I'm quite sure will work (and, if so, shows that the problem is not NP-complete, as another post claims), but I wouldn't be surprised if it can be done more efficiently: If a separating line exists, it will be possible to move and rotate it until it hits two of the X'es or one X and one O. Therefore, we can simply look at all the possible lines that intersect two X'es or one X and one O, and see if any of them are dividing lines. So, for each of the O(n^2) pairs, iterate over all the n-2 other elements to see if all the X'es are on one side and all the O's on the other. Total time complexity: O(n^3).