This question is somewhat language-agnostic, but my tool of choice happens to be a numpy array.
What I am doing is taking the difference of two images via PIL:
A clustering package (i.e. this) should be able to do most of the work (finding connected pixels). Finding the bounding box for each cluster is then trivial.
You could look for connected components in the image and then determine the bounding boxes of these components.
I believe scipy's ndimage module has everything you need...
Here's a quick example:
import numpy as np
import scipy as sp
import scipy.ndimage
# The array you gave above
data = np.array(
[
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
])
# Fill holes to make sure we get nice clusters
filled = sp.ndimage.binary_fill_holes(data)
# Now separate each group of contiguous ones into a distinct value
# This will be an array of values from 1 to num_objects, with zeros
# outside of any contiguous object
objects, num_objects = sp.ndimage.label(filled)
# Now return a list of slices around each object
# (This is effectively the tuple that you wanted)
object_slices = sp.ndimage.find_objects(objects)
# Just to illustrate using the object_slices
for obj_slice in object_slices:
    print(data[obj_slice])
This outputs:
[[1]]
[[1 1 1]
 [1 1 1]]
[[1 1 1 1]
 [1 0 0 0]
 [1 0 0 1]]
[[1]]
[[0 1 1 0]
 [1 0 0 1]
 [0 1 1 0]]
[[0 0 1 0 0]
 [0 1 1 1 0]
 [1 1 1 1 1]
 [0 1 1 1 0]
 [0 0 1 0 0]]
Note that the "object_slices" are basically what you originally asked for, if you need the actual indices.
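If you'd rather have plain index tuples than slice objects, the slices can be unpacked directly. This is only a rough sketch: the helper name `bounding_boxes` and the `(row_min, row_max, col_min, col_max)` layout with inclusive maxima are my own choices for illustration, not something from the question, and the small `mask` array is made up.

```python
import numpy as np
from scipy import ndimage


def bounding_boxes(mask):
    """Return one (row_min, row_max, col_min, col_max) tuple per object.

    The max values are inclusive. The tuple layout is an assumption made
    for this sketch.
    """
    # Same steps as in the example: fill holes, label, find the slices
    labeled, _ = ndimage.label(ndimage.binary_fill_holes(mask))
    boxes = []
    for rows, cols in ndimage.find_objects(labeled):
        # slice.stop is exclusive, so subtract 1 for an inclusive max
        boxes.append((rows.start, rows.stop - 1, cols.start, cols.stop - 1))
    return boxes


# Small made-up mask with two separate objects
mask = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
])
boxes = bounding_boxes(mask)
print(boxes)  # [(0, 0, 0, 0), (1, 2, 2, 3)]
```

Keeping the slices around is usually handier if you only want to index back into the array, though.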
Edit: Just wanted to point out that although this appears to handle the edge case of
[[1 1 1 1]
 [1 0 0 0]
 [1 0 0 1]]
properly, it actually doesn't (hence the extra lone [[1]]). The pixel in the bottom-right corner isn't connected to the rest of the shape, so it gets its own label. You can see this if you print out the "objects" array and take a look at objects 3 & 4:
[[1 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 0 0 0 0]
 [0 0 0 0 0 0 3 3 3 3 0 0 0 2 2 2 0 0 0 0]
 [0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 3 0 0 4 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 5 5 0 0 0 0 0 0 0 6 0 0 0 0 0]
 [0 0 0 0 5 5 5 5 0 0 0 0 0 6 6 6 0 0 0 0]
 [0 0 0 0 0 5 5 0 0 0 0 0 6 6 6 6 6 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 6 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0]]
Hope that helps!