Sliding window - how to get window location on image?

Referring to this great sliding window implementation in python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is - where in the code can I actually see the location of the current window on the image? Or how can I grab its location?

On lines 72 and after line 85, I tried printing out shape and newstrides, but I'm clearly not getting anywhere here. In the norm_shape function, I printed out tuple but the output was only the window dimensions (if I understood that right, too).

But I need not just the dimensions, such as width and height, I also need to know where exactly from the image a window is being extracted, in terms of the pixel coordinates, or which rows/columns in the image.

It might be easier for you to understand what's going on if you try using flatten=False to create a 'grid' of windows onto the image:

import numpy as np
from scipy.misc import lena
from matplotlib import pyplot as plt

img = lena()
print(img.shape)
# (512, 512)

# make a 64x64 pixel sliding window on img. 
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)

plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose

To get the corresponding pixel coordinates, you could do something like this:

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right

# check for grid position [3, 4]
t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))

print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
# True
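The same check can be reproduced without the linked sliding_window() function. The sketch below is only a stand-in: it assumes NumPy >= 1.20 and uses numpy.lib.stride_tricks.sliding_window_view on a synthetic array (not the Lena image), subsampling the view so that only the non-overlapping windows remain.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    # same helper as above: grid position -> (top, bottom, left, right)
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    return gr * sr, gr * sr + wr, gc * sc, gc * sc + wc


# synthetic 512x512 "image" in which every pixel value is unique
img = np.arange(512 * 512).reshape(512, 512)

# all 64x64 windows at every offset, then keep every 64th along each axis;
# this stands in for sliding_window(img, (64, 64), flatten=False)
win = sliding_window_view(img, (64, 64))[::64, ::64]
print(win.shape)
# (8, 8, 64, 64)

t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))
print((t, b, l, r))
# (192, 256, 256, 320)
print(np.array_equal(img[t:b, l:r], win[3, 4]))
# True
```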

With flatten=True, the 8x8 grid of 64x64-pixel windows just gets flattened into a 64-long vector of 64x64-pixel windows. In that case you could use something like np.unravel_index to convert the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above:

win = sliding_window(img, (64, 64), flatten=True)

grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(np.all(img[t:b, l:r] == win[12]))
# True

OK, I'll try and address some of the questions you raised in the comments.

I want the pixel location of the window relative to the actual pixel dimensions of the original image.

Perhaps I was not clear enough - you can already do this using something like my get_win_pixel_coords() function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example:

win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window (x=8, y=9)

t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))

ax2.imshow(img)
ax2.plot(l + 8, t + 9, 'oy') # relative to whole image: x offset by left, y offset by top

plt.show()

Also notice that I've updated get_win_pixel_coords() to deal with cases where shiftSize is not None (i.e. the windows overlap, or leave gaps, rather than perfectly tiling the image).

So I'm guessing that in that case, I should just make the grid be equal to the original image's dimensions, is that right? (instead of using 8x8).

No, if the windows tile the image without overlap (i.e. shiftSize=None, which I've assumed so far), then if you made the grid dimensions equal to the pixel dimensions of the image, every window would just contain a single pixel!
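To make the grid size concrete, here is a minimal sketch of how the number of windows along each axis follows from the image size, window size and shift. grid_shape() is a hypothetical helper, not part of the linked code, and the 64x64 window used for the 240x360 case is only an assumed example.

```python
def grid_shape(img_shape, win_shape, shift_size=None):
    # hypothetical helper: number of window positions along each axis
    if shift_size is None:
        shift_size = win_shape  # non-overlapping tiling
    return tuple((i - w) // s + 1
                 for i, w, s in zip(img_shape, win_shape, shift_size))


print(grid_shape((512, 512), (64, 64)))
# (8, 8)
print(grid_shape((240, 360), (64, 64), (10, 10)))  # assumed window size
# (18, 30)
print(grid_shape((240, 360), (1, 1)))
# (240, 360)
```

The last call shows the point made above: the grid only matches the image dimensions when every window is a single pixel.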

So in my case, for an image of width: 360 and height: 240, would that mean I use this line: grid_pos = np.unravel_index(*12*, (240, 360)). Also, what does 12 refer to in this line?

As I said, making the 'grid size' equal to the image dimensions would be pointless, since every window would contain only a single pixel (at least, assuming that the windows are non-overlapping). The 12 would refer to the index into the flattened grid of windows, e.g.:

x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel()                 # flatten it into a 25-long vector
print(x_flat[12])                  # the 12th element in the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5))  # corresponding row/col index in x
print(x[row, col])
# 12
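For a 2D grid in row-major order, np.unravel_index is just integer division and remainder by the number of columns, and np.ravel_multi_index goes back the other way. A small sketch of that round trip:

```python
import numpy as np

grid_shape = (5, 5)
flat_idx = 12

row, col = np.unravel_index(flat_idx, grid_shape)
print((int(row), int(col)))
# (2, 2)

# the same result by hand: divide by the number of columns
print(divmod(flat_idx, grid_shape[1]))
# (2, 2)

# and back from (row, col) to the flat index
print(int(np.ravel_multi_index((row, col), grid_shape)))
# 12
```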

I am shifting 10 pixels with each window, and the first sliding window starts from coordinates 0x0 on the image, and the second starts from 10x10, etc. I want the program to return not just the window contents but also the coordinates corresponding to each window, i.e. 0,0, and then 10,10, etc.

As I said, you can already get the position of the window relative to the image using the top, bottom, left, right coordinates returned by get_win_pixel_coords(). You could wrap this up into a single function if you really wanted:

def get_pixels_and_coords(win_grid, grid_pos, shift_size=None):
    pix = win_grid[grid_pos]
    # pass shift_size through so that overlapping windows are handled too
    tblr = get_win_pixel_coords(grid_pos, pix.shape, shift_size)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))
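For the 10-pixel shift described in the comment, one possible approach is a plain generator that yields each window together with its top-left corner. This is only a sketch: iter_windows_with_coords() is a hypothetical name, it uses ordinary slicing rather than the strided sliding_window(), and the 64x64 window size is an assumption.

```python
import numpy as np


def iter_windows_with_coords(img, win_shape, shift_size):
    # hypothetical helper: yield ((top, left), window) in row-major grid order
    wr, wc = win_shape
    sr, sc = shift_size
    n_rows = (img.shape[0] - wr) // sr + 1
    n_cols = (img.shape[1] - wc) // sc + 1
    for gr in range(n_rows):
        for gc in range(n_cols):
            top, left = gr * sr, gc * sc
            yield (top, left), img[top:top + wr, left:left + wc]


img = np.arange(240 * 360).reshape(240, 360)  # synthetic 240x360 image
coords = [tl for tl, _ in iter_windows_with_coords(img, (64, 64), (10, 10))]

print(coords[:3])
# [(0, 0), (0, 10), (0, 20)]
print(coords[30])   # first window of the second grid row
# (10, 0)
print(len(coords))
# 540
```

Note that the windows advance along a row first, so the second window starts at (0, 10) rather than (10, 10).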

If you want the coordinates of every pixel in the window, relative to the image, another trick you could use is to construct arrays containing the row and column indices of every pixel in the image, then apply your sliding window to these:

ridx, cidx = np.indices(img.shape)
r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)

pix = win[3, 4]    # pixel values
r = r_win[3, 4]    # row index of every pixel in the window
c = c_win[3, 4]    # column index of every pixel in the window
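The index-array trick can be checked on a small synthetic array. The sketch below takes a single window by slicing instead of calling sliding_window(), which is an assumption made to keep it self-contained; the index arrays behave the same way under either approach.

```python
import numpy as np

img = np.arange(12 * 16).reshape(12, 16)   # small synthetic image
ridx, cidx = np.indices(img.shape)          # row / column index of every pixel

# one 4x4 window, at grid position (1, 2) of a non-overlapping 4x4 tiling
t, b, l, r = 4, 8, 8, 12
pix = img[t:b, l:r]
r_win = ridx[t:b, l:r]
c_win = cidx[t:b, l:r]

print(r_win.min(), r_win.max())
# 4 7
print(c_win.min(), c_win.max())
# 8 11

# indexing the image with the per-pixel coordinates gives back the window
print(np.array_equal(img[r_win, c_win], pix))
# True
```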

To update @ali_m's answer, since scipy.misc.lena() is no longer available in SciPy >0.17, here is an example using the RGB image scipy.misc.face(), with a slight modification to the sliding window source code provided in the OP.

import numpy as np
from scipy.misc import ascent, face
from matplotlib import pyplot as plt
from numpy.lib.stride_tricks import as_strided as ast

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right


def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.
    Parameters
        shape - an int, or a tuple of ints
    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions
    '''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    # convert ws, ss, and a.shape to numpy arrays
    ws = np.array(ws)
    ss = np.array(ss)
    shap = np.array(a.shape)
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shap),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shap):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shap - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    a = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return a
    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    #dim = filter(lambda i : i != 1,dim)
    return a.reshape(dim), newshape

Returning newshape from sliding_window() as well makes it possible to pass flatten=True and still recover the shape of the grid created by the sliding window function. Note that, as written, the extra value is only returned when flatten=True. In my application a flattened vector of computational windows is desirable because it is a convenient point at which to scale up the computation applied to each window.

If a 96x96 window (i.e. tile x tile) is applied with 50% overlap in both directions to an image with shape (768, 1024, 3), the input image can first be padded so that its height and width are whole multiples of the window size, with no remainder, before the sliding window is created.

img = face()
nxo,nyo,nzo = img.shape

tile=96 
pad_img = np.vstack((np.hstack((img,np.fliplr(img))),np.flipud(np.hstack((img,np.fliplr(img))))))

# crop the mirrored image so height and width round up to the next multiple of tile
pad_img = pad_img[:nxo + (-nxo) % tile, :nyo + (-nyo) % tile, :]



win, ind = sliding_window(pad_img, (96, 96,3), (48,48,3))
print(ind)
# (15, 21, 1, 96, 96, 3)
print(win.shape)
# (315, 96, 96, 3)
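The mirror-padding step can also be sketched with np.pad, which avoids building the full mirrored image first. pad_to_multiple() is a hypothetical helper, and the choice of mode="symmetric" (mirroring that repeats the edge pixel, as stacking img next to np.fliplr(img) does) is an assumption about the intended padding.

```python
import numpy as np


def pad_to_multiple(img, tile):
    # hypothetical helper: mirror-pad the bottom and right edges so that
    # height and width become whole multiples of tile
    pad_r = (-img.shape[0]) % tile
    pad_c = (-img.shape[1]) % tile
    pad_width = [(0, pad_r), (0, pad_c)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode="symmetric")


img = np.zeros((768, 1024, 3), dtype=np.uint8)  # same shape as face()
padded = pad_to_multiple(img, 96)
print(padded.shape)
# (768, 1056, 3)

# tiny example showing the mirrored column that gets appended
small = pad_to_multiple(np.arange(6).reshape(2, 3), 2)
print(small.tolist())
# [[0, 1, 2, 2], [3, 4, 5, 5]]
```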

The grid of computational windows has 15 rows and 21 columns, for 315 windows in total. grid_pos can be determined from an index into the flattened vector of computational windows (i.e. win) together with ind[0] and ind[1]. For example, for the window at index 239:

grid_pos = np.unravel_index(239,(ind[0],ind[1]))
print(grid_pos)
#(11, 8)

Then the bounding coordinates for the computational window in the original image can be found using:

t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
print(np.all(pad_img[t:b, l:r] == win[239]))
#True
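The same mapping can be verified for every window at once. The sketch below is self-contained and therefore makes some assumptions: it uses numpy.lib.stride_tricks.sliding_window_view (NumPy >= 1.20) in place of the modified sliding_window(), and a small synthetic 96x144x3 array with 32x32 tiles and 50% overlap instead of the padded face() image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20

img = np.arange(96 * 144 * 3).reshape(96, 144, 3)  # synthetic RGB-like image
tile, step = 32, 16                                 # 50% overlap

# every (tile, tile, 3) window, subsampled to one window per step
grid = sliding_window_view(img, (tile, tile, 3))[::step, ::step, 0]
n_rows, n_cols = grid.shape[:2]
win = grid.reshape(-1, tile, tile, 3)               # flattened list of windows

all_match = True
for k in range(len(win)):
    gr, gc = np.unravel_index(k, (n_rows, n_cols))  # flat index -> grid position
    t, l = gr * step, gc * step                      # grid position -> pixel corner
    all_match = all_match and np.array_equal(img[t:t + tile, l:l + tile], win[k])

print((n_rows, n_cols), len(win))
# (5, 8) 40
print(all_match)
# True
```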

Source: https://stackoverflow.com/questions/27584233/sliding-window-how-to-get-window-location-on-image

Tags: python, numpy, computer-vision, sliding-window