Referring to this great sliding window implementation in python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is
It might be easier to understand what's going on if you try using flatten=False to create a 'grid' of windows onto the image:
import numpy as np
from scipy.misc import lena
from matplotlib import pyplot as plt
img = lena()
print(img.shape)
# (512, 512)
# make a 64x64 pixel sliding window on img.
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)
print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)
plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose
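For the non-overlapping case, the same grid layout can be reproduced with plain reshaping, which may make the (8, 8, 64, 64) shape above easier to picture. This is only a sketch: a synthetic array stands in for the image, and tile_grid is a hypothetical helper, not part of the linked sliding_window code.

```python
import numpy as np

def tile_grid(img, win_shape):
    """Split a 2D array into a grid of non-overlapping windows.

    Hypothetical helper for illustration; assumes the image dimensions
    are exact multiples of the window dimensions.
    """
    h, w = img.shape
    wh, ww = win_shape
    # (grid_rows, wh, grid_cols, ww) -> (grid_rows, grid_cols, wh, ww)
    return img.reshape(h // wh, wh, w // ww, ww).swapaxes(1, 2)

img = np.arange(512 * 512).reshape(512, 512)  # stand-in for a 512x512 image
grid = tile_grid(img, (64, 64))
print(grid.shape)
# (8, 8, 64, 64)
print(np.array_equal(grid[4, 4], img[256:320, 256:320]))
# True
```

Grid position [4, 4] starts at pixel row 4 * 64 = 256 and column 4 * 64 = 256, which is why the slice above matches.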
To get the corresponding pixel coordinates, you could do something like this:
def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc
    return top, bottom, left, right
# check for grid position [3, 4]
t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))
print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
# True
With flatten=True, the 8x8 grid of 64x64-pixel windows just gets flattened out into a 64-long vector of 64x64-pixel windows. In that case you could use something like np.unravel_index to convert from the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above:
win = sliding_window(img, (64, 64), flatten=True)
grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))
print(np.all(img[t:b, l:r] == win[12]))
# True
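The two steps (flat index to grid position, grid position to pixel bounds) can be folded into one function. The sketch below is self-contained and does not call sliding_window; flat_index_to_coords is a hypothetical name, and the grid shape is assumed to be known by the caller.

```python
import numpy as np

def flat_index_to_coords(flat_idx, grid_shape, win_shape, shift_size=None):
    """Return (top, bottom, left, right) for one window of a flattened grid.

    Hypothetical helper: grid_shape is the (rows, cols) layout of the
    windows before flattening; shift_size defaults to non-overlapping tiles.
    """
    if shift_size is None:
        shift_size = win_shape
    gr, gc = np.unravel_index(flat_idx, grid_shape)
    top = int(gr) * shift_size[0]
    left = int(gc) * shift_size[1]
    return top, top + win_shape[0], left, left + win_shape[1]

print(flat_index_to_coords(12, (8, 8), (64, 64)))
# (64, 128, 256, 320)
```

Index 12 in an 8x8 grid is grid position (1, 4), hence rows 64 to 128 and columns 256 to 320.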
OK, I'll try and address some of the questions you raised in the comments.
I want the pixel location of the window relative to the actual pixel dimensions of the original image.
Perhaps I was not clear enough - you can already do this using something like my get_win_pixel_coords() function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example:
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy') # position of Lena's eye, relative to this window (x=8, y=9)
t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))
ax2.imshow(img)
# plot() takes (x, y), i.e. (column, row), so add the left offset to x and the top offset to y
ax2.plot(l + 8, t + 9, 'oy') # position of Lena's eye, relative to whole image
plt.show()
Also notice that I've updated get_win_pixel_coords() to deal with cases where shiftSize is not None (i.e. the windows don't perfectly tile the image with no overlap).
So I'm guessing that in that case, I should just make the grid be equal to the original image's dimensions, is that right? (instead of using 8x8).
No, if the windows tile the image without overlap (i.e. shiftSize=None, which I've assumed so far), then if you made the grid dimensions equal to the pixel dimensions of the image, every window would just contain a single pixel!
So in my case, for an image of width: 360 and height: 240, would that mean I use this line: grid_pos = np.unravel_index(12, (240, 360)). Also, what does 12 refer to in this line?
As I said, making the 'grid size' equal to the image dimensions would be pointless, since every window would contain only a single pixel (at least, assuming that the windows are non-overlapping). The 12 would refer to the index into the flattened grid of windows, e.g.:
x = np.arange(25).reshape(5, 5) # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel() # flatten it into a 25-long vector
print(x_flat[12]) # the element at index 12 in the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5)) # corresponding row/col index in x
print(x[row, col])
# 12
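Going the other way, from a grid position back to the flat index, can be done with np.ravel_multi_index. A short sketch on the same 5x5 grid shape:

```python
import numpy as np

grid_shape = (5, 5)

# grid position -> index into the flattened grid
flat = np.ravel_multi_index((2, 2), grid_shape)
print(int(flat))
# 12

# and back again: flat index -> grid position
row, col = np.unravel_index(flat, grid_shape)
print(int(row), int(col))
# 2 2
```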
I am shifting 10 pixels with each window, and the first sliding window starts from coordinates 0x0 on the image, and the second starts from 10x10, etc, then I want the program to return not just the window contents but the coordinates corresponding to each window, i.e. 0,0, and then 10,10, etc
As I said, you can already get the position of the window relative to the image using the top, bottom, left, right coordinates returned by get_win_pixel_coords(). You could wrap this up into a single function if you really wanted:
def get_pixels_and_coords(win_grid, grid_pos):
    pix = win_grid[grid_pos]
    tblr = get_win_pixel_coords(grid_pos, pix.shape)
    return pix, tblr
# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))
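For the 10-pixel shift mentioned in the question, the coordinates of every window can also be listed without touching the pixel data. The sketch below assumes a 240x360 image and a 64x64 window (the window size is not given in the question, so it is a placeholder), and iter_window_coords is a hypothetical helper.

```python
def iter_window_coords(img_shape, win_shape, shift_size):
    """Yield (top, bottom, left, right) for every window that fits in the image.

    Hypothetical helper; windows are visited row by row, left to right.
    """
    n_rows = (img_shape[0] - win_shape[0]) // shift_size[0] + 1
    n_cols = (img_shape[1] - win_shape[1]) // shift_size[1] + 1
    for gr in range(n_rows):
        for gc in range(n_cols):
            top = gr * shift_size[0]
            left = gc * shift_size[1]
            yield top, top + win_shape[0], left, left + win_shape[1]

# assumed sizes: 240 rows x 360 columns, 64x64 window, shift of 10 pixels
coords = list(iter_window_coords((240, 360), (64, 64), (10, 10)))
print(len(coords))
# 540
print(coords[0], coords[1])
# (0, 64, 0, 64) (0, 64, 10, 74)
```

Note that consecutive windows advance along the columns first, so the second window starts at row 0, column 10 rather than at (10, 10); the window starting at (10, 10) is the one at grid position (1, 1).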
If you want the coordinates of every pixel in the window, relative to the image, another trick you could use is to construct arrays containing the row and column indices of every pixel in the image, then apply your sliding window to these:
ridx, cidx = np.indices(img.shape)
r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)
pix = win[3, 4] # pixel values
r = r_win[3, 4] # row index of every pixel in the window
c = c_win[3, 4] # column index of every pixel in the window
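The same index-array trick also works for overlapping windows. The sketch below swaps in NumPy's own sliding_window_view (available in NumPy 1.20 and later) in place of the sliding_window function from the question, and uses a random array with an assumed 64x64 window and 10-pixel step; strided_windows is a hypothetical wrapper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.random.default_rng(0).integers(0, 256, size=(240, 360))
win_shape, step = (64, 64), 10  # assumed window size and shift

def strided_windows(a):
    # take the windows at every offset, then keep every `step`-th one per axis
    return sliding_window_view(a, win_shape)[::step, ::step]

ridx, cidx = np.indices(img.shape)
win = strided_windows(img)      # pixel values
r_win = strided_windows(ridx)   # row index of every pixel in every window
c_win = strided_windows(cidx)   # column index of every pixel in every window

print(win.shape)
# (18, 30, 64, 64)
gr, gc = 3, 4
print(r_win[gr, gc, 0, 0], c_win[gr, gc, 0, 0])
# 30 40
print(np.array_equal(img[r_win[gr, gc], c_win[gr, gc]], win[gr, gc]))
# True
```

Indexing the image with the row and column windows recovers exactly the pixel values of that window, which confirms the coordinates line up.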
To update @ali_m's answer, since scipy.misc.lena() is no longer available in SciPy > 0.17, here is an example using the RGB image scipy.misc.face(), with a slight modification to the sliding window source code provided in the OP.
import numpy as np
from scipy.misc import ascent, face
from matplotlib import pyplot as plt
from numpy.lib.stride_tricks import as_strided as ast
def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc
    return top, bottom, left, right
def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.
    Parameters
        shape - an int, or a tuple of ints
    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass
    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass
    raise TypeError('shape must be an int, or a tuple of ints')
def sliding_window(a, ws, ss=None, flatten=True):
    '''
    Return a sliding window over a in any number of dimensions
    '''
    if ss is None:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    # convert ws, ss, and a.shape to numpy arrays
    ws = np.array(ws)
    ss = np.array(ss)
    shap = np.array(a.shape)
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shap), len(ws), len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(
            'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shap):
        raise ValueError(
            'ws cannot be larger than a in any dimension. '
            'a.shape was %s and ws was %s' % (str(a.shape), str(ws)))
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shap - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    a = ast(a, shape=newshape, strides=newstrides)
    if not flatten:
        return a
    # Collapse strided so that it has one more dimension than the window. I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    # np.prod replaces np.product, which newer NumPy releases no longer provide
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    #dim = filter(lambda i : i != 1,dim)
    # modification: also return newshape so the grid layout is known after flattening
    return a.reshape(dim), newshape
Adding the return variable newshape to sliding_window() makes it possible to pass flatten=True and still know the layout of the grid created by the sliding window function. In my application a flattened vector of computational windows is desirable because it is a good point at which to scale the computations applied to each computational window.
If a 96x96 window (i.e. tile x tile) is applied with 50% overlap in both directions to an image with shape (768,1024,3), the input image can first be padded so that its height and width are divisible by the window size with no remainder before the sliding window is created.
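img = face()
nxo, nyo, nzo = img.shape
tile = 96
# mirror the image to the right and below, then crop to the next multiple of tile
pad_img = np.vstack((np.hstack((img,np.fliplr(img))),np.flipud(np.hstack((img,np.fliplr(img))))))
# -n % tile is the number of pixels needed to reach the next multiple of tile
pad_img = pad_img[:nxo + (-nxo % tile), :nyo + (-nyo % tile), :]
win, ind = sliding_window(pad_img, (96, 96, 3), (48, 48, 3))
print(ind)
# (15, 21, 1, 96, 96, 3)
print(win.shape)
# (315, 96, 96, 3)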
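Padding up to the next multiple of the tile size can also be sketched with np.pad, whose 'symmetric' mode mirrors the image about its edge in the same way as stacking flipped copies. pad_to_multiple is a hypothetical helper, and a zero array with the same shape as the image stands in for the real data.

```python
import numpy as np

def pad_to_multiple(img, tile):
    """Mirror-pad the bottom and right edges so that height and width
    become multiples of tile. Hypothetical helper for illustration."""
    pad_r = -img.shape[0] % tile  # rows needed to reach the next multiple
    pad_c = -img.shape[1] % tile  # columns needed to reach the next multiple
    # pad only the first two axes; leave any channel axis untouched
    pad_width = [(0, pad_r), (0, pad_c)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode='symmetric')

img = np.zeros((768, 1024, 3), dtype=np.uint8)  # same shape as the image in the text
padded = pad_to_multiple(img, 96)
print(padded.shape)
# (768, 1056, 3)
```

Here 768 is already a multiple of 96, while 1024 needs 32 extra columns to reach 1056 = 11 * 96.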
The grid of computational windows contains 15 rows and 21 columns, i.e. 315 computational windows. grid_pos can be determined from the index into the flattened vector of computational windows (i.e. win) together with ind[0] and ind[1]. If we were interested in the computational window at index 239:
grid_pos = np.unravel_index(239, (ind[0], ind[1]))
print(grid_pos)
# (11, 8)
Then the bounding coordinates for the computational window in the original image can be found using:
t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
print(np.all(pad_img[t:b, l:r] == win[239]))
#True