Sliding window - how to get window location on image?

一个人的身影 2021-01-17 05:22

Referring to this great sliding window implementation in python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is

2 Answers
  • 2021-01-17 05:22

    It might be easier for you to understand what's going on if you try using flatten=False to create a 'grid' of windows onto the image:

    import numpy as np
    from scipy.misc import lena
    from matplotlib import pyplot as plt
    
    img = lena()
    print(img.shape)
    # (512, 512)
    
    # make a 64x64 pixel sliding window on img. 
    win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)
    
    print(win.shape)
    # (8, 8, 64, 64)
    # i.e. (img_height / win_height, img_width / win_width, win_height, win_width)
    
    plt.imshow(win[4, 4, ...])
    plt.draw()
    # grid position [4, 4] contains Lena's eye and nose
    

    To get the corresponding pixel coordinates, you could do something like this:

    def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
        if shift_size is None:
            shift_size = win_shape
        gr, gc = grid_pos
        sr, sc = shift_size
        wr, wc = win_shape
        top, bottom = gr * sr, (gr * sr) + wr
        left, right = gc * sc, (gc * sc) + wc
    
        return top, bottom, left, right
    
    # check for grid position [3, 4]
    t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))
    
    print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
    # True
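
The same check can be repeated for overlapping windows. The following is only a sketch under stated assumptions: it uses NumPy's own sliding_window_view (available in NumPy 1.20 and later) as a stand-in for the sliding_window function from the question, a synthetic 240x360 array instead of a real image, and a 10-pixel shift as mentioned in the comments.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    # Same arithmetic as the function above, repeated so this block runs alone.
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    return gr * sr, gr * sr + wr, gc * sc, gc * sc + wc

# Synthetic 240x360 "image" whose pixel values are all distinct.
img = np.arange(240 * 360).reshape(240, 360)
win_shape, shift = (64, 64), (10, 10)

# Assumed stand-in for sliding_window(img, win_shape, shift, flatten=False):
# take every possible window, then keep every 10th one along each axis.
win = sliding_window_view(img, win_shape)[::shift[0], ::shift[1]]
grid_shape = win.shape[:2]

t, b, l, r = get_win_pixel_coords((3, 4), win_shape, shift)
match = np.array_equal(img[t:b, l:r], win[3, 4])
print(grid_shape, (t, b, l, r), match)
```

With a shift smaller than the window, the grid has (image - window) // shift + 1 positions along each axis, and each grid step moves the window by one shift rather than by one window size.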
    

    With flatten=True, the 8x8 grid of 64x64-pixel windows will just get flattened out into a 64-long vector of 64x64-pixel windows. In that case you could use something like np.unravel_index to convert from the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above:

    win = sliding_window(img, (64, 64), flatten=True)
    
    grid_pos = np.unravel_index(12, (8, 8))
    t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))
    
    print(np.all(img[t:b, l:r] == win[12]))
    # True
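
The two steps (flat index to grid position, grid position to pixel coordinates) can be folded into one helper. This is a minimal sketch, not part of the original code: window_grid_shape and flat_index_to_coords are hypothetical names, and it assumes the windows were flattened in row-major (C) order, which is what reshape does by default.

```python
import numpy as np

def window_grid_shape(img_shape, win_shape, shift_size):
    # Number of window positions along each axis: (image - window) // shift + 1.
    return tuple((i - w) // s + 1
                 for i, w, s in zip(img_shape, win_shape, shift_size))

def flat_index_to_coords(k, img_shape, win_shape, shift_size):
    # Hypothetical helper: flat window index -> (top, bottom, left, right).
    grid = window_grid_shape(img_shape, win_shape, shift_size)
    gr, gc = np.unravel_index(k, grid)
    top, left = int(gr) * shift_size[0], int(gc) * shift_size[1]
    return top, top + win_shape[0], left, left + win_shape[1]

# Non-overlapping 64x64 windows on a 512x512 image, as in the example above.
grid = window_grid_shape((512, 512), (64, 64), (64, 64))
coords = flat_index_to_coords(12, (512, 512), (64, 64), (64, 64))
print(grid, coords)
```

Because the grid shape is computed from the image, window and shift sizes, the caller never has to hard-code (8, 8).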
    

    OK, I'll try and address some of the questions you raised in the comments.

    "I want the pixel location of the window relative to the actual pixel dimensions of the original image."

    Perhaps I was not clear enough - you can already do this using something like my get_win_pixel_coords() function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example:

    win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)
    
    fig, (ax1, ax2) = plt.subplots(1, 2)
    ax1.imshow(win[4, 4])
    ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window (x=8, y=9)
    
    t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))
    
    # plot() takes (x, y), i.e. (column, row): add the left offset to x and the top offset to y
    ax2.imshow(img)
    ax2.plot(l + 8, t + 9, 'oy') # position of Lena's eye, relative to whole image
    
    plt.show()
    

    Also notice that I've updated get_win_pixel_coords() to deal with cases where shiftSize is not None (i.e. the windows don't perfectly tile the image with no overlap).

    "So I'm guessing that in that case, I should just make the grid be equal to the original image's dimensions, is that right? (instead of using 8x8)."

    No, if the windows tile the image without overlap (i.e. shiftSize=None, which I've assumed so far), then if you made the grid dimensions equal to the pixel dimensions of the image, every window would just contain a single pixel!

    "So in my case, for an image of width: 360 and height: 240, would that mean I use this line: grid_pos = np.unravel_index(12, (240, 360)). Also, what does 12 refer to in this line?"

    As I said, making the 'grid size' equal to the image dimensions would be pointless, since every window would contain only a single pixel (at least, assuming that the windows are non-overlapping). The 12 would refer to the index into the flattened grid of windows, e.g.:

    x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
    x_flat = x.ravel()                 # flatten it into a 25-long vector
    print(x_flat[12])                  # the element at index 12 of the flattened vector
    # 12
    row, col = np.unravel_index(12, (5, 5))  # corresponding row/col index in x
    print(x[row, col])
    # 12
    

    "I am shifting 10 pixels with each window, and the first sliding window starts from coordinates 0x0 on the image, and the second starts from 10x10, etc. I want the program to return not just the window contents but also the coordinates corresponding to each window, i.e. 0,0, and then 10,10, etc."

    As I said, you can already get the position of the window relative to the image using the top, bottom, left, right coordinates returned by get_win_pixel_coords(). You could wrap this up into a single function if you really wanted:

    def get_pixels_and_coords(win_grid, grid_pos):
        pix = win_grid[grid_pos]
        tblr = get_win_pixel_coords(grid_pos, pix.shape)
        return pix, tblr
    
    # e.g.:
    pix, tblr = get_pixels_and_coords(win, (3, 4))
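
For the 10-pixel shift described in the comment, another option is to yield the coordinates together with each window while iterating. This is only a sketch: iter_windows is a hypothetical helper that slices the image directly instead of calling sliding_window, and the 240x360 array is a placeholder for the real image.

```python
import numpy as np

def iter_windows(img, win_shape, shift_size):
    # Hypothetical helper: yield (top, left, window) for each window position,
    # scanning left to right, then top to bottom.
    wr, wc = win_shape
    sr, sc = shift_size
    for top in range(0, img.shape[0] - wr + 1, sr):
        for left in range(0, img.shape[1] - wc + 1, sc):
            # Basic slicing returns a view, so no pixel data is copied here.
            yield top, left, img[top:top + wr, left:left + wc]

img = np.zeros((240, 360))   # placeholder image: height 240, width 360
corners = [(top, left) for top, left, _ in iter_windows(img, (64, 64), (10, 10))]
print(len(corners), corners[:3], corners[-1])
```

Note that the windows advance along a row first, so the top-left corners run (0, 0), (0, 10), (0, 20), ... before moving down to row 10.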
    

    If you want the coordinates of every pixel in the window, relative to the image, another trick you could use is to construct arrays containing the row and column indices of every pixel in the image, then apply your sliding window to these:

    ridx, cidx = np.indices(img.shape)
    r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
    c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)
    
    pix = win[3, 4]    # pixel values
    r = r_win[3, 4]    # row index of every pixel in the window
    c = c_win[3, 4]    # column index of every pixel in the window
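
A self-contained way to check this index trick is sketched below. It assumes NumPy's sliding_window_view as a stand-in for sliding_window, a random 128x192 array instead of a real image, and 64x64 windows shifted by 32 pixels; the helper name windows is made up for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.random.default_rng(0).integers(0, 256, size=(128, 192))
win_shape, shift = (64, 64), (32, 32)

def windows(a):
    # Assumed stand-in for sliding_window(a, win_shape, shift, flatten=False).
    return sliding_window_view(a, win_shape)[::shift[0], ::shift[1]]

# Row and column index of every pixel, windowed exactly like the image itself.
ridx, cidx = np.indices(img.shape)
win, r_win, c_win = windows(img), windows(ridx), windows(cidx)

# Indexing the image with a window's row/column indices returns that window.
r, c = r_win[1, 2], c_win[1, 2]
same = np.array_equal(img[r, c], win[1, 2])
print(win.shape, (int(r[0, 0]), int(c[0, 0])), same)
```

The top-left entry of r and c is the window's top-left corner in image coordinates, so this approach gives the window position without any extra arithmetic.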
    
  • 2021-01-17 05:23

    To update @ali_m's answer, since scipy.misc.lena() is no longer available in SciPy >0.17, here is an example using the RGB image scipy.misc.face(), with a slight modification to the sliding window source code provided in the OP.

    import numpy as np
    # note: newer SciPy releases provide these loaders in scipy.datasets instead of scipy.misc
    from scipy.misc import ascent, face
    from matplotlib import pyplot as plt
    from numpy.lib.stride_tricks import as_strided as ast
    
    def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
        if shift_size is None:
            shift_size = win_shape
        gr, gc = grid_pos
        sr, sc = shift_size
        wr, wc = win_shape
        top, bottom = gr * sr, (gr * sr) + wr
        left, right = gc * sc, (gc * sc) + wc
    
        return top, bottom, left, right
    
    
    def norm_shape(shape):
        '''
        Normalize numpy array shapes so they're always expressed as a tuple,
        even for one-dimensional shapes.
        Parameters
            shape - an int, or a tuple of ints
        Returns
            a shape tuple
        '''
        try:
            i = int(shape)
            return (i,)
        except TypeError:
            # shape was not a number
            pass
    
        try:
            t = tuple(shape)
            return t
        except TypeError:
            # shape was not iterable
            pass
    
        raise TypeError('shape must be an int, or a tuple of ints')
    
    
    def sliding_window(a,ws,ss = None,flatten = True):
        '''
        Return a sliding window over a in any number of dimensions
        '''
        if None is ss:
            # ss was not provided. the windows will not overlap in any direction.
            ss = ws
        ws = norm_shape(ws)
        ss = norm_shape(ss)
        # convert ws, ss, and a.shape to numpy arrays
        ws = np.array(ws)
        ss = np.array(ss)
        shap = np.array(a.shape)
        # ensure that ws, ss, and a.shape all have the same number of dimensions
        ls = [len(shap),len(ws),len(ss)]
        if 1 != len(set(ls)):
            raise ValueError(\
            'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
    
        # ensure that ws is smaller than a in every dimension
        if np.any(ws > shap):
            raise ValueError(\
            'ws cannot be larger than a in any dimension.\
     a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
        # how many slices will there be in each dimension?
        newshape = norm_shape(((shap - ws) // ss) + 1)
        # the shape of the strided array will be the number of slices in each dimension
        # plus the shape of the window (tuple addition)
        newshape += norm_shape(ws)
        # the strides tuple will be the array's strides multiplied by step size, plus
        # the array's strides (tuple addition)
        newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
        a = ast(a,shape = newshape,strides = newstrides)
        if not flatten:
            return a
        # Collapse strided so that it has one more dimension than the window.  I.e.,
        # the new array is a flat list of slices.
        meat = len(ws) if ws.shape else 0
        firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
        dim = firstdim + (newshape[-meat:])
        # remove any dimensions with size 1
        #dim = filter(lambda i : i != 1,dim)
        return a.reshape(dim), newshape
    

    Returning newshape from sliding_window() as well makes it possible to pass flatten=True and still know the shape of the grid created by the sliding window function. In my application a flattened vector of computational windows is desirable, because it is a convenient point at which to scale the computations applied to each window.

    If a 96x96 window (i.e. tile x tile) is applied with 50% overlap in both directions to an image with shape (768, 1024, 3), the input image can first be padded so that its height and width are exact multiples of the tile size, leaving no remainder when the sliding window is created.

    img = face()
    nxo,nyo,nzo = img.shape
    
    tile = 96
    # mirror the image to the right and downwards, then crop to the next multiple of tile
    pad_img = np.vstack((np.hstack((img,np.fliplr(img))),np.flipud(np.hstack((img,np.fliplr(img))))))
    
    pad_img = pad_img[:nxo + (-nxo % tile), :nyo + (-nyo % tile), :]
    
    
    
    win, ind = sliding_window(pad_img, (96, 96,3), (48,48,3))
    print(ind)
    # (15, 21, 1, 96, 96, 3)
    print(win.shape)
    # (315, 96, 96, 3)
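
The mirror padding can also be written with np.pad. The sketch below is an alternative to the stack-and-crop code above, not the code used in this answer: pad_to_multiple is a hypothetical helper, and a zero array with the same shape as face() stands in for the real image.

```python
import numpy as np

def pad_to_multiple(img, tile):
    # Hypothetical helper: mirror-pad the bottom and right edges so that the
    # height and width become exact multiples of `tile`.
    pad_r = -img.shape[0] % tile   # rows missing up to the next multiple
    pad_c = -img.shape[1] % tile   # columns missing up to the next multiple
    pad = [(0, pad_r), (0, pad_c)] + [(0, 0)] * (img.ndim - 2)
    # mode="symmetric" reflects the image about its edge, repeating the edge
    # pixel, like appending np.fliplr / np.flipud copies and cropping.
    return np.pad(img, pad, mode="symmetric")

img = np.zeros((768, 1024, 3), dtype=np.uint8)   # same shape as face()
padded = pad_to_multiple(img, 96)
print(padded.shape)
```

Here 768 is already a multiple of 96, so only the width grows, from 1024 to the next multiple, 1056.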
    

    The grid of computational windows contains 15 rows and 21 columns, i.e. 315 computational windows. grid_pos can be determined from an index into the flattened vector of computational windows (i.e. win) together with ind[0] and ind[1]. If we were interested in the computational window at index 239:

    grid_pos = np.unravel_index(239,(ind[0],ind[1]))
    print(grid_pos)
    #(11, 8)
    

    Then the bounding coordinates of that computational window in the padded image can be found using:

    t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
    print(np.all(pad_img[t:b, l:r] == win[239]))
    #True
    